Estimating the expected output of wide random MLPs more efficiently than sampling
Researchers have developed a method to estimate neural network outputs at initialization using analytical techniques rather than Monte Carlo sampling, reducing computational cost for wide MLPs on Gaussian inputs. The approach uses cumulants and Hermite expansions to approximate activation distributions layer by layer, achieving a target accuracy with substantially fewer FLOPs than traditional sampling. This work matters for practitioners optimizing initialization schemes and for theorists studying network behavior at scale, particularly when rare-event probabilities are of interest. The technique hints at broader possibilities for replacing empirical estimation with closed-form approximations in deep learning.
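To make the Hermite-expansion step concrete, here is a minimal sketch (not the paper's implementation; function names and parameter choices are ours): for Z ~ N(0, 1), an activation φ expands in probabilists' Hermite polynomials as φ(z) = Σₖ cₖ Heₖ(z) with cₖ = E[φ(Z) Heₖ(Z)] / k!, and Gaussian moments such as E[φ(Z)²] = Σₖ k! cₖ² then follow in closed form rather than by sampling.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coeffs(phi, n_coeffs=20, n_quad=64):
    """Coefficients c_k of phi in the probabilists' Hermite basis:
    c_k = E[phi(Z) He_k(Z)] / k! for Z ~ N(0, 1), computed with
    Gauss-Hermite_e quadrature (weight function exp(-x^2 / 2))."""
    x, w = hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)   # normalize to the standard normal density
    coeffs = np.empty(n_coeffs)
    for k in range(n_coeffs):
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        He_k = hermeval(x, basis)  # evaluate He_k at the quadrature nodes
        coeffs[k] = np.sum(w * phi(x) * He_k) / math.factorial(k)
    return coeffs

# Example: E[tanh(Z)^2] recovered from the truncated expansion,
# compared against direct quadrature of tanh(x)^2.
c = hermite_coeffs(np.tanh)
second_moment = sum(math.factorial(k) * c[k] ** 2 for k in range(len(c)))
x, w = hermegauss(64)
direct = float(np.sum((w / np.sqrt(2.0 * np.pi)) * np.tanh(x) ** 2))
```

The point of the sketch is the cost profile: once the cₖ are tabulated for an activation, Gaussian moments at every layer are a short weighted sum, with no forward passes involved.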
Modelwire context
Explainer
The key insight is that this method only applies to wide MLPs at initialization with Gaussian inputs, a narrower scope than the summary's framing suggests. The real novelty isn't replacing sampling entirely, but identifying a specific regime where closed-form cumulant propagation beats Monte Carlo on FLOPs per target accuracy.
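To illustrate why closed-form propagation can beat Monte Carlo in this regime, here is a simplified mean-field sketch (our illustration, not the paper's cumulant machinery; ReLU is chosen because its Gaussian moments are available in closed form): for a bias-free ReLU MLP with weights Wᵢⱼ ~ N(0, σ_w²/n), the per-unit preactivation variance obeys q⁽ˡ⁺¹⁾ = σ_w² E[relu(z)²] = σ_w² q⁽ˡ⁾ / 2, a recurrence costing a handful of FLOPs per layer, while the sampling estimate of the same quantity requires full forward passes through random networks.

```python
import numpy as np

def analytic_variances(q0, depth, sigma_w2=2.0):
    """Closed-form preactivation variances for a bias-free ReLU MLP:
    for z ~ N(0, q), E[relu(z)^2] = q / 2, so q_{l+1} = sigma_w2 * q_l / 2.
    One multiply per layer, no sampling."""
    qs = [q0]
    for _ in range(depth):
        qs.append(sigma_w2 * qs[-1] / 2.0)
    return qs

def monte_carlo_variances(q0, depth, sigma_w2=2.0, width=2048,
                          n_samples=200, seed=0):
    """Empirical estimate of the same variances: sample a random network
    and Gaussian inputs, then run forward passes (width^2 FLOPs per
    layer per sample)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, np.sqrt(q0), size=(n_samples, width))
    qs = [q0]
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(sigma_w2 / width), size=(width, width))
        h = np.maximum(h, 0.0) @ W          # next layer's preactivations
        qs.append(float(np.mean(h ** 2)))
    return qs

# At criticality (sigma_w2 = 2 for ReLU) the variance is preserved exactly:
print(analytic_variances(1.0, depth=3))   # [1.0, 1.0, 1.0, 1.0]
```

The analytic recurrence is exact only in the infinite-width limit; the appeal of the cumulant/Hermite approach described above is that it extends this kind of layer-by-layer bookkeeping to higher moments and general activations while keeping the cost far below sampling.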
This connects to the MIT scaling laws work from early May, which identified superposition as the mechanistic driver behind why larger models behave predictably. Where that paper explains why scaling works, this one offers a tool for understanding network behavior before training begins. Both aim to replace empirical guessing with analytical grounding. The approach also echoes the modularity-first strategy in HyCOP (the PDE operator paper), where structured approximations outperform black-box learned mappings. Here, structured Hermite expansions replace unstructured sampling.
If follow-up work extends the method beyond Gaussian inputs, or shows that its initialization estimates correlate with final trained performance on downstream tasks, that would confirm the practical value. If the method remains confined to theoretical initialization analysis without adoption in actual training pipelines by late 2026, it's a useful theoretical tool but not a practical infrastructure shift.
Coverage we drew on
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions
MLP · Monte Carlo sampling · Hermite expansions · Gaussian inputs
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.