Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Researchers have identified a phenomenon where language models spontaneously forget learned rules mid-training despite continued evidence in the data, termed natural ungrokking. The survival of learned behaviors depends not on model capacity or loss curves, but on a single corpus statistic: how frequently the training distribution reinforces each rule. This finding reshapes understanding of what determines which capabilities persist through pretraining, suggesting that data composition, not just scale or architecture, fundamentally governs which learned patterns become stable features versus ephemeral artifacts.

Modelwire context

Explainer

The finding inverts a common assumption in pretraining folklore: that more data or more parameters reliably consolidates learned behaviors. What actually matters, according to this work, is the reinforcement frequency of a rule within the distribution, not whether the model has seen enough examples in aggregate.

This connects directly to the self-distillation paper covered the same day ('On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity'), which found that optimizing for single-attempt accuracy systematically narrows the behaviors a model will produce. Both papers point toward the same uncomfortable conclusion: training dynamics can quietly erase capabilities without any visible signal in aggregate loss curves. Together they suggest that practitioners relying on benchmark performance to confirm what a model has learned may be measuring a survivor population, not the full set of rules the model once held. The model forensics paper from the same batch is also relevant here, since distinguishing 'the model forgot this rule' from 'the model never consolidated it' is exactly the kind of root-cause question that forensic evaluation frameworks need to address.

Watch whether pretraining teams at major labs begin publishing corpus composition audits that track per-rule reinforcement frequency. If that practice emerges within the next year, it signals this finding has crossed from academic curiosity into operational concern.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting. How we write it.

MentionsarXiv

Read full story at arXiv cs.CL →(arxiv.org)

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Our mission How we write

Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.