Where does output diversity collapse in post-training?

Researchers traced output diversity collapse across three post-training lineages of Olmo 3, finding that semantic diversity loss correlates with training data composition rather than post-training method alone. The finding matters for inference-time scaling and creative tasks that depend on varied model outputs.
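To make "output diversity" concrete, here is a minimal sketch of one way to score a batch of sampled responses. It is an illustration only: the summary above does not say which metric the study uses, and this example swaps in a purely lexical proxy (mean pairwise Jaccard distance over word sets) rather than a semantic one. The function names `jaccard_distance` and `mean_pairwise_distance` are hypothetical. A semantic version would replace the distance function with something like an embedding-based distance.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Lexical distance between two responses: 1 - |A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0  # two empty responses are treated as identical
    return 1.0 - len(sa & sb) / len(sa | sb)


def mean_pairwise_distance(samples: list[str]) -> float:
    """Average distance over all unordered pairs of samples (0 = identical, 1 = disjoint)."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0  # fewer than two samples: no diversity to measure
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Assumed toy data: responses sampled from the same prompt at two checkpoints.
base_samples = ["the cat sat down", "a storm rolled in", "she opened the letter"]
tuned_samples = ["the cat sat down", "the cat sat down", "the cat sat there"]

print("base :", mean_pairwise_distance(base_samples))
print("tuned:", mean_pairwise_distance(tuned_samples))
```

Running the same score on samples from each checkpoint in a lineage is one way to see where the drop occurs. The metric is the interchangeable part.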

Modelwire context

Explainer

The headline result is not that post-training kills diversity (that was already assumed) but that the method matters less than what data went in before it. That reframes the problem: teams optimizing RLHF or DPO recipes may be pulling the wrong lever entirely.

This connects most directly to the RISE paper posted to arXiv on April 17th, which proposed tracing which training data drives LLM behavior at the output layer. RISE tries to attribute influence; this Olmo 3 study suggests that influence is already baked in before post-training begins. Together they point toward a coming focus on pre-training data curation, not just fine-tuning choices, as the real control surface for output quality. The STOP paper from the same day, which prunes low-value reasoning paths at inference time, also becomes relevant here: if semantic diversity has already collapsed at the model level, inference-time path pruning has less variance to work with, which could quietly cap the ceiling on parallel reasoning gains.
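The ceiling argument can be illustrated with a toy calculation. This is a sketch under stated assumptions, not a result from any of the papers mentioned: suppose a model's outputs fall into a fixed set of distinct reasoning paths with probabilities `probs`, and `k` samples are drawn independently. The expected number of distinct paths seen is then the sum over paths of 1 - (1 - p)^k. The function name `expected_distinct` and both example distributions are hypothetical.

```python
def expected_distinct(probs: list[float], k: int) -> float:
    """Expected number of distinct paths observed in k independent samples."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    # A path with probability p is missed by all k samples with probability (1 - p)^k.
    return sum(1.0 - (1.0 - p) ** k for p in probs)


# Assumed toy distributions over four reasoning paths.
diverse = [0.25, 0.25, 0.25, 0.25]    # mass spread evenly
collapsed = [0.97, 0.01, 0.01, 0.01]  # mass concentrated on one path

for k in (1, 4, 16, 64):
    print(
        f"k={k:>2}  diverse={expected_distinct(diverse, k):.2f}  "
        f"collapsed={expected_distinct(collapsed, k):.2f}"
    )
```

In this toy setting the collapsed distribution needs far more samples to surface the same number of alternatives, so any method that selects among or prunes parallel paths has fewer candidates to work with at a fixed sampling budget.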

Watch whether the Olmo 3 team releases per-dataset diversity breakdowns that let practitioners identify which data categories are the primary collapse drivers. If they do, that would give the RISE-style attribution methods a concrete downstream use case to validate against.

Coverage we drew on

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Olmo 3 · Think · Instruct · RL-Zero

Read full story at arXiv cs.LG (arxiv.org)
