Research·arXiv cs.CL·6d ago

Pretraining Exposure Explains Popularity Judgments in Large Language Models

Researchers using OLMo and its open Dolma corpus have conducted the first direct measurement of how pretraining data exposure shapes LLM popularity bias, analyzing 7.4 trillion tokens across 2,000 entities. The work separates statistical artifact from genuine world knowledge, revealing that what appears as popularity preference may largely reflect corpus composition rather than learned real-world rankings. This matters for practitioners building systems where entity ranking affects downstream applications, and for researchers interpreting model behavior as evidence of learned priors versus memorized training distributions.

Modelwire context

Explainer

The critical contribution here is not just that popularity bias exists in LLMs, which has been assumed for years, but that OLMo's open training corpus finally makes it measurable directly rather than inferred. Prior work could only correlate model outputs with external popularity proxies; this study closes the loop by counting actual token exposure.
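To make "counting actual token exposure" concrete, here is a minimal sketch of the general idea, not the paper's actual pipeline. It counts how often each entity string appears in a toy corpus and then computes a Spearman rank correlation between those counts and a set of model popularity scores. The function names (`count_mentions`, `spearman`), the toy documents, and the `model_scores` values are all hypothetical and chosen only for illustration; a real study over a corpus the size of Dolma would need indexed search and entity alias handling rather than regex scans.

```python
import math
import re


def count_mentions(documents, entities):
    """Count case-insensitive whole-phrase mentions of each entity (toy exposure measure)."""
    patterns = {
        e: re.compile(r"\b" + re.escape(e) + r"\b", re.IGNORECASE) for e in entities
    }
    counts = {e: 0 for e in entities}
    for doc in documents:
        for entity, pattern in patterns.items():
            counts[entity] += len(pattern.findall(doc))
    return counts


def average_ranks(values):
    """Return 1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined when one side is constant
    return cov / (sd_x * sd_y)


# Hypothetical toy corpus and entity list (not data from the paper).
documents = [
    "Paris is large. Paris is old.",
    "Lyon is near Paris.",
    "Lyon and Nice are cities.",
]
entities = ["Paris", "Lyon", "Nice"]

# Hypothetical popularity scores elicited from a model (illustrative values only).
model_scores = {"Paris": 0.9, "Lyon": 0.4, "Nice": 0.6}

exposure = count_mentions(documents, entities)
rho = spearman(
    [exposure[e] for e in entities],
    [model_scores[e] for e in entities],
)
print(exposure, rho)
```

A rank correlation is used here because raw mention counts are typically heavy-tailed, so comparing orderings is more robust than comparing raw values. Whether the paper uses Spearman or another statistic is not stated in the summary above.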

This connects most directly to the Q-DAPS work covered earlier today ('Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring'), which also grapples with separating surface-level signals from genuine model reasoning. Both papers are, at root, asking the same question: when a model behaves a certain way, is that behavior evidence of learned understanding or an artifact of training data composition? The ORCE paper on confidence calibration ('ORCE: Order-Aware Alignment of Verbalized Confidence') adds another angle, since miscalibrated confidence in popular entities could compound the bias this paper documents. Together, these three papers from the same day reflect a broader push toward diagnosing what LLM outputs actually represent before trusting them in production.

The real test is whether this methodology replicates on a closed-corpus model, even partially, using indirect frequency proxies. If researchers can approximate the same exposure-bias correlation without direct corpus access, the finding becomes actionable for practitioners who cannot audit their model's training data.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: OLMo · Dolma · Wikipedia

Read full story at arXiv cs.CL (arxiv.org)
