On the Proper Treatment of Units in Surprisal Theory

A new framework clarifies how surprisal theory, a foundational model of human language comprehension, maps onto modern language models. The core problem: researchers measure human reading effort over linguistic units such as words, but LLMs assign probability to tokens drawn from a fixed vocabulary, and token boundaries rarely align with those units. This mismatch has forced ad hoc workarounds that conflate unit definition with evaluation scope. By disentangling these choices, the work enables more rigorous comparison between human cognition and neural language models, and makes surprisal-based tests of LLM behavior against psycholinguistic data easier to interpret.
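To make the mismatch concrete, here is a minimal sketch of one common workaround: summing token-level surprisals within each whitespace-delimited word, which follows from the chain rule if a word's probability is taken as the product of its tokens' conditional probabilities. This illustrates the kind of ad hoc choice at issue, not the framework the paper proposes. The function name `word_surprisals`, the leading-space convention for word starts, and the probability values are all assumptions made up for the example; no real model is queried.

```python
import math

def word_surprisals(tokens, probs):
    """Sum token surprisals (in bits) within each whitespace-delimited word.

    tokens: token strings; a leading space marks the start of a new word
            (an assumption of this sketch, mimicking common subword tokenizers).
    probs:  conditional probabilities p(token | preceding tokens); the values
            used below are invented, not real model output.
    """
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    words = []
    for tok, p in zip(tokens, probs):
        s = -math.log2(p)  # surprisal of this token in bits
        if tok.startswith(" ") or not words:
            # A new word begins here: open a new (text, surprisal) entry.
            words.append([tok.strip(), s])
        else:
            # A continuation token: extend the current word and add its surprisal.
            words[-1][0] += tok
            words[-1][1] += s
    return [(w, s) for w, s in words]

# Hypothetical tokenization of "The unbelievable story" with invented probabilities.
tokens = ["The", " un", "believ", "able", " story"]
probs = [0.5, 0.25, 0.5, 0.5, 0.125]
result = word_surprisals(tokens, probs)
for word, bits in result:
    print(f"{word}: {bits:.1f} bits")
```

Note that even this small sketch bakes in a decision about which word the leading whitespace belongs to. Choices of that kind tend to go unlabeled in practice, which is why results from different studies can be hard to compare.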
Modelwire context
Explainer: The paper's practical contribution goes beyond theoretical tidiness. By formalizing which choices belong to unit definition and which to evaluation scope, it gives researchers a principled basis for comparing results across studies that previously could not be compared directly, because those studies were quietly making different, unlabeled assumptions.
This work sits largely disconnected from the governance, investment, and product stories dominating recent Modelwire coverage, including the OpenAI litigation and the ChatGPT Images 2.0 regional rollout from early May 2026. It belongs instead to a quieter but consequential thread: the question of whether LLM behavior actually maps onto human cognition, or whether the metrics researchers use to claim that alignment are methodologically compromised. As LLMs get deployed in reading-assistance, education, and accessibility tools, the validity of surprisal-based evaluation becomes less academic. Sloppy measurement here doesn't just affect papers; it affects which model behaviors get treated as cognitively plausible and which get filtered out during development.
Watch whether psycholinguistic benchmark studies, such as those using the Natural Reading Corpus or the Dundee Eye-Tracking data, begin citing this framework within the next 12 months as a standard preprocessing step. Adoption there would suggest the field has accepted the proposed disentanglement as a genuine fix rather than a marginal refinement.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Surprisal Theory · Language Models · Pretrained Language Models
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.