Research Tools & Code · arXiv cs.CL · May 6

Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset

Researchers have released an eye-tracking dataset capturing how non-native English speakers process idiomatic expressions across proficiency levels, revealing measurable cognitive load differences between literal and figurative interpretation pathways. The work validates that consumer-grade 60 Hz eye-tracking hardware can reliably detect reading-level cognitive events, opening a practical avenue for linguists and NLP researchers to ground language model training and evaluation in human processing data. This bridges psycholinguistics and AI by providing empirical evidence of the cognitive friction that current models may replicate or fail to capture when handling figurative language.

Modelwire context

Explainer

The dataset's real contribution is not only measuring cognitive load in idiom processing but also validating that inexpensive hardware (a 60 Hz Tobii tracker) can reliably capture the fine-grained reading events that linguists and ML researchers need in order to ground language model evaluation in actual human processing patterns rather than in task accuracy alone.
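To make the 60 Hz constraint concrete, here is a minimal sketch of dispersion-threshold fixation detection (the standard I-DT approach) on a fixed-rate gaze stream. It is not the dataset authors' pipeline. The function names, the 35-pixel dispersion threshold, and the 100 ms minimum duration are illustrative assumptions.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 60.0
SAMPLE_INTERVAL_MS = 1000.0 / SAMPLE_RATE_HZ  # about 16.7 ms between samples


@dataclass
class Fixation:
    start_ms: float
    duration_ms: float
    x: float  # mean horizontal gaze position over the fixation
    y: float  # mean vertical gaze position over the fixation


def dispersion(points):
    """I-DT dispersion: horizontal extent plus vertical extent of the window."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion_px=35.0, min_duration_ms=100.0):
    """Group (x, y) gaze samples recorded at a fixed 60 Hz into fixations.

    Both thresholds are assumed values for illustration, not taken
    from the dataset or from any vendor documentation.
    """
    min_len = max(2, round(min_duration_ms / SAMPLE_INTERVAL_MS))
    fixations = []
    i = 0
    while i + min_len <= len(samples):
        j = i + min_len
        if dispersion(samples[i:j]) > max_dispersion_px:
            i += 1  # window too spread out: slide forward by one sample
            continue
        # Grow the window while the gaze stays within the dispersion limit.
        while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion_px:
            j += 1
        window = samples[i:j]
        fixations.append(Fixation(
            start_ms=i * SAMPLE_INTERVAL_MS,
            duration_ms=len(window) * SAMPLE_INTERVAL_MS,
            x=sum(p[0] for p in window) / len(window),
            y=sum(p[1] for p in window) / len(window),
        ))
        i = j
    return fixations
```

In this sketch every fixation duration comes out as a multiple of roughly 16.7 ms, which is the temporal granularity a 60 Hz tracker imposes on any downstream reading measure.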

This connects directly to the interpretability work from early May, particularly the encoding probe paper that asked what language models actually encode versus what we assume they encode. Where that work reconstructed model internals from linguistic features, this dataset provides human ground truth for what 'correct' processing of figurative language looks like cognitively. It also echoes the memory and procedural execution papers: models fail not just at reasoning but at tracking the cognitive steps humans take. Empirical eye-tracking data on how proficiency level affects idiom interpretation gives researchers a measurable target for what training objectives should optimize toward.

If papers citing this dataset appear by Q4 2026 showing that language models trained with eye-tracking-derived loss functions outperform standard supervised baselines on idiom comprehension benchmarks, that would confirm the data has real training value. If adoption stays limited to academic psycholinguistics labs without NLP uptake, the bridge between the two fields will remain theoretical.

Coverage we drew on

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe · arXiv cs.CL

Mentions: Tobii Pro Spark · CEFR · Portuguese L1 speakers
