Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

Researchers have developed Logit-Contribution Scoring (LOCOS), a new method for identifying attention heads that perform semantic synthesis rather than literal token copying in long-context LLM inference. Existing interpretability tools miss these heads because they measure where models read information, not what they compute through their output-value circuits. LOCOS scores heads by projecting their outputs onto answer-token directions, enabling more precise mechanistic understanding of how models generate answers from context. This matters for practitioners building retrieval-augmented systems and for researchers mapping the internal operations that enable long-context reasoning, a critical capability as context windows expand.
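A minimal sketch of the scoring idea described above, assuming the simplest reading of "projecting outputs onto answer-token directions": take each head's write to the residual stream at the final position and dot it with the answer token's column of the unembedding matrix. The summary does not give the paper's exact formula, so the function name, the array shapes, and the toy numbers here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def logit_contribution_scores(head_outputs, unembed, answer_token_id):
    """Score each head by its direct contribution to the answer token's logit.

    head_outputs: (n_heads, d_model) per-head write to the residual stream at
                  the final position (hypothetical input, assumed to be already
                  projected through the head's output matrix).
    unembed:      (d_model, vocab_size) unembedding matrix.
    """
    answer_direction = unembed[:, answer_token_id]  # (d_model,)
    return head_outputs @ answer_direction          # (n_heads,)

# Toy model: d_model = 4, vocab_size = 3, three heads (all values made up).
unembed = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
])
head_outputs = np.array([
    [0.5, 0.0, 0.0, 0.2],   # writes toward token 0
    [0.0, 1.5, 0.0, 0.0],   # writes toward token 1 (the answer)
    [0.1, -0.3, 0.4, 0.0],  # slightly suppresses token 1
])
answer_token_id = 1

scores = logit_contribution_scores(head_outputs, unembed, answer_token_id)
top_head = int(np.argmax(scores))
print(scores, top_head)
```

In a real model the per-head outputs would come from hooks on the attention layers, and any normalization applied before the unembedding would have to be accounted for; both are omitted from this sketch.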
Modelwire context
Explainer
LOCOS doesn't just find attention heads; it identifies which ones perform computation (semantic synthesis) versus which ones merely route tokens. The key insight is that standard interpretability tools measure input attention patterns but miss what happens in the output-value circuit, creating a blind spot for long-context reasoning.
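The blind spot can be illustrated with a toy comparison, sketched under stated assumptions. An attention-pattern score asks how much attention a head places on context positions that literally hold the answer token, while a logit-based score asks how much the head's attention-weighted output-value result pushes the answer logit. Both function names and all numbers below are hypothetical and are not taken from the paper.

```python
import numpy as np

def copy_attention_score(attn, context_token_ids, answer_token_id):
    """Attention mass on context positions that literally hold the answer token."""
    mask = np.asarray(context_token_ids) == answer_token_id
    return float(attn[mask].sum())

def logit_score(attn, ov_outputs, answer_direction):
    """Answer-logit contribution of the head's attention-weighted OV output."""
    head_output = attn @ ov_outputs  # (d_model,)
    return float(head_output @ answer_direction)

# Toy context of three tokens; the answer token (id 7) sits at position 1.
context_token_ids = [4, 7, 9]
answer_token_id = 7
answer_direction = np.array([1.0, 0.0])  # assumed unembedding column for token 7

# Head A, a literal copier: attends to the answer token itself.
attn_a = np.array([0.1, 0.8, 0.1])
ov_a = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Head B, a synthesizer: attends to a different token (id 9), but its OV
# circuit maps that token onto the answer direction.
attn_b = np.array([0.1, 0.1, 0.8])
ov_b = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

copy_a = copy_attention_score(attn_a, context_token_ids, answer_token_id)
copy_b = copy_attention_score(attn_b, context_token_ids, answer_token_id)
logit_a = logit_score(attn_a, ov_a, answer_direction)
logit_b = logit_score(attn_b, ov_b, answer_direction)
print(copy_a, copy_b, logit_a, logit_b)
```

In this toy setup the attention-based score ranks head B far below head A, while the logit-based score treats them as equally important, which is the kind of non-literal head the explainer says input-side tools overlook.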
This connects directly to the mechanistic understanding framework covered in 'Understanding Large Language Models' from early July. That survey synthesized how transformers achieve generalist performance through attention-driven scaling and emergent reasoning. LOCOS provides a concrete tool for that agenda: it makes one specific internal operation (semantic synthesis in retrieval contexts) legible in a way prior methods couldn't. It's also relevant to the evidence-grounding work in FinKG-News, since both papers assume that understanding what models compute (not just what they attend to) is essential for high-stakes applications where hallucination matters.
If LOCOS-identified heads show consistent semantic patterns across different model scales and architectures (GPT-4 class, Llama 3.1, Claude variants), that validates the method as a general interpretability primitive. If practitioners building RAG systems report that LOCOS-guided head pruning reduces hallucination rates compared to baseline retrieval-only approaches, the tool has crossed from research artifact to practical utility.
Coverage we drew on
- Understanding Large Language Models · arXiv cs.CL
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: LOCOS · LLM · attention heads · output-value circuits
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.