Modelwire

DiscoTrace: Representing and Comparing Answering Strategies of Humans and LLMs in Information-Seeking Question Answering

DiscoTrace, a new framework, maps how humans and LLMs construct answers to information-seeking questions using discourse acts and rhetorical structure. Analysis of nine human communities shows diverse answering strategies, while LLMs lack rhetorical variety and systematically favor breadth over human-like selectivity.
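The summary doesn't reproduce DiscoTrace's actual taxonomy, but the core representation is easy to picture: an answer becomes a sequence of labeled discourse units, optionally linked by rhetorical relations. Here is a minimal sketch in Python; the act labels, field names, and relation names are placeholder assumptions, not the paper's definitions.

```python
from dataclasses import dataclass

# Illustrative only: DiscoTrace's real act inventory and rhetorical
# relations are defined in the paper and likely differ from these labels.
@dataclass
class DiscourseUnit:
    act: str                     # e.g. "direct-answer", "evidence", "caveat"
    text: str
    relation: str | None = None  # hypothetical rhetorical link to the prior unit

# A toy "trace" of one human-style answer.
trace = [
    DiscourseUnit("restate-question", "So you're asking whether X applies here."),
    DiscourseUnit("direct-answer", "In short: yes.", relation="answer"),
    DiscourseUnit("evidence", "The docs state X explicitly.", relation="justify"),
    DiscourseUnit("caveat", "Unless you're on the legacy version.", relation="concession"),
]
```

Once answers are flattened into traces like this, strategies can be compared across communities and models as distributions over acts and relations.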

Modelwire context

Explainer

The more pointed finding isn't just that LLMs lack variety — it's that they systematically favor breadth, covering more ground rather than making the selective, rhetorically purposeful choices that characterize expert human answers. That's a structural bias, not a capability gap, and it has real consequences for how we interpret fluency as a proxy for quality.
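To make that concrete: once answers are reduced to act sequences (see the sketch above), both "variety" and "breadth" become measurable. The toy metric below uses Shannon entropy over act labels as a crude proxy for rhetorical variety; the sequences and numbers are illustrative assumptions, not the paper's measurements.

```python
from collections import Counter
from math import log2

# Hypothetical act sequences; the labels are placeholders.
human_acts = ["restate-question", "direct-answer", "evidence", "caveat"]
llm_acts = ["direct-answer", "evidence", "evidence", "evidence",
            "evidence", "summary"]

def act_entropy(acts):
    """Shannon entropy (bits) of the act distribution within one answer:
    a crude proxy for rhetorical variety."""
    counts = Counter(acts)
    return -sum((n / len(acts)) * log2(n / len(acts)) for n in counts.values())

print(round(act_entropy(human_acts), 2))  # 2.0: four acts, each used once
print(round(act_entropy(llm_acts), 2))    # 1.25: dominated by "evidence"
```

In this contrived example the LLM answer is longer (more ground covered) but lower in entropy (one act type dominates), which is the shape of the breadth-over-selectivity pattern described above.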

This connects most directly to the reliability concerns surfaced in 'Diagnosing LLM Judge Reliability' from the same week, which found that LLM evaluators show logical inconsistencies in pairwise comparisons roughly a third to two-thirds of the time. DiscoTrace adds a complementary angle: if models also produce rhetorically homogeneous outputs, then using LLMs to evaluate LLM-generated answers compounds the problem — judges and generators may share the same structural blind spots. The IG-Search paper's focus on rewarding effective information retrieval is adjacent, but that work targets search query quality rather than how retrieved information gets organized into a response, so the overlap is limited.
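For concreteness, a logical inconsistency in pairwise judging usually means a preference cycle: the judge prefers A over B and B over C, yet also prefers C over A. That is detectable mechanically. A minimal check, with hypothetical verdicts standing in for real judge outputs:

```python
from collections import Counter
from itertools import combinations

# Hypothetical judge verdicts: the winner of each pairwise comparison.
verdicts = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}

def winner(x, y):
    """Look up the judged winner for a pair, whichever order it was stored in."""
    return verdicts[(x, y)] if (x, y) in verdicts else verdicts[(y, x)]

def intransitive_triples(items):
    """Count triples whose preferences form a cycle (a > b > c > a)."""
    cycles = 0
    for a, b, c in combinations(items, 3):
        wins = Counter([winner(a, b), winner(b, c), winner(a, c)])
        # In a transitive ordering some item wins both of its matchups;
        # in a cycle every item wins exactly once.
        if max(wins.values()) == 1:
            cycles += 1
    return cycles

print(intransitive_triples(["A", "B", "C"]))  # 1: A beats B, B beats C, C beats A
```

The rate of such cycles over many triples is the kind of inconsistency statistic the judge-reliability work reports.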

Watch whether DiscoTrace's discourse act taxonomy gets adopted in any upcoming instruction-tuning or RLHF dataset construction pipelines. If a major fine-tuning effort cites it as a filtering criterion within the next six months, the framework has moved from diagnostic to prescriptive.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: DiscoTrace · LLMs

Related

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

arXiv cs.CL

From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

arXiv cs.CL

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

arXiv cs.CL