UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval

Researchers introduce UsefulBench, a dataset that distinguishes between textual relevance and practical usefulness for information retrieval tasks. The work reveals that similarity-based IR systems optimize for relevance while LLM-based approaches better capture usefulness, challenging conventional retrieval metrics.
Modelwire context
The deeper provocation here is that most IR benchmarks have been measuring the wrong thing entirely. Relevance, the traditional target, asks whether a document relates to a query. Usefulness asks whether retrieving it actually helps someone make a decision or take action, and those two properties diverge more than the field has acknowledged.
This connects directly to the tension surfaced in the DiscoTrace paper from April 16, which found that LLMs systematically favor breadth over human-like selectivity when answering information-seeking questions. Both papers are pointing at the same gap: systems optimized on aggregate similarity metrics miss the judgment calls that make information actionable. The finding that LLM-based retrieval better captures usefulness also rhymes with IG-Search's argument, covered the same day, that step-level information gain is a more meaningful reward signal than trajectory-level relevance. Together these papers suggest a quiet but significant methodological shift is underway in how the field thinks about retrieval quality.
Watch whether established IR benchmarks like BEIR or TREC adopt UsefulBench-style usefulness annotations in their next evaluation rounds. If they do, the relevance-versus-usefulness distinction moves from a research curiosity into standard practice; if they don't, this remains an interesting critique without structural impact.
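To make the relevance-versus-usefulness distinction concrete, here is a minimal toy sketch in Python. It is not UsefulBench's data, annotation scheme, or evaluation code. The query, the four documents, and both label sets are hypothetical, invented purely for illustration, and the bag-of-words cosine scorer is only a stand-in for a similarity-based retriever. The sketch ranks the documents by similarity and then scores the same ranking twice, once against "relevant" labels and once against "useful" labels.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two whitespace-tokenized strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def precision_at_k(ranked_ids, positives, k):
    """Fraction of the top-k ranked ids that appear in the positive set."""
    return sum(1 for d in ranked_ids[:k] if d in positives) / k


# Hypothetical query and corpus (assumptions, not drawn from any benchmark).
query = "should we raise interest rates"
docs = {
    "d1": "interest rates interest rates history of interest rates",
    "d2": "we raise funds for charity",
    "d3": "inflation forecast exceeds target so tightening is advised",
    "d4": "should we raise interest rates analysts debate",
}

# Hypothetical annotations: d1 is on-topic but does not help the decision,
# while d3 helps the decision but shares no vocabulary with the query.
relevant = {"d1", "d4"}
useful = {"d3", "d4"}

# Rank by lexical similarity, as a similarity-based retriever would.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)

p_relevance = precision_at_k(ranked, relevant, k=2)
p_usefulness = precision_at_k(ranked, useful, k=2)
print(ranked, p_relevance, p_usefulness)
```

On this toy data the similarity ranking scores perfectly against the relevance labels at k=2 but only half as well against the usefulness labels, because the one decision-relevant document with no lexical overlap lands at the bottom of the list. The numbers say nothing about real systems; they only show how the same ranking can look different depending on which target the labels encode.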
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes; we don’t republish. The full content lives on arxiv.org.