OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations

Researchers introduce OnePred, a framework addressing a fundamental limitation in conversational AI: today's systems remain reactive, responding only after users submit queries. The work tackles next-query prediction by compressing dialogue history into an evolving intent trajectory rather than naively concatenating the full context, an approach whose cost scales poorly with conversation length. The goal is to ease the resulting efficiency-accuracy tradeoff. This shift toward proactive interaction is a step toward LLM assistants that anticipate user needs rather than only respond to them.

Modelwire context

Explainer

The paper doesn't just predict next queries; it reframes the efficiency problem by treating conversation history as a learnable compression task rather than a storage problem. Most systems either truncate context (losing information) or concatenate it all (hitting scaling walls). OnePred's intent trajectory approach suggests a middle path where what matters gets preserved and what doesn't gets discarded automatically.
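The contrast between concatenating history and carrying a compressed state forward can be sketched in a few lines. This is a toy illustration only, not OnePred's method: the summary above does not describe the paper's actual compressor, so the decayed term-weight memory here, along with the names `update_memory`, `concatenated_context`, `decay`, and `max_terms`, are assumptions made for the example. The sketch shows only the recursive shape of the update (new state computed from the previous state plus the latest turn) and the bounded size of that state.

```python
from typing import Dict, List


def update_memory(
    memory: Dict[str, float],
    turn: str,
    decay: float = 0.8,      # hypothetical: how fast old evidence fades
    max_terms: int = 5,      # hypothetical: fixed memory budget
) -> Dict[str, float]:
    """Return a new bounded memory from the previous memory and one new turn."""
    # Fade everything already remembered.
    updated = {term: weight * decay for term, weight in memory.items()}
    # Reinforce terms that appear in the latest turn.
    for term in turn.lower().split():
        updated[term] = updated.get(term, 0.0) + 1.0
    # Keep only the strongest terms so the state never grows past max_terms.
    strongest = sorted(updated.items(), key=lambda kv: (-kv[1], kv[0]))[:max_terms]
    return dict(strongest)


def concatenated_context(turns: List[str]) -> str:
    """Baseline: the full history, which grows with every turn."""
    return " ".join(turns)


# Made-up example conversation, for illustration only.
turns = [
    "book flight to lisbon",
    "cheapest flight to lisbon in may",
    "hotel near lisbon airport",
    "lisbon weather in may",
]

memory: Dict[str, float] = {}
for turn in turns:
    memory = update_memory(memory, turn)

print(len(concatenated_context(turns)))  # grows as turns are added
print(len(memory))                       # never exceeds max_terms
```

A learned compressor would replace the term-weight heuristic, but the interface would plausibly stay the same: a fixed-size state updated once per turn, so per-turn cost does not depend on how long the conversation has run.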

This connects directly to the NLG evaluation piece from this week. As conversational AI moves from research into production, the field faces pressure to move beyond turn-by-turn reactivity toward systems that anticipate user needs. But anticipation requires both prediction capability (what OnePred tackles) and rigorous evaluation of whether those predictions actually help users or just create false positives. The NLG evaluation paper flagged that LLM-as-Judge metrics often miss safety and usability concerns; next-query prediction systems will face similar pressure to prove their forecasts improve real dialogue outcomes, not just benchmark scores.

If OnePred's intent compression method shows measurable latency gains on multi-turn benchmarks (for example, conversational QA sets such as CoQA or QuAC) without accuracy loss compared to full-context baselines, that would support the approach. If accuracy holds only on short conversations (under 10 turns) and degrades on longer ones, the compression is losing signal and the method remains a niche optimization rather than a general solution.
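One way to run the check described above is to split accuracy by conversation length. The following is a minimal sketch under stated assumptions: the function name `accuracy_by_length`, the `(num_turns, correct)` record format, and the sample records are all hypothetical placeholders, not results from the paper. The 10-turn cutoff follows the threshold named above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_length(
    records: Iterable[Tuple[int, bool]],
    short_max_turns: int = 10,
) -> Dict[str, float]:
    """Accuracy for conversations under short_max_turns turns versus the rest.

    Each record is (number of turns in the conversation, whether the
    predicted next query was judged correct).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for num_turns, correct in records:
        bucket = "short" if num_turns < short_max_turns else "long"
        totals[bucket] += 1
        hits[bucket] += int(correct)
    # Only buckets that received at least one record are reported.
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Placeholder records for illustration only; not measured data.
sample = [(3, True), (5, False), (12, True), (20, True)]
print(accuracy_by_length(sample))
```

Running the same split for both the compressed-memory system and a full-context baseline would show whether any accuracy gap opens up only in the long bucket, which is the failure mode described above.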

Coverage we drew on

NLG Evaluation: Past, Present, Future · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.


Read the full story at arXiv cs.CL (arxiv.org)
