Research Models & Releases · arXiv cs.CL · May 28

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Loong introduces a reinforcement-learning-driven translation agent that mimics human reasoning to work within a core LLM constraint: the limited context window. Rather than naively stuffing all available history into the prompt, the system maintains a structured memory of summaries, examples, and entities, then learns which pieces matter for each translation decision. This addresses a persistent gap in document-level translation, where global coherence clashes with token limits. The adaptive context selection approach signals a broader shift toward agents that reason about their own information needs instead of relying on static retrieval or attention mechanisms.
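To make the structured-memory idea concrete, here is a minimal sketch of a memory holding summaries, examples, and entities, with a helper that assembles prompt context from only the selected components. Everything in it is an assumption for illustration: `DocumentMemory`, `build_context`, the field layout, and the character budget standing in for a token limit are hypothetical and are not taken from the Loong paper.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentMemory:
    """Hypothetical structured memory; field names are illustrative only."""
    summaries: list[str] = field(default_factory=list)
    # (source sentence, translation) pairs seen earlier in the document
    examples: list[tuple[str, str]] = field(default_factory=list)
    # source term -> rendering chosen earlier in the document
    entities: dict[str, str] = field(default_factory=dict)


def build_context(memory, use_summaries, use_examples, use_entities, max_chars=500):
    """Assemble prompt context from only the selected memory components.

    max_chars is a crude stand-in for a token budget (an assumption).
    """
    parts = []
    if use_summaries and memory.summaries:
        parts.append("Summary: " + memory.summaries[-1])
    if use_examples:
        for src, tgt in memory.examples[-2:]:
            parts.append(f"Example: {src} => {tgt}")
    if use_entities:
        for src, tgt in memory.entities.items():
            parts.append(f"Entity: {src} => {tgt}")

    # Keep parts in order until the next one would exceed the budget.
    kept, used = [], 0
    for part in parts:
        if used + len(part) > max_chars:
            break
        kept.append(part)
        used += len(part)
    return "\n".join(kept)
```

The three boolean flags are the decision an adaptive selector would make per sentence; here they are passed in by hand.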

Modelwire context

Explainer

The key detail the summary underplays is the reinforcement learning framing: Loong is not just a smarter retrieval system but an agent trained to evaluate its own information needs as a decision problem, treating context selection as a policy rather than a heuristic. That distinction matters because it means the selection behavior can improve with feedback rather than being frozen at design time.
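The difference between a policy and a heuristic can be illustrated with a toy example. The sketch below treats the choice of which memory components to include as independent Bernoulli decisions and updates them with a plain REINFORCE rule. This is a swapped-in textbook technique, not Loong's training procedure: `SelectionPolicy`, `toy_reward`, and the reward values are invented for illustration, and a real system would derive its reward from a translation-quality signal.

```python
import math
import random

COMPONENTS = ("summaries", "examples", "entities")


class SelectionPolicy:
    """Toy Bernoulli policy over memory components (hypothetical, not Loong's)."""

    def __init__(self, lr=0.5, seed=0):
        self.logits = {c: 0.0 for c in COMPONENTS}
        self.lr = lr
        self.rng = random.Random(seed)

    def prob(self, component):
        # Probability of including this component in the prompt.
        return 1.0 / (1.0 + math.exp(-self.logits[component]))

    def sample(self):
        return {c: self.rng.random() < self.prob(c) for c in COMPONENTS}

    def update(self, action, reward, baseline=0.0):
        # REINFORCE: for a sigmoid-parameterized Bernoulli,
        # d/d(logit) log pi(a) = a - p.
        advantage = reward - baseline
        for c in COMPONENTS:
            grad = (1.0 if action[c] else 0.0) - self.prob(c)
            self.logits[c] += self.lr * advantage * grad


def toy_reward(action):
    # Invented reward: entity context helps, example context costs budget.
    return (1.0 if action["entities"] else 0.0) - (0.5 if action["examples"] else 0.0)


def train(steps=500, seed=0):
    policy = SelectionPolicy(seed=seed)
    baseline = 0.0
    for _ in range(steps):
        action = policy.sample()
        reward = toy_reward(action)
        policy.update(action, reward, baseline)
        baseline += 0.1 * (reward - baseline)  # running-average baseline
    return policy
```

Under this invented reward, training pushes the inclusion probability for entities up and for examples down, which is the sense in which selection behavior can improve with feedback instead of being fixed at design time.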

This sits at the intersection of two threads running through recent coverage. The paper on 'Locally Coherent, Globally Incoherent' multi-agent systems formalized exactly the failure mode Loong is trying to solve at the translation level: local decisions that look valid but compound into global incoherence. Loong's structured memory of summaries and entities is essentially a domain-specific answer to that coherence gap. Separately, 'On Language Generation in the Limit with Bounded Memory' established theoretical bounds on what generators can learn when context is discarded, which gives Loong's selective retention approach a useful theoretical frame even if the two papers don't cite each other.

The real test is whether Loong's RL-trained selection policy generalizes across language pairs and document genres beyond those in its training distribution. If a follow-up evaluation on low-resource or morphologically complex language pairs shows degraded coherence scores relative to high-resource baselines, that would suggest the policy is fitting to data characteristics rather than learning a transferable reasoning strategy.

Coverage we drew on

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Loong · Large Language Models · Reinforcement Learning
