The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Researchers have identified a counterintuitive failure mode in multi-agent LLM systems: expanding context windows systematically undermines cooperative behavior across diverse models and game settings. Through analysis of 378,000 reasoning traces, the team pinpoints the culprit as degraded forward-looking intent rather than increased distrust, and demonstrates that fine-tuning on forward-looking reasoning patterns can restore cooperation even in novel scenarios. This finding challenges the assumption that larger context is uniformly beneficial and suggests that scaling memory without corresponding alignment work may inadvertently erode the collaborative reasoning needed for multi-agent coordination.

Modelwire context


The paper's most underreported detail is the mechanism: the problem isn't that agents become more suspicious of each other as they accumulate history, but that they lose the capacity to reason about future cooperative payoffs. That's a subtle but important distinction, because it means the fix isn't trust calibration or memory filtering but targeted fine-tuning on a specific reasoning pattern.
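To make "reasoning about future cooperative payoffs" concrete, here is a minimal sketch using the textbook repeated Prisoner's Dilemma with a grim-trigger opponent. It is not the paper's setup or method. The payoff values, the `Payoffs` class, and the use of a discount factor `delta` as a stand-in for how much weight an agent gives the future are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Hypothetical per-round payoffs for a Prisoner's Dilemma."""
    temptation: float = 5.0   # defect while the other agent cooperates
    reward: float = 3.0       # both agents cooperate
    punishment: float = 1.0   # both agents defect


def cooperate_value(p: Payoffs, delta: float) -> float:
    """Discounted value of cooperating in every round."""
    return p.reward / (1.0 - delta)


def defect_value(p: Payoffs, delta: float) -> float:
    """Discounted value of defecting once, then being punished forever."""
    return p.temptation + delta * p.punishment / (1.0 - delta)


def prefers_cooperation(p: Payoffs, delta: float) -> bool:
    """True when the long-run cooperative payoff beats a one-shot defection."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must be in [0, 1)")
    return cooperate_value(p, delta) >= defect_value(p, delta)


def cooperation_threshold(p: Payoffs) -> float:
    """Smallest delta at which cooperation is sustainable under grim trigger."""
    return (p.temptation - p.reward) / (p.temptation - p.punishment)


if __name__ == "__main__":
    p = Payoffs()
    print("threshold:", cooperation_threshold(p))
    for delta in (0.2, 0.9):
        print(delta, prefers_cooperation(p, delta))
```

In this toy model, an agent that trusts its partner completely still defects once the weight it places on future rounds drops below the threshold. That is one way to read the distinction the paper draws between lost forward-looking intent and increased distrust.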

This connects directly to the AutoTTS work covered the same day ('LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling'), which treats expanded inference-time computation as straightforwardly beneficial. The memory curse finding complicates that picture: more context is a form of expanded computation at inference time, and this paper shows it can degrade performance on tasks requiring coordination. Together, the two papers suggest that scaling inference-side resources without alignment-aware design introduces failure modes that raw capability metrics won't catch.

The fine-tuning fix was demonstrated on novel scenarios, but watch whether the same LoRA-based forward-looking intervention holds when agents operate across heterogeneous model families rather than within a single model class. If it degrades under cross-model conditions, the fix is narrower than the paper implies.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: LLM · LoRA · arXiv

Read the full story at arXiv cs.CL (arxiv.org).


