Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

A new retrieval-first memory architecture challenges the dominant extraction-at-ingestion paradigm by preserving raw event logs and deferring filtering to query time. True Memory, deployable as a single SQLite file without external infrastructure, substantially outperforms existing agent memory systems (Mem0, Supermemory, Zep) on multi-session conversation benchmarks, reaching 93% accuracy on LoCoMo. The shift from schema-centric storage to pipeline-centric retrieval addresses a fundamental limitation in current agentic systems: information discarded before a question is asked cannot be recovered later. This work signals growing recognition that agent memory design requires rethinking beyond vector databases.

Modelwire context

Explainer

The deeper provocation here is philosophical: most current agent memory systems are lossy by design, because they make irreversible filtering decisions at ingestion time, before any query exists to guide what matters. True Memory's SQLite-only deployment model also directly challenges the assumption that production-grade agent memory requires vector database infrastructure.
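To make the storage-versus-retrieval distinction concrete, here is a minimal sketch of a raw event log kept in a single SQLite file and filtered only at query time. It is not True Memory's actual pipeline, which the summary above does not describe in detail. The table layout, the function names (`open_log`, `append_event`, `recall`), and the naive term-overlap scoring are all assumptions made for illustration.

```python
import sqlite3

def open_log(path=":memory:"):
    """Open (or create) a single-file, append-only event log (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY,"
        " session TEXT NOT NULL,"
        " speaker TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn

def append_event(conn, session, speaker, text):
    """Store the raw utterance verbatim; nothing is summarized or dropped."""
    conn.execute(
        "INSERT INTO events (session, speaker, text) VALUES (?, ?, ?)",
        (session, speaker, text),
    )
    conn.commit()

def normalize(text):
    """Lowercase, strip basic punctuation, and drop very short words."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in words if len(w) > 3}

def recall(conn, question, limit=3):
    """Filter at query time: rank raw events by overlap with the question terms."""
    terms = normalize(question)
    scored = []
    rows = conn.execute("SELECT id, session, speaker, text FROM events ORDER BY id")
    for event_id, session, speaker, text in rows:
        score = len(terms & normalize(text))
        if score:
            scored.append((score, event_id, session, speaker, text))
    # Highest overlap first; ties fall back to insertion order.
    scored.sort(key=lambda row: (-row[0], row[1]))
    return [row[2:] for row in scored[:limit]]

conn = open_log()
append_event(conn, "s1", "user", "My sister Ana moved to Lisbon last spring.")
append_event(conn, "s1", "user", "I prefer window seats on long flights.")
append_event(conn, "s2", "user", "Ana adopted a greyhound named Pixel.")

hits = recall(conn, "What is the name of Ana's greyhound?")
print(hits)  # [('s2', 'user', 'Ana adopted a greyhound named Pixel.')]
```

An ingestion-time extractor tuned to keep, say, only travel preferences would have discarded the greyhound detail before anyone asked about it. Because the log is stored whole, the decision about what matters is made only once the question exists. A real system would presumably replace the term-overlap scoring with a stronger retrieval pipeline.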

This connects directly to the MemCoE paper covered May 1st, which approached the same problem from the opposite direction: rather than deferring filtering, it proposed learning what to store through contrastive and RL-based optimization. Together, the two papers expose a genuine architectural fork in agent memory design: one branch bets on learned compression, the other on deferred retrieval. The H-RAG work from the same week is also relevant, since hierarchical parent-child retrieval similarly preserves fuller document context so that information lost to chunking does not go missing at query time. The pattern across recent coverage is consistent: the field is converging on the view that premature summarization is a liability, not a feature.
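The parent-child idea mentioned above can be sketched generically: match the query against small child chunks, then return the full parent passage the best chunk came from. This is an illustration of the general technique, not H-RAG's published method. The `Chunk` type, the fixed word-window chunking, and the overlap scoring are assumptions chosen for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    parent_id: str  # key of the full document this chunk was cut from
    text: str

def normalize(text):
    """Lowercase, strip basic punctuation, and drop very short words."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in words if len(w) > 3}

def build_index(documents, chunk_size=8):
    """Split each parent document into fixed-size word-window child chunks."""
    chunks = []
    for parent_id, body in documents.items():
        words = body.split()
        for start in range(0, len(words), chunk_size):
            window = " ".join(words[start:start + chunk_size])
            chunks.append(Chunk(parent_id, window))
    return chunks

def retrieve_parent(documents, chunks, query):
    """Score child chunks against the query, but return the whole parent text."""
    terms = normalize(query)

    def overlap(chunk):
        return len(terms & normalize(chunk.text))

    best = max(chunks, key=overlap)
    if overlap(best) == 0:
        return None  # nothing matched
    return documents[best.parent_id]

docs = {
    "trip": "We booked the cabin for August. The deposit was paid by Marta. "
            "Bring the kayak and the blue cooler.",
    "work": "The release slipped a week. QA found a regression in the export path.",
}
chunks = build_index(docs)
result = retrieve_parent(docs, chunks, "Who paid the deposit?")
print(result)
```

With an eight-word window, the chunk boundary falls between "The deposit" and "was paid by Marta", so neither child chunk contains the full answer on its own. Returning the parent restores the sentence that chunking split apart.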

Watch whether Mem0 or Zep responds with a retrieval-first variant within the next two quarters. If neither adjusts its ingestion architecture after a public benchmark loss of this magnitude, that likely signals that their commercial deployments depend on latency or cost constraints that raw-log approaches cannot yet meet.

Coverage we drew on

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: True Memory · Mem0 · Supermemory · Zep · EverMemOS · LoCoMo

Read the full story at arXiv cs.CL (arxiv.org)
