Staying In Character: Perspective-Bounded Memory For Book-Based Role-Playing Agents

Researchers propose REVERIEMEM, a three-layer memory system that confines LLM role-playing agents to a character's perspective. It targets two persistent problems in narrative-driven AI: agents leaking knowledge from outside their character's viewpoint, and distinct voices flattening into generic profiles. The work also introduces KBF-QA, a 4,386-question benchmark spanning eight novels that measures knowledge boundaries. Both address a real friction point in character-driven applications, where memory architecture directly shapes believability and immersion. That makes the work relevant to anyone building interactive fiction, game NPCs, or dialogue systems that demand a consistent persona.
Modelwire context
Explainer
The three-layer memory system isn't just a constraint mechanism; it actively partitions what an agent can retrieve based on narrative position, preventing the common failure where a character suddenly knows plot points they haven't encountered. This is distinct from simple prompt-based persona injection because it operates at the retrieval level, not just the generation level.
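To make the retrieval-level idea concrete, here is a minimal Python sketch of perspective-bounded retrieval. It is not REVERIEMEM's implementation, and the summary does not describe the three layers in detail. Every name here (MemoryEntry, retrieve, the chapter and witnesses fields, the toy characters) is a hypothetical chosen for illustration, and the keyword-overlap scoring is a stand-in for whatever ranking a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    """One remembered event (hypothetical schema, not taken from the paper)."""
    chapter: int            # narrative position at which the event occurs
    witnesses: frozenset    # characters who saw or were told about the event
    text: str


def retrieve(memories, character, current_chapter, query_terms, top_k=3):
    """Return up to top_k entries the character could plausibly know.

    The perspective filter runs BEFORE ranking, so entries from later
    chapters or from scenes the character did not witness can never
    reach the generation step.
    """
    visible = [
        m for m in memories
        if m.chapter <= current_chapter and character in m.witnesses
    ]

    def score(m):
        # Toy relevance: count of query terms appearing in the entry text.
        return len(set(m.text.lower().split()) & query_terms)

    ranked = sorted(visible, key=score, reverse=True)
    return [m for m in ranked if score(m) > 0][:top_k]


# Toy data, invented for illustration only.
memories = [
    MemoryEntry(1, frozenset({"alice", "bob"}), "the letter arrives at the manor"),
    MemoryEntry(2, frozenset({"bob"}), "bob burns the letter in secret"),
    MemoryEntry(5, frozenset({"alice", "bob"}), "the letter forgery is exposed"),
]

alice_view = retrieve(memories, "alice", 3, {"letter"})
bob_view = retrieve(memories, "bob", 3, {"letter"})
```

In this toy setup, Alice at chapter 3 retrieves only the chapter 1 entry, while Bob also retrieves the chapter 2 entry he alone witnessed. Neither can retrieve the chapter 5 reveal. Prompt-only persona injection offers no such guarantee, because the excluded text would still sit in the model's context.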
This work sits alongside the multi-agent safety framework from MedGuards (released the same day), which also uses compositional, specialized components to enforce constraints rather than relying on a single monolithic model. Both papers signal a shift toward architectural solutions for reliability. However, REVERIEMEM is largely disconnected from the broader agent reliability work in concurrent coverage. The uncertainty quantification benchmark (Argus, also released the same day) addresses confidence calibration in vision-language agents, while this paper tackles knowledge leakage in text-based narrative agents. The problems are orthogonal: one is about knowing when to refuse; the other is about knowing what to remember.
If the KBF-QA benchmark shows that standard fine-tuned LLMs score below 60% on knowledge boundary questions while REVERIEMEM-augmented agents exceed 85%, that confirms the memory architecture is doing real work. If performance collapses on out-of-distribution novels not in the training set, the approach may not generalize beyond the benchmark's eight books.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: REVERIEMEM · KBF-QA
Modelwire Editorial
Modelwire summarizes; we don’t republish. The full content lives on arxiv.org.