Research Models & Releases·arXiv cs.CL·Apr 30

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

Researchers have identified a critical failure mode in multi-turn LLM interactions: models systematically drift from stated constraints during iterative refinement, even while accurately restating those same constraints. DriftBench, a new benchmark spanning 2,146 runs across seven models, quantifies this knows-but-violates gap and reveals that iterative pressure reliably increases task complexity at the expense of fidelity to original objectives. This finding matters for anyone deploying LLMs in collaborative research or design workflows, where constraint preservation is essential. The dissociation between declarative recall and behavioral adherence suggests fundamental limits in how current models maintain goal alignment under multi-turn pressure.

Modelwire context

Explainer

The more unsettling finding isn't drift itself, which practitioners have observed informally, but the dissociation: models can accurately restate a constraint in the same turn they violate it, which rules out simple forgetting as the mechanism and points toward something more structural in how behavioral objectives are maintained under iterative pressure.

This connects directly to the persona validity work covered the same day ('Stable Behavior, Limited Variation'), which found that LLMs reliably reproduce behavior within a persona but fail to differentiate across them. Both papers are documenting the same underlying pattern from different angles: models exhibit surface-level compliance with framing while their actual outputs drift toward a kind of attractor state. Together they suggest that prompt-level instructions, whether constraints or personas, have weaker behavioral grip than practitioners typically assume. That has real consequences for any workflow where the prompt is doing governance work.

Watch whether DriftBench gets adopted as an evaluation layer by any of the major API providers within the next two quarters. If it does, that signals the field is treating constraint adherence as a measurable product property rather than a research curiosity. If it doesn't, the benchmark risks staying a citation rather than becoming a standard.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting. How we write it.

MentionsDriftBench · LLM · arXiv

Read full story at arXiv cs.CL →(arxiv.org)

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Our mission How we write

Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.