Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action

Researchers are expanding Theory of Mind evaluation beyond passive Q&A to test whether LLMs can manipulate belief states through planned actions rather than dialogue. The NCP-ExploreToM framework reveals a critical gap in current benchmarking: as models become increasingly autonomous agents, their capacity to induce specific worldviews through environmental intervention poses both deployment opportunities and genuine manipulation risks. This work matters because it surfaces a capability frontier that existing evals miss, forcing the field to reckon with agentic persuasion as a measurable, trainable property separate from conversational ability.
Modelwire context
ExplainerThe paper's sharpest contribution is not the benchmark itself but the conceptual split it forces: a model can score well on conversational ToM tasks while being entirely untested on whether it can engineer belief changes by manipulating the environment around a person, without ever speaking to them. That distinction has been invisible in prior evals.
This connects directly to the MECoBench work covered the same day, which exposed how multiagent embodied systems create coordination and communication dynamics that single-model benchmarks miss entirely. NCP-ExploreToM is probing a parallel blind spot: once models act in environments rather than just respond in dialogues, the safety-relevant capabilities shift in ways current evals cannot see. The surrogate fidelity study from the same period adds a compounding concern, since if open models diverge internally from closed ones even when predictions match, then auditing agentic persuasion capacity in proprietary systems through proxy models becomes even less reliable than it already looks.
Watch whether any major safety team (Anthropic, DeepMind, or OpenAI) incorporates a non-conversational planning ToM task into their next published eval suite. Adoption there within six months would signal the field accepts agentic persuasion as a first-class safety property rather than an academic edge case.
Coverage we drew on
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting. How we write it.
MentionsLarge Language Models · Theory of Mind · NCP-ExploreToM · Non-Conversational Planning ToM
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.