Research Models & Releases·arXiv cs.CL·Apr 20

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

Researchers propose PDDL-Mind, a neuro-symbolic framework that grounds LLM theory-of-mind reasoning in explicit state representations written in the Planning Domain Definition Language (PDDL). The approach decouples world-state tracking from belief inference, addressing failures on benchmarks like MMToM-QA by replacing implicit reasoning with logically consistent symbolic states.

Modelwire context

Explainer

The core insight isn't that LLMs are bad at social reasoning. It's that they conflate two distinct problems: tracking what is objectively true in the world, and modeling what a specific agent believes to be true. PDDL-Mind treats these as separate computational steps, which is what makes the symbolic grounding useful rather than decorative.
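To make that separation concrete, here is a minimal Python sketch of a classic false-belief scenario. It is not PDDL and not the authors' implementation. The `StateTracker` class, its methods, and the rule that an agent's belief updates only when that agent is present are all illustrative assumptions.

```python
class StateTracker:
    """Hypothetical tracker: ground truth and per-agent beliefs are stored separately."""

    def __init__(self, agents):
        self.present = {a: True for a in agents}   # which agents can observe events
        self.world = {}                            # ground truth: object -> container
        self.beliefs = {a: {} for a in agents}     # agent -> {object -> container}

    def move(self, obj, container):
        # The world state always changes.
        self.world[obj] = container
        # Assumption: only agents who are present see the move and update beliefs.
        for agent, here in self.present.items():
            if here:
                self.beliefs[agent][obj] = container

    def leave(self, agent):
        self.present[agent] = False

    def enter(self, agent):
        # Assumption: re-entering does not reveal what happened while away.
        self.present[agent] = True


tracker = StateTracker(["Sally", "Anne"])
tracker.move("marble", "basket")   # both agents watch
tracker.leave("Sally")
tracker.move("marble", "box")      # only Anne watches

actual = tracker.world["marble"]
sally_thinks = tracker.beliefs["Sally"]["marble"]
anne_thinks = tracker.beliefs["Anne"]["marble"]
print(actual, sally_thinks, anne_thinks)
```

Asking "where is the marble?" reads `world`, while asking "where will Sally look?" reads `beliefs["Sally"]`. Keeping the two lookups apart is the point of the sketch: a belief question can never silently return the ground-truth answer.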

This connects directly to the recursive instability finding in 'Generalization in LLM Problem Solving: The Case of the Shortest Path' (arXiv, mid-April), where models failed at longer planning horizons precisely because implicit state tracking degraded with depth. PDDL-Mind is essentially the same diagnosis applied to social cognition: when reasoning chains get long, unstructured token prediction accumulates errors that explicit state representations would prevent. The broader thread running through recent coverage is that pure neural approaches keep hitting ceilings on tasks requiring logical consistency across steps, and symbolic scaffolding keeps re-emerging as the practical patch.

The real test is whether PDDL-Mind's gains on MMToM-QA hold on MuMA scenarios involving more than two agents with conflicting beliefs, since multi-agent state explosion is where PDDL representations historically become unwieldy. If the authors release ablations on that condition, it will clarify whether the framework scales or just tidies up the easy cases.
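A rough sense of the scaling concern can be had by counting how many distinct nested belief chains ("A believes that B believes that ...") a tracker might need to maintain. The sketch below is a back-of-the-envelope illustration under one stated assumption: no agent appears twice in a row in a chain. The function name is hypothetical, and the count says nothing about how PDDL-Mind actually represents beliefs.

```python
def nested_belief_chains(num_agents: int, depth: int) -> int:
    """Count belief chains of exactly `depth` agents where consecutive agents differ."""
    if num_agents < 1 or depth < 1:
        raise ValueError("num_agents and depth must be at least 1")
    # First position: any agent. Each later position: any agent except the previous one.
    return num_agents * (num_agents - 1) ** (depth - 1)


two_agents = [nested_belief_chains(2, d) for d in (1, 2, 3)]
four_agents = [nested_belief_chains(4, d) for d in (1, 2, 3)]
print(two_agents, four_agents)
```

Under that assumption, two agents never need more than two chains at any depth, while four agents need 4, 12, and 36 chains at depths one through three. Each chain carries its own copy of the state, which is one reason scenarios with more than two agents are a harder test.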

Coverage we drew on

Generalization in LLM Problem Solving: The Case of the Shortest Path · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: PDDL-Mind · MMToM-QA · MuMA

