Modelwire

Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion

Researchers benchmarked cloud and open-source LLMs on system dynamics tasks. On causal loop diagram (CLD) extraction, cloud models reached 77–89% accuracy, while the best local model, Kimi K2.5, matched the performance of mid-tier cloud models. In interactive coaching scenarios, local models struggled to fix errors, revealing a gap in long-context reasoning.

Mentions: Kimi K2.5 · CLD Leaderboard · Discussion Leaderboard

Modelwire summarizes; we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.