Modelwire

Even the best AI models lose about half their performance when charts get complicated, new benchmark finds


The RealChart2Code benchmark tested 14 leading AI models on complex, real-world chart visualizations and found that even top proprietary models saw performance drop by roughly 50% compared to simpler chart tasks, revealing a significant capability gap.

Modelwire context

Explainer

The benchmark specifically tests whether models can reconstruct real-world chart visualizations as executable code, not just describe or caption them. That distinction matters: it probes structured reasoning and visual parsing simultaneously, which is a harder and more practically relevant task than most multimodal evals currently in circulation.

The Stanford 2026 AI Index, covered here via MIT Technology Review on April 13, argued that conflicting narratives about AI capabilities persist partly because benchmark design rarely reflects real-world complexity. RealChart2Code is a direct illustration of that problem: models that score well on cleaner, curated tasks fall apart when the inputs look like something pulled from an actual financial report or research paper. That gap between lab performance and deployment reality is exactly what the Index flagged as underappreciated. The QuantCode-Bench paper from April 16 is also relevant here, since generating executable trading strategies from financial data shares the same failure mode: models struggle when domain-specific visual or structural complexity is introduced.

Watch whether any of the 14 tested models release targeted fine-tuning runs on complex chart data within the next two quarters. If top proprietary models close more than 20 percentage points of that gap without a corresponding drop on simpler chart tasks, the benchmark is doing its job as a training signal rather than just a diagnostic.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: RealChart2Code

Modelwire summarizes — we don’t republish. The full article lives on the-decoder.com. If you’re a publisher and want a different summarization policy for your work, see our takedown page.
