Modelwire

GPT-5.5 tops benchmarks but still hallucinates frequently at a 20 percent higher API cost

OpenAI's GPT-5.5 reclaims top benchmark performance but costs 20 percent more per API call and continues to produce hallucinations at elevated rates, raising questions about whether capability gains justify the pricing increase for production users.

Modelwire context

Skeptical read

The 20 percent cost increase compounds a known reliability problem rather than compensating for it. Paying more for a model that still hallucinates at elevated rates isn't a straightforward trade-off for production teams; it's a regression in cost per reliable output, a metric that benchmarks don't capture.
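To make the cost-per-reliable-output idea concrete, here is a minimal sketch of the arithmetic. The function names and the 0.80 reliability figure are hypothetical and chosen only for illustration; the source reports no actual hallucination rates. Only the 20 percent price increase comes from the summary above.

```python
def cost_per_reliable_output(price_per_call: float, reliable_fraction: float) -> float:
    """Expected spend per output that contains no hallucination."""
    if not 0.0 < reliable_fraction <= 1.0:
        raise ValueError("reliable_fraction must be in (0, 1]")
    return price_per_call / reliable_fraction


def break_even_reliability(old_reliable_fraction: float, price_ratio: float) -> float:
    """Reliable share the pricier model needs to match the old cost per reliable output."""
    return old_reliable_fraction * price_ratio


# Hypothetical inputs (assumptions, not reported figures).
old_price = 1.00               # previous model's price per call, normalized
new_price = old_price * 1.20   # the 20 percent increase cited in the summary
old_reliable = 0.80            # assumed share of outputs without hallucinations

old_cost = cost_per_reliable_output(old_price, old_reliable)
new_cost_same_reliability = cost_per_reliable_output(new_price, old_reliable)
needed = break_even_reliability(old_reliable, new_price / old_price)

print(f"old cost per reliable output: {old_cost:.2f}")
print(f"new cost at unchanged reliability: {new_cost_same_reliability:.2f}")
print(f"reliable share needed to break even: {needed:.2f}")
```

Under these assumed numbers, the reliable share would have to rise from 0.80 to about 0.96 just to hold cost per reliable output flat. If reliability does not improve, the metric worsens by the full 20 percent, whatever the benchmark scores say.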

This release is directly entangled with the Codex consolidation story we covered from The Decoder on April 26, where OpenAI folded its dedicated coding model into GPT-5.5 and claimed improved agentic coding performance with reduced token consumption. That framing made GPT-5.5 sound like an efficiency win. The hallucination data complicates that narrative considerably: if the model is less reliable on factual outputs, the token-efficiency gains for coding agents may not translate to the broader production use cases OpenAI is pitching. The two stories together suggest a model that is being positioned as a consolidation win while carrying reliability debt that neither announcement leads with.

Watch whether enterprise customers on existing GPT-5 contracts report renegotiating or delaying upgrades over the next 60 days. If adoption among high-volume API users stalls despite the benchmark gains, that's a signal the hallucination rate is the real ceiling here, not pricing.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: OpenAI · GPT-5.5

Modelwire summarizes — we don’t republish. The full article lives on the-decoder.com. If you’re a publisher and want a different summarization policy for your work, see our takedown page.
