Our evaluation of OpenAI's GPT-5.5 cyber capabilities

The UK's AI Security Institute has completed a formal evaluation of GPT-5.5's ability to identify security vulnerabilities, finding it matches Claude Mythos in capability but with a critical advantage: immediate public availability. This benchmark matters because it signals that frontier models are now reaching parity on high-stakes cybersecurity tasks, raising both the bar for responsible deployment and the urgency around access controls for dual-use AI capabilities. The comparison to Mythos positions GPT-5.5 as the more accessible threat vector for security teams to monitor.
Modelwire context
Analyst take: The evaluation's most consequential finding isn't parity with Claude Mythos on capability scores; it's that GPT-5.5's public availability means the dual-use risk calculus is already live, while Mythos remains gated. Capability equality with asymmetric access is a materially different threat posture than a simple benchmark tie.
This lands directly against the Anthropic valuation story from April 30, where investor appetite north of $900 billion is explicitly tied to confidence in Claude's competitive positioning against OpenAI. A formal government evaluation finding GPT-5.5 at parity with Mythos on cybersecurity tasks complicates that narrative: if the capability gap has closed on one of the highest-stakes dimensions, Anthropic's differentiation increasingly rests on access controls and safety process rather than raw model performance. That's a defensible moat, but a narrower one than investors may be pricing in.
Watch whether Anthropic responds by accelerating Mythos's public release timeline or by leaning harder into restricted access as a selling point for enterprise and government contracts. If Mythos remains gated six months after this evaluation publishes, that tells you Anthropic has made a deliberate strategic choice to cede the accessibility argument entirely.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: OpenAI · GPT-5.5 · Claude Mythos · UK AI Security Institute · Simon Willison
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on simonwillison.net.