Microsoft's SkillOpt boosts GPT-5.5 by using nothing but a trained Markdown file

Microsoft and Chinese academic partners have unveiled SkillOpt, a training method that optimizes instruction documents, kept as standalone Markdown files, to measurably improve model performance. The technique lifted GPT-5.5 scores by 23 points on procedural reasoning tasks and proved portable across different models and agent frameworks, including Codex and Claude Code. This signals a shift toward treating prompts as trainable artifacts rather than products of manual craft, potentially lowering the barrier for practitioners to systematically optimize agent behavior without retraining model weights.
Modelwire context
Skeptical read: The 23-point gain is reported on 'procedural reasoning tasks,' but the summary never names the specific benchmark suite, which makes the claim impossible to independently verify at this stage. The involvement of unnamed Chinese academic partners also raises questions about reproducibility and whether the technique has been peer-reviewed or is pre-publication.
This sits in a broader pattern of labs and researchers racing to extract performance from prompt-level artifacts rather than weight updates, a trend that has picked up pace across the industry in early 2026. The connection to recent Modelwire coverage is limited: the Gemini-SQL2 story from The Decoder on June 13 is about benchmark performance in a narrow domain (text-to-SQL), not prompt optimization methodology, so the overlap is thin. SkillOpt belongs more squarely in the emerging conversation around agent scaffolding and instruction tuning as a discipline, a space Modelwire has not yet covered in depth.
Watch whether the SkillOpt paper surfaces on arXiv with a named benchmark and ablation tables within the next four to six weeks. If it does not, the 23-point claim should be treated as preliminary marketing rather than reproducible science.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Microsoft · GPT-5.5 · SkillOpt · Codex · Claude Code · The Decoder
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on the-decoder.com.