From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

Researchers compared two approaches to encoding reusable experience in AI systems across 4,590 code-solving trials. The compact "Gene" representation outperformed documentation-heavy "Skill" packages, was more robust to structural changes, and worked well as a substrate for test-time evolution.

Modelwire context

Explainer

The paper's core provocation is that richer documentation actually hurts reusability: smaller, more abstract experience representations survive structural code changes better than verbose skill packages, which is the opposite of how most agent memory systems are currently designed.

This sits in direct tension with the direction Google and OpenAI are both taking in production. Google's Skills feature in Chrome, covered here two days ago, bets on prompt templates as the reusable unit of AI behavior. OpenAI's upgraded Codex, also from this week, leans into persistent memory and expanded context as the substrate for agentic continuity. Both approaches resemble what this paper calls the "Skill" paradigm, the one that underperformed in its trials. The MIT Technology Review piece on enterprise AI as an operating layer is the better conceptual neighbor here: it argues that the real competition is over the infrastructure where AI is refined over time, and Gene-style representations are essentially a proposal for what that refinement substrate should look like at the model level.

Watch whether any of the major coding agent teams (OpenAI Codex, Anthropic Claude Code) publish ablations comparing compact versus documentation-heavy memory formats in the next two quarters. If they do and compact representations win there too, this paper's framing will have moved from academic to directly actionable for product decisions.

Coverage we drew on

Treating enterprise AI as an operating layer · MIT Technology Review — AI

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Read the full story at arXiv cs.CL (arxiv.org)


