Research · Models & Releases · arXiv cs.CL · 6d ago

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

Researchers report that LLMs fine-tuned on 371 distinct optimization tasks can transfer optimization knowledge between them. The work moves beyond single-problem search scaffolds and instead embeds iterative refinement capabilities directly in model weights. It challenges the assumption that evolutionary search requires task-specific engineering, suggesting that models can learn generalizable mutation and backtracking strategies. The finding matters for scaling LLM-guided optimization to new domains such as mathematical conjectures and hardware design, because it would remove the need to rebuild search infrastructure each time.

Modelwire context

Explainer

The key move here is architectural: rather than wrapping an LLM in a task-specific search scaffold, the researchers bake mutation and backtracking strategies into the weights themselves through fine-tuning on 371 tasks. That means the generalization claim lives at the level of learned behavior, not prompt engineering, which is a meaningfully different kind of transferability to evaluate.
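For readers less familiar with the scaffolds being replaced, here is a minimal sketch of a generic evolutionary outer loop with mutation and backtracking: a program proposes a mutated candidate, scores it, keeps an archive of elites, and backtracks to an earlier elite when progress stalls. Everything here is an assumption for illustration, not the paper's method. `propose_mutation` is a hypothetical stand-in for an LLM "improve this solution" call (here just a Gaussian perturbation), and `sphere`, `patience`, and `elite_size` are arbitrary choices. In the fine-tuned setting the article describes, the decisions this outer loop hard-codes (what to mutate, when to back off) would instead be learned behavior of the model.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]


def propose_mutation(parent: Candidate, rng: random.Random, scale: float = 0.5) -> Candidate:
    """Hypothetical stand-in for an LLM mutation call: perturb each coordinate."""
    return [x + rng.gauss(0.0, scale) for x in parent]


def evolve(
    objective: Callable[[Candidate], float],
    init: Candidate,
    steps: int = 200,
    patience: int = 20,
    elite_size: int = 5,
    seed: int = 0,
) -> Tuple[Candidate, float]:
    """Minimize `objective` by mutate-score-select, backtracking on stagnation."""
    rng = random.Random(seed)
    current, current_score = init, objective(init)
    # Archive of the best candidates found so far, kept sorted by score.
    elites: List[Tuple[float, Candidate]] = [(current_score, current)]
    stale = 0  # consecutive steps without improvement
    for _ in range(steps):
        child = propose_mutation(current, rng)
        child_score = objective(child)
        if child_score < current_score:
            current, current_score = child, child_score
            stale = 0
            elites.append((child_score, child))
            elites.sort(key=lambda e: e[0])
            del elites[elite_size:]
        else:
            stale += 1
        if stale >= patience:
            # Backtrack: resume from a randomly chosen elite instead of the stuck candidate.
            current_score, current = rng.choice(elites)
            stale = 0
    best_score, best = elites[0]
    return best, best_score


def sphere(x: Candidate) -> float:
    """Hypothetical toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, score = evolve(sphere, init=[3.0, -2.0, 1.5])
    print(f"best score {score:.4f} at {best}")
```

Backtracking to a random elite rather than always the single best one is just one simple way to avoid re-entering the same stalled region; real search scaffolds differ widely in how they make this choice.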

This connects directly to the screening rule paper covered yesterday ('Knowing in Advance When an Evolutionary Outer Loop Will Not Help'), which asks a complementary question: when should you bother running evolutionary search at all? That paper offers a pre-registered filter for skipping expensive population-based loops; this paper argues the loops themselves can become cheaper by being internalized. Read together, they sketch a more disciplined picture of where evolutionary search earns its compute budget. The evidence-informed LLM beliefs work ('Evidence-Informed LLM Beliefs for Continual Scientific Discovery') is also relevant, since both papers are probing whether LLMs can sustain iterative reasoning across steps rather than collapsing to single-shot outputs.

The real test is whether the fine-tuned models maintain performance on genuinely out-of-distribution domains, specifically hardware design tasks like GPU kernel optimization, that share no surface similarity with the 371 training tasks. If benchmark gains degrade sharply there, the transferability claim is narrower than advertised.

Coverage we drew on

Knowing in Advance When an Evolutionary Outer Loop Will Not Help: A Pre-Registered Cheap-Baseline Screening Rule · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Large Language Models · GPU kernel design · evolutionary search · mathematical conjectures

Read the full story at arXiv cs.CL (arxiv.org).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.