What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search

A large-scale study of 15 LLMs across 8 optimization tasks reveals that raw problem-solving ability predicts only part of final performance; the strongest optimizers distinguish themselves through incremental refinement and semantic localization rather than capability alone.
Modelwire context
Explainer
The study's most actionable finding is buried in the framing: 'incremental refinement' and 'semantic localization' are measurable behavioral signatures, not vague intuitions, which means they could eventually serve as selection criteria when choosing which model to deploy for search-based tasks rather than defaulting to the highest benchmark scorer.
This connects directly to the shortest-path generalization paper from April 16, which found that LLMs fail not on initial problem setup but when tasks require sustained, recursive progress over longer horizons. That paper diagnosed a ceiling on systematic problem-solving; this new work essentially explains part of why that ceiling exists: models that can't refine incrementally collapse under extended search. The logical reasoning faithfulness study from April 21 adds a complementary angle — models prefer to report failure rather than produce a wrong answer, which suggests the 'good optimizer' behavior identified here may be rarer than benchmark scores imply.
Watch whether follow-up work tests these trajectory signatures against reinforcement-trained models specifically, since the EVPO paper from April 21 suggests critic quality during post-training directly shapes the kind of iterative signal a model learns to trust. If trajectory quality correlates with explained-variance metrics from that training regime, the two research threads converge into something practically useful for model selection.
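If 'incremental refinement' is a measurable trajectory signature, it could in principle be scored directly from a search run's history of objective values. A minimal sketch of one such score, the fraction of iterations that improve the best result seen so far; the function name, the scoring rule, and the example trajectories are illustrative assumptions, not metrics from the paper:

```python
def refinement_rate(trajectory):
    """Fraction of steps that improve the best objective value seen so far.

    trajectory: list of objective values, one per search iteration
    (higher is better). A model that refines incrementally keeps
    producing new bests deep into the run; one that cannot sustain
    iterative progress plateaus early.
    """
    if len(trajectory) < 2:
        return 0.0
    best = trajectory[0]
    improvements = 0
    for score in trajectory[1:]:
        if score > best:
            improvements += 1
            best = score
    return improvements / (len(trajectory) - 1)

# Hypothetical runs: steady refinement vs. an early plateau.
steady = [0.10, 0.20, 0.25, 0.30, 0.32, 0.35]
plateau = [0.10, 0.30, 0.30, 0.30, 0.30, 0.30]
print(refinement_rate(steady))   # every step improves the best-so-far
print(refinement_rate(plateau))  # only one improving step out of five
```

A selection criterion built this way would rank models by trajectory shape rather than by final score alone, which is the shift the study's framing points toward.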
Coverage we drew on
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.