MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings

MTRouter addresses a critical pain point in multi-turn LLM deployment: the runaway inference costs of sequential model calls across long tasks. By jointly embedding conversation history and model capabilities, the system learns to route each turn to the most cost-efficient model without sacrificing quality. Real-world results show 40-60% cost reductions while matching or exceeding single-model baselines on complex reasoning benchmarks. This work signals a shift toward pragmatic model selection as a first-class optimization problem, directly relevant to anyone operating heterogeneous model pools under budget constraints.
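As a rough illustration of what cost-aware per-turn routing can look like, the sketch below scores each model by predicted quality minus a weighted cost and picks the best one for the current turn. This is not MTRouter's actual method: the `Model` class, the two-dimensional embeddings, the dot-product-plus-sigmoid quality estimate, and the `cost_weight` parameter are all hypothetical stand-ins chosen to keep the example small.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    capability: list[float]  # hypothetical capability embedding
    cost: float              # hypothetical relative cost per turn


def predicted_quality(history_vec: list[float], model: Model) -> float:
    # Assumption: quality is a sigmoid of the history-capability dot product.
    dot = sum(h * c for h, c in zip(history_vec, model.capability))
    return 1.0 / (1.0 + math.exp(-dot))


def route(history_vec: list[float], models: list[Model], cost_weight: float) -> Model:
    # Pick the model with the best quality-minus-weighted-cost score.
    return max(
        models,
        key=lambda m: predicted_quality(history_vec, m) - cost_weight * m.cost,
    )


# Illustrative pool: a cheap model and a stronger, more expensive one.
POOL = [
    Model("small", [0.5, 0.0], 0.1),
    Model("large", [0.5, 2.0], 1.0),
]

easy_history = [1.0, 0.0]  # toy embedding of a conversation with little difficulty
hard_history = [1.0, 3.0]  # toy embedding of a conversation that has become hard

print(route(easy_history, POOL, cost_weight=0.3).name)
print(route(hard_history, POOL, cost_weight=0.3).name)
```

In this toy setup the same pool yields different choices as the history vector changes, which is the behavior a stateful router is meant to capture. Raising `cost_weight` pushes the decision toward the cheaper model even on the harder history.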
Modelwire context
Analyst take
The joint embedding of conversation history alongside model capability profiles is the architectural bet worth scrutinizing. Most routing systems treat each turn as stateless; MTRouter's claim is that accumulated context changes which model is optimal, a meaningfully different problem formulation from single-turn cascading.
This lands directly alongside RouteNLP, covered the same day, which targets the same enterprise inference cost problem through conformal cascading and distillation feedback loops. The two papers are essentially competing architectural philosophies for the same budget-constrained operator: RouteNLP bets on closed-loop distillation to improve cheaper models over time, while MTRouter bets on smarter per-turn routing without retraining the pool. That both appeared simultaneously suggests the routing layer is consolidating as a distinct product category, not just a research curiosity. AgentEval, also from this cycle, adds a third angle: if multi-step agentic workflows need DAG-level failure tracking, routing decisions mid-conversation become even higher stakes than single-query cost optimization.
If MTRouter's cost gains replicate on agentic benchmarks like ScienceWorld under adversarial long-horizon conditions (not just the controlled splits reported here), the stateful routing thesis holds. If RouteNLP's distillation loop closes faster in production pilots, the stateless cascade approach wins on operational simplicity.
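For contrast with per-turn routing, a generic stateless cascade can be sketched as follows: try the cheapest model first and escalate only when its confidence falls below a threshold. This is not RouteNLP's implementation. The fixed `threshold` stands in for whatever conformal calibration that system performs, and the tier names, costs, and confidence values are invented for illustration.

```python
from typing import Callable

# Each hypothetical model returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], tuple[str, float]]
Tier = tuple[str, float, ModelFn]  # (name, cost, model function)


def cascade(query: str, tiers: list[Tier], threshold: float) -> tuple[str, str, float]:
    """Run tiers cheapest-first; stop at the first confident answer.

    Returns (name of the model that answered, its answer, total cost spent).
    If no tier is confident, the last tier's answer is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    for name, cost, fn in tiers:
        answer, confidence = fn(query)
        spent += cost  # every tier that is tried gets paid for
        if confidence >= threshold:
            break
    return name, answer, spent


# Toy models: the small one is unsure about queries containing "prove".
def small(query: str) -> tuple[str, float]:
    return "draft", (0.4 if "prove" in query else 0.9)


def large(query: str) -> tuple[str, float]:
    return "careful", 0.95


TIERS: list[Tier] = [("small", 1.0, small), ("large", 4.0, large)]

print(cascade("define entropy", TIERS, threshold=0.8))
print(cascade("prove the lemma", TIERS, threshold=0.8))
```

One design difference the sketch makes visible: a cascade pays for every tier it tries before reaching a confident answer, whereas a router commits to a single model per turn and pays only for that call.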
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: MTRouter · GPT-5 · ScienceWorld · Humanity's Last Exam
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.