The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback

Researchers prove that uncoupled learning algorithms in zero-sum games with bandit feedback face a fundamental tradeoff: guaranteeing last-iterate convergence to a Nash equilibrium means accepting a slower O(T^(-1/4)) rate, versus the standard O(T^(-1/2)) rate for averaged iterates. Two new algorithms match this lower bound, showing it is tight.
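To make the gap between the two rates concrete, here is a back-of-the-envelope conversion from rate to rounds. It is a sketch under loud assumptions: it treats the error as exactly T^(-1/p) with constant 1 and no logarithmic factors, which neither paper's bounds promise, and the helper name rounds_needed is hypothetical.

```python
def rounds_needed(inv_eps: int, p: int) -> int:
    """Rounds T needed before T**(-1/p) drops to eps = 1/inv_eps.

    Illustrative assumption: the error is exactly T**(-1/p), with constant 1
    and no log factors. Then T**(-1/p) <= 1/inv_eps  <=>  T >= inv_eps**p.
    Integer arithmetic keeps the answer exact.
    """
    return inv_eps ** p


for inv_eps in (10, 100):
    averaged = rounds_needed(inv_eps, 2)  # O(T^(-1/2)): averaged iterates
    last = rounds_needed(inv_eps, 4)      # O(T^(-1/4)): last iterate
    print(f"eps = 1/{inv_eps}: averaged ~{averaged:,} rounds, last iterate ~{last:,} rounds")
```

Under these assumptions, each tenfold tightening of the target accuracy costs 100x more rounds on the averaged path but 10,000x more on the last-iterate path.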

Modelwire context

Explainer

The real finding isn't the two new algorithms but the impossibility result beneath them: any uncoupled learner operating with bandit feedback is provably forced to choose between last-iterate convergence to Nash, where the strategies actually being played settle at the equilibrium, and the faster O(T^(-1/2)) rate that averaged iterates enjoy. The O(T^(-1/4)) rate isn't a gap to close later; it's the ceiling on how fast the last iterate can converge.
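The difference between the last iterate and the averaged iterate can be seen in a toy setting. The sketch below is not either paper's algorithm: it swaps in full-information Hedge (exponential weights) self-play on matching pennies, so each player sees its whole payoff vector rather than bandit feedback, and the names duality_gap, hedge_step, and self_play are hypothetical. It measures the duality gap, which is zero exactly at a Nash equilibrium, for the final strategies and for the running averages.

```python
import math

# Matching pennies. The row player x receives x^T A y; the column player y receives -x^T A y.
# The unique Nash equilibrium is x = y = [0.5, 0.5].
A = [[1.0, -1.0],
     [-1.0, 1.0]]


def duality_gap(x, y):
    """Sum of both players' best-response improvements; zero exactly at a Nash equilibrium."""
    row_best = max(sum(A[i][j] * y[j] for j in range(2)) for i in range(2))
    col_best = min(sum(x[i] * A[i][j] for i in range(2)) for j in range(2))
    return row_best - col_best


def hedge_step(p, gains, eta):
    """Exponential-weights (Hedge) update: upweight actions with higher gain."""
    w = [pi * math.exp(eta * g) for pi, g in zip(p, gains)]
    s = sum(w)
    return [wi / s for wi in w]


def self_play(T=10_000, eta=0.1):
    x, y = [0.6, 0.4], [0.5, 0.5]  # start slightly off the equilibrium
    x_sum, y_sum = [0.0, 0.0], [0.0, 0.0]
    for _ in range(T):
        x_sum = [a + b for a, b in zip(x_sum, x)]
        y_sum = [a + b for a, b in zip(y_sum, y)]
        # Full-information feedback: each player sees its entire gain vector.
        gx = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
        gy = [-sum(x[i] * A[i][j] for i in range(2)) for j in range(2)]
        x, y = hedge_step(x, gx, eta), hedge_step(y, gy, eta)  # simultaneous updates
    x_avg = [v / T for v in x_sum]
    y_avg = [v / T for v in y_sum]
    return duality_gap(x, y), duality_gap(x_avg, y_avg)


last_gap, avg_gap = self_play()
print(f"last-iterate gap: {last_gap:.3f}  averaged-iterate gap: {avg_gap:.3f}")
```

The averaged strategies approach the uniform equilibrium, while the last iterate keeps spiraling away from it, a known behavior of plain Hedge in zero-sum games with a fully mixed equilibrium. That failure is why last-iterate guarantees need more careful algorithms, and the paper's result says that with bandit feedback they also cost rate.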

This paper lands one day after 'Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier' (arXiv, April 16), which showed from the algorithm side that a log-barrier regularizer achieves a last-iterate rate matching that same Omega(T^(-1/4)) lower bound. Together the two papers close the question from both ends: one constructs the optimal algorithm, this one proves no algorithm can do better. That pairing is unusually tidy for a single week of preprints. The broader context is the ongoing effort to understand what game-theoretic learning looks like when agents can observe only outcomes, not each other's strategies. That effort also connects loosely to the CoopEval benchmark work from April 16, which examines how agents behave in repeated social dilemmas.
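For readers unfamiliar with the log-barrier regularizer, the sketch below computes one generic follow-the-regularized-leader (FTRL) step with a log-barrier on the probability simplex. It is an illustration under stated assumptions, not the cited paper's algorithm: it does not reproduce that paper's loss estimator, step-size schedule, or analysis, and the function name log_barrier_ftrl and its parameters are hypothetical. Given cumulative (estimated) losses L and a step size eta, the step solves the minimum over the simplex of <L, x> - (1/eta) * sum_i log x_i.

```python
def log_barrier_ftrl(cum_loss, eta, iters=200):
    """One log-barrier FTRL step on the probability simplex (illustrative sketch).

    Solves  argmin_x  <cum_loss, x> - (1/eta) * sum_i log(x_i)  with sum_i x_i = 1.
    Stationarity gives x_i = 1 / (eta * (cum_loss[i] + lam)) for a multiplier lam
    with cum_loss[i] + lam > 0; lam is found by bisection so the x_i sum to 1.
    """
    n = len(cum_loss)
    l_min = min(cum_loss)

    def total(lam):
        # Sum of the candidate weights; strictly decreasing in lam.
        return sum(1.0 / (eta * (loss + lam)) for loss in cum_loss)

    # At lo the smallest-loss term alone equals 1, so total(lo) >= 1.
    # At hi every term is at most 1/n, so total(hi) <= 1.
    lo, hi = -l_min + 1.0 / eta, -l_min + n / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    x = [1.0 / (eta * (loss + hi)) for loss in cum_loss]
    s = sum(x)
    return [xi / s for xi in x]  # renormalize away residual bisection error


weights = log_barrier_ftrl([0.0, 1.0], eta=1.0)
print(weights)  # approximately [0.618, 0.382]
```

The formula shows one commonly cited reason for using the log-barrier under bandit feedback: weights fall off polynomially in the loss, as 1/(eta * (L_i + lam)), rather than exponentially as in Hedge, so no action's probability collapses toward zero as quickly.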

Watch whether the log-barrier paper (reference [1]) and this one are submitted together or cite each other in revision; either would suggest coordinated research rather than coincidence. If they are merged into a unified treatment, the combined result would amount to a complete characterization of last-iterate rates for bandit-feedback learning in zero-sum games.

Coverage we drew on

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Read the full story at arXiv cs.LG (arxiv.org).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.