Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Researchers prove that log-barrier regularization achieves optimal last-iterate convergence in zero-sum matrix games with bandit feedback. The rate matches a recently established lower bound of Omega(t^{-1/4}), and the result extends to extensive-form games.
Mentions: Fiegel et al. · log-barrier regularization
Read full story at arXiv cs.LG (arxiv.org)