Modelwire

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Researchers prove that log-barrier regularization achieves optimal last-iterate convergence in zero-sum matrix games with bandit feedback, matching a recently established lower bound of Omega(t^{-1/4}) and extending the result to extensive-form games.
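To make the ingredients concrete, here is a minimal sketch of follow-the-regularized-leader with a log-barrier regularizer, run in self-play on a small zero-sum matrix game with bandit feedback. It is not the paper's algorithm and carries no last-iterate guarantee. The function names (`log_barrier_ftrl`, `self_play`), the step size `eta`, the importance-weighted loss estimates, and the Bernoulli payoff noise are all assumptions chosen for illustration.

```python
import random

def log_barrier_ftrl(cum_loss, eta, iters=100):
    """Minimize eta * <cum_loss, x> - sum_i log(x_i) over the probability simplex.

    The minimizer has the form x_i = 1 / (eta * cum_loss[i] + lam), where the
    multiplier lam is chosen so the entries sum to 1. We find lam by bisection.
    """
    n = len(cum_loss)
    base = -eta * min(cum_loss)
    lo, hi = base, base + n  # the root lies in (lo, hi]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(1.0 / (eta * l + mid) for l in cum_loss)
        if total > 1.0:
            lo = mid
        else:
            hi = mid
    x = [1.0 / (eta * l + hi) for l in cum_loss]
    s = sum(x)  # renormalize to remove residual bisection error
    return [v / s for v in x]

def self_play(A, rounds=2000, eta=0.05, seed=0):
    """Both players run log-barrier FTRL; A[i][j] in [0, 1] is the row player's expected loss."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    loss_x, loss_y = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        x = log_barrier_ftrl(loss_x, eta)
        y = log_barrier_ftrl(loss_y, eta)
        i = rng.choices(range(n), weights=x)[0]
        j = rng.choices(range(m), weights=y)[0]
        # Bandit feedback: only a noisy payoff for the played pair (i, j) is seen.
        payoff = 1.0 if rng.random() < A[i][j] else 0.0
        # Importance-weighted estimates, nonzero only for the played actions.
        loss_x[i] += payoff / x[i]          # row player minimizes the payoff
        loss_y[j] += (1.0 - payoff) / y[j]  # column player maximizes it
    return log_barrier_ftrl(loss_x, eta), log_barrier_ftrl(loss_y, eta)
```

Calling `self_play([[0.5, 1.0], [1.0, 0.0]])` returns the two players' final mixed strategies. The log-barrier keeps every action's probability strictly positive, which bounds the importance weights `1 / x[i]` more tightly than an entropy regularizer would.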

Mentions: Fiegel et al. · log-barrier regularization

Modelwire summarizes; we don’t republish. The full article lives on arxiv.org.

