Online combinatorial optimization with stochastic decision sets and adversarial losses

Researchers tackle a practical gap in sequential decision-making: most online learning algorithms assume a static action set, but real systems face dynamic constraints like sensor failures, road closures, or inventory depletion. This paper extends regret-minimization theory to handle stochastic action availability through a new loss estimation method called Counting Asleep Times, grounded in Follow-The-Perturbed-Leader prediction. The work bridges theory and deployment by formalizing learning under unreliable composite actions across full-information and bandit feedback regimes, relevant to robotics, logistics, and resource-constrained systems where action feasibility is uncertain.
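As a rough illustration of the prediction step, the sketch below applies Follow-The-Perturbed-Leader to whichever actions happen to be available in a round. It is a simplification under stated assumptions, not the paper's algorithm: actions are single items rather than composite ones, the learner is assumed to see the current available set, the perturbations are exponential, and the names `fpl_choose`, `cum_loss`, and `eta` are hypothetical.

```python
import random

def fpl_choose(cum_loss, available, eta, rng):
    """Return the available action with the smallest perturbed cumulative loss.

    Assumptions (not from the paper): actions are plain indices, the
    perturbations are i.i.d. exponential, and `available` is known this round.
    Returns None if no action is available.
    """
    best, best_score = None, float("inf")
    for i in available:
        # Subtracting a fresh random perturbation randomizes the leader.
        score = eta * cum_loss[i] - rng.expovariate(1.0)
        if score < best_score:
            best, best_score = i, score
    return best

rng = random.Random(0)
cum_loss = [3.0, 1.0, 2.0]
# Action 1 has the lowest loss so far but is unavailable this round,
# so the choice is restricted to actions 0 and 2.
choice = fpl_choose(cum_loss, available=[0, 2], eta=1.0, rng=rng)
```

Restricting the minimization to the available set is the simplest way to respect availability; how losses of unavailable or unobserved actions are estimated is the part the paper's estimator addresses.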

Modelwire context

Explainer

The real contribution here is not just handling missing actions but doing so without assuming the learner knows when or why an action is unavailable, which is the condition that breaks most existing bandit algorithms in deployment. The 'Counting Asleep Times' estimator corrects for bias introduced by those silent absences, a problem that prior work typically sidesteps by assumption.
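The estimator's name hints at why counting can stand in for an availability probability the learner does not know. If an action is available independently in each round with probability p, the number of rounds until it is next available is geometric with mean 1/p, so the count itself can play the role of a 1/p importance weight. The simulation below checks only that textbook fact. The independence assumption and the names `rounds_until_available` and `mean_wait` are illustrative, and the paper's actual estimator may differ in detail.

```python
import random

def rounds_until_available(p, rng):
    """Count rounds (at least 1) until a Bernoulli(p) availability draw succeeds."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def mean_wait(p, trials, seed=0):
    """Average the waiting time over many independent trials."""
    rng = random.Random(seed)
    total = sum(rounds_until_available(p, rng) for _ in range(trials))
    return total / trials

# Assumed availability rate of 0.25; the average wait should be close to 1 / 0.25.
estimate = mean_wait(0.25, trials=20000)
```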

This paper sits in a cluster of bandit and online learning work appearing this week. The 'Online learning with Erdős-Rényi side-observation graphs' paper from the same day addresses a related structural gap, partial observability of unchosen arms, and both papers are essentially asking the same underlying question: what guarantees survive when the learner's information channel is unreliable? The difference is that the Erdős-Rényi work treats observation noise as a graph property, while this paper treats action availability itself as stochastic. Together they suggest the field is systematically revisiting which classical assumptions are load-bearing for regret bounds.

The practical test is whether the Counting Asleep Times estimator holds up under correlated unavailability, where sensor failures or road closures cluster in time rather than arriving independently. If follow-on empirical work shows regret degrading under autocorrelated dropout, the stochastic-availability assumption will need revisiting.
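One way such a test could be set up is to generate availability from a two-state Markov chain, so that outages persist across rounds instead of being drawn independently. The sketch below is a hypothetical harness, not something described in the paper; `p_fail`, `p_recover`, and both function names are assumptions.

```python
import random

def markov_availability(p_fail, p_recover, steps, seed=0):
    """Simulate a two-state chain: True = available, False = unavailable.

    An available action fails with probability p_fail each round; an
    unavailable one recovers with probability p_recover each round.
    """
    rng = random.Random(seed)
    up = True
    trace = []
    for _ in range(steps):
        if up:
            up = rng.random() >= p_fail
        else:
            up = rng.random() < p_recover
        trace.append(up)
    return trace

def longest_outage(trace):
    """Length of the longest run of consecutive unavailable rounds."""
    longest = current = 0
    for up in trace:
        current = 0 if up else current + 1
        longest = max(longest, current)
    return longest

trace = markov_availability(p_fail=0.05, p_recover=0.05, steps=200000)
available_fraction = sum(trace) / len(trace)
```

With `p_fail = p_recover = 0.05` the long-run availability is one half, the same as independent fair coin flips, but each outage lasts 20 rounds on average. Feeding such a trace to a learner in place of independent draws would isolate the effect of temporal clustering.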

Coverage we drew on

Online learning with Erdős-Rényi side-observation graphs · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Follow-The-Perturbed-Leader · Counting Asleep Times

Read full story at arXiv cs.LG (arxiv.org)
