Research · arXiv cs.LG · 4d ago

In-Context Learning for Data-Driven Censored Inventory Control

Researchers propose in-context generative posterior sampling (ICGPS), a method that combines offline meta-training with online decision-making to solve inventory control under demand censoring. The approach leverages modern generative models to impute latent demand signals and make ordering decisions, addressing a core limitation of traditional Thompson sampling when prior assumptions fail. This work bridges offline learning and online deployment patterns increasingly central to practical ML systems, offering a template for how foundation models can be adapted to sequential decision problems where data collection itself depends on past actions.

Modelwire context

Explainer

The core novelty isn't just applying generative models to inventory control, but using them to recover unobserved demand signals during training so that Thompson sampling's posterior estimates become reliable. Classical approaches assume true demand is fully observed; here, you only observe sales, which are capped at the quantity you stocked, so any demand beyond that level goes unrecorded. Because each ordering decision determines what data you get to see, this creates a feedback loop that breaks classical assumptions.
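To make the censoring effect concrete, here is a minimal sketch in Python. The function name `observe_sales` and the demand figures are made up for illustration and do not come from the paper; the only assumption is the standard lost-sales convention that sales equal the smaller of demand and stocked quantity.

```python
import numpy as np

def observe_sales(demand, order_qty):
    """Return (sales, censored) under lost-sales censoring.

    Hypothetical helper for illustration: sales are capped at the
    stocked quantity, and a stockout hides how large demand really was.
    """
    demand = np.asarray(demand, dtype=float)
    sales = np.minimum(demand, order_qty)
    # On a stockout we only learn that demand was at least order_qty.
    censored = demand >= order_qty
    return sales, censored

# Illustrative numbers only.
demand = np.array([3.0, 8.0, 5.0, 12.0])
sales, censored = observe_sales(demand, order_qty=6.0)

naive_mean = sales.mean()   # average of what was actually recorded
true_mean = demand.mean()   # average of the latent demand
```

In this toy case the recorded sales average 5.0 while the latent demand averages 7.0, so an estimator that treats sales as demand is biased downward, and the size of the bias depends on how much was ordered.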

This work sits in the same family as the GPart and XFP papers from the same day: methods that invert how practitioners typically approach a constrained problem. Instead of accepting Thompson sampling's limitations and working around them, ICGPS uses offline meta-training to build a generative model that imputes the missing data, then deploys it online. Like GPart's shift from low-rank approximation to geometric preservation, this reframes the problem upstream rather than patching it downstream. The approach also echoes the adaptive, specification-driven logic in XFP, where the system learns what it needs rather than having engineers prescribe it.
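For reference, the kind of classical baseline that ICGPS is positioned against can be sketched as follows. This is not the paper's method: it is a textbook Thompson sampling loop for the repeated newsvendor, and it assumes exponential demand with a Gamma prior on the rate, a setup chosen here only because the censored update stays conjugate. The function name, the `critical_ratio` default, and the prior parameters `a0` and `b0` are all illustrative assumptions.

```python
import numpy as np

def thompson_newsvendor(true_rate, horizon, critical_ratio=0.8,
                        a0=1.0, b0=1.0, seed=0):
    """Thompson sampling for a newsvendor with censored exponential demand.

    Assumed model (not from the paper): demand ~ Exp(rate), with a
    Gamma(shape=a, rate=b) posterior over the unknown rate.
    """
    rng = np.random.default_rng(seed)
    a, b = a0, b0
    orders = []
    for _ in range(horizon):
        # 1. Sample a demand rate from the current posterior.
        rate = rng.gamma(shape=a, scale=1.0 / b)
        # 2. Order the critical-ratio quantile of Exp(rate).
        q = -np.log(1.0 - critical_ratio) / rate
        # 3. Demand is realized, but only sales are observed.
        demand = rng.exponential(scale=1.0 / true_rate)
        sales = min(demand, q)
        # 4. Conjugate update: an uncensored period contributes a full
        #    observation; a stockout only says demand exceeded q.
        if demand < q:
            a += 1.0
        b += sales
        orders.append(q)
    return np.array(orders), (a, b)

orders, posterior = thompson_newsvendor(true_rate=0.5, horizon=500)
```

The update in step 4 is only correct because the prior family matches the true demand distribution. When that assumption fails, the posterior can be miscalibrated, which is the limitation the summary says ICGPS targets by learning to impute latent demand instead.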

If follow-up work applies ICGPS to real retail or supply-chain datasets and shows it outperforms censored Thompson sampling by more than 10% in cumulative regret, the method moves from theory to practice. If it remains confined to synthetic newsvendor benchmarks through 2026, the gap between offline meta-training and real demand patterns likely remains unsolved.
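The cumulative-regret comparison mentioned above can be made concrete with a small sketch. The cost parameters, function names, and numbers below are illustrative assumptions, and the regret here is the realized (pathwise) gap against a fixed benchmark order quantity; the paper may define regret differently, for example in expectation.

```python
import numpy as np

def newsvendor_cost(order_qty, demand, holding=1.0, shortage=4.0):
    """Per-period cost: pay for leftover stock and for unmet demand."""
    order_qty = np.asarray(order_qty, dtype=float)
    demand = np.asarray(demand, dtype=float)
    overage = np.maximum(order_qty - demand, 0.0)
    underage = np.maximum(demand - order_qty, 0.0)
    return holding * overage + shortage * underage

def cumulative_regret(orders, demands, benchmark_qty, **cost_kwargs):
    """Running sum of (policy cost - benchmark cost) on the same demands."""
    policy_cost = newsvendor_cost(orders, demands, **cost_kwargs)
    benchmark_cost = newsvendor_cost(benchmark_qty, demands, **cost_kwargs)
    return np.cumsum(policy_cost - benchmark_cost)

# Illustrative numbers only.
demands = np.array([4.0, 6.0, 5.0])
orders = np.array([2.0, 9.0, 5.0])
regret = cumulative_regret(orders, demands, benchmark_qty=5.0)
```

With these toy inputs the running regret is 7, 6, 6. A claim such as "more than 10% better in cumulative regret" would compare the final value of this curve for two policies evaluated on the same demand sequence.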

Coverage we drew on

XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Thompson sampling · in-context generative posterior sampling · generative models · repeated newsvendor problem

