How Good Can Linear Models Be for Time-Series Forecasting?

A new study challenges the industry's scaling-first approach to time-series forecasting by demonstrating that carefully tuned preprocessing for simple linear models can close most of the accuracy gap with large transformers and foundation models. Using Ridge regression as a controlled testbed, the researchers found that optimal context windows are series-specific and non-monotonic across forecast horizons, suggesting that practitioners may be overspending on model capacity when data engineering delivers comparable results at lower computational cost. The finding has immediate implications for resource-constrained deployments, and it raises the question of whether the recent rush toward foundation models for forecasting reflects genuine necessity or architectural momentum.
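To make the context-window point concrete, here is a minimal sketch of what tuning the window per forecast horizon might look like. It is not the study's protocol: the helper names (`make_windows`, `fit_ridge`, `best_context`), the synthetic series, the candidate window lengths, and the fixed validation tail are all assumptions chosen for illustration, and the ridge fit uses the standard closed-form solution via NumPy.

```python
import numpy as np

def make_windows(series, context, horizon):
    """Rows of X hold `context` past values; y is the value `horizon` steps after each window ends."""
    n = len(series) - context - horizon + 1
    X = np.stack([series[i:i + context] for i in range(n)])
    y = series[context + horizon - 1 : context + horizon - 1 + n]
    return X, y

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def best_context(series, horizon, candidates, alpha=1.0, val_size=100):
    """Return the candidate context length with the lowest validation MSE for one horizon.

    The last `val_size` targets are held out, so every candidate is scored
    on the same time points (hypothetical evaluation choice).
    """
    scores = {}
    for c in candidates:
        X, y = make_windows(series, c, horizon)
        w, b = fit_ridge(X[:-val_size], y[:-val_size], alpha)
        pred = X[-val_size:] @ w + b
        scores[c] = float(np.mean((pred - y[-val_size:]) ** 2))
    return min(scores, key=scores.get), scores

# Synthetic daily-cycle series (an assumption, not data from the study).
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)

for h in (1, 6, 24):
    best, scores = best_context(series, horizon=h, candidates=[4, 12, 24, 48])
    print(f"horizon={h:>2}  best context={best}")
```

Running the search separately for each horizon and each series is the point: if the best window does not grow or shrink steadily with the horizon, a single fixed context length will be suboptimal somewhere.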

Modelwire context

Analyst take

The buried finding here is not that linear models are good but that optimal context windows are non-monotonic across forecast horizons, meaning that no single preprocessing recipe generalizes. That specificity is what makes the result actionable, and it is also what limits how far the result travels beyond controlled benchmarks.

This connects directly to the co-failure ceiling paper from the same day ('When Does Combining Language Models Help'), which made a structurally similar argument: scaling through model combination hits a hard ceiling defined by shared failure modes, not by individual model capacity. Both papers, arriving together, push toward the same uncomfortable conclusion for vendors selling complexity as a solution. The forecasting paper applies that logic to a different axis, model size rather than ensemble count, but the underlying critique is identical. Neither paper argues simple methods always win; both argue the burden of proof for expensive architectures is higher than current practice demands.

Watch whether any of the major time-series foundation model providers (Nixtla, Amazon Chronos, Google TimesFM) publish rebuttal benchmarks on multivariate or irregular-frequency datasets within the next two quarters. If they don't engage directly, that silence is informative about how seriously they take the preprocessing-first framing.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Ridge regression · Transformers · Foundation models

Read the full story at arXiv cs.LG (arxiv.org)
