Modelwire

A curated feed of what matters in AI. Independent, ad-supported, built in Denver, Colorado.

© 2026 Modelwire. All article links go to the original publishers. Summaries generated by Modelwire. We don’t republish full articles.

arXiv cs.LG

https://arxiv.org/list/cs.LG/recent · Editorial weight 5/10

Research Models & Releases

Robot policies scale to 8K-step context windows without latency cost

Robot foundation models have historically operated within narrow temporal windows, limiting their ability to learn from extended interaction sequences. RoboTTT breaks this constraint by scaling visuomotor context to 8,000 timesteps without inference overhead, unlocking capabilities previously unavailable to embodied AI systems: single-shot learning from human video, adaptive policy refinement mid-deployment, and improved long-horizon task performance. The work demonstrates that scaling context length yields measurable closed-loop gains, mirroring insights from language model scaling. This shift matters because it reframes robot learning as a context-window problem rather than a data-collection problem, potentially accelerating deployment of more autonomous systems in unstructured environments.

arXiv cs.LG·2d ago

72

Research Models & Releases

RL alignment framework extended to fast-sampling flow generators

Researchers have extended DiffusionNFT, an efficient reinforcement learning framework for aligning generative models, to work with MeanFlow generators that prioritize fast few-step sampling. The core innovation bridges a technical gap: DiffusionNFT optimizes instantaneous velocities while MeanFlow operates on average velocities across time intervals. By constructing an induced instantaneous-velocity representation grounded in the MeanFlow identity, MeanFlowNFT enables preference-aligned generation without reverse-process trajectories or likelihood computation. This matters because it expands RL-based alignment techniques to a faster, more practical class of generators, lowering the computational barrier for deploying human-preference-tuned models in production settings.
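
For reference, the MeanFlow identity mentioned above relates the average velocity u over an interval [r, t] to the instantaneous velocity v. The derivation below follows the standard MeanFlow definition of average velocity. The summary does not say how MeanFlowNFT plugs this into the DiffusionNFT objective, so the last line is only the algebraic rearrangement that yields an instantaneous velocity from an average one, not the paper's training loss.

```latex
\begin{align}
u(z_t, r, t) &= \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau \\
(t - r)\, u(z_t, r, t) &= \int_r^t v(z_\tau, \tau)\, d\tau \\
u(z_t, r, t) &= v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t) \\
v(z_t, t) &= u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t)
\end{align}
```

The third line comes from differentiating the second with respect to t while holding r fixed.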

arXiv cs.LG·2d ago

58

Decoupled memory updates enable real-time video view synthesis

Researchers propose a decoupled memory update strategy for real-time novel view synthesis from streaming video, addressing a core bottleneck in dynamic scene reconstruction. By separating memory refresh frequency from inference application, the approach reduces computational overhead while maintaining temporal coherence across occluded regions. This work targets a practical constraint in video AI: balancing persistent context windows against latency budgets. The technique signals growing attention to inference-time efficiency in vision transformers and test-time adaptation, where naive per-frame updates become prohibitive at scale.

arXiv cs.LG·2d ago

52

Research Models & Releases

XGBoost classifies Bitcoin sentiment from on-chain and social signals

Researchers have developed a machine learning classifier that decodes Bitcoin market sentiment by fusing on-chain transaction patterns with social media signals and price history. Rather than chasing price prediction, the work treats sentiment as a distinct classification task, with XGBoost outperforming competing models in cross-validation. This represents a methodological shift in crypto analytics: treating blockchain data as a legitimate feature source for supervised learning, not just a speculative signal. The approach matters because it validates on-chain metrics as trainable inputs for financial ML, opening a new data stream for sentiment modeling across other assets.

arXiv cs.LG·2d ago

52

Retrain-free recommendation embeddings update in real time via sparse trees

Researchers have tackled a persistent inefficiency in recommendation systems: user embeddings that stay stale until the next full model retrain. The proposed mutable sketch approach uses sparse segment trees to update user preferences as new ratings arrive, removing the need for retraining while keeping theoretical guarantees that prediction error bounds tighten. On benchmark data, the method cuts data I/O to 1.8% of that of traditional alternating least squares (ALS) while achieving better RMSE and enabling sub-millisecond personalization after a single user interaction. This addresses a real production pain point where recommendation latency and computational cost have historically forced a tradeoff between freshness and efficiency.

arXiv cs.LG·2d ago

58

Research Tools & Code

Researchers enable tokenizer upgrades for deployed language models

Tokenizer vocabulary is typically locked at pre-training time, creating a structural problem when deployment priorities shift: languages added later fragment into many tokens per word, inflating latency and compute costs for those users. On-device models face particular pressure since embedding and output matrices consume substantial decode bandwidth. This paper introduces an in-place tokenizer expansion technique that lets model producers upgrade vocabulary post-hoc without full retraining, addressing a real efficiency gap between cloud and edge deployments. The work targets a concrete pain point in multilingual and evolving model ecosystems.

arXiv cs.LG·2d ago

58

Bandit algorithms optimize preventive maintenance scheduling from failure data

Researchers have formulated preventive maintenance scheduling as a multi-armed bandit problem, developing algorithms that learn optimal replacement intervals from operational failure data without knowing the underlying lifetime distribution. The work bridges classical reliability engineering and modern machine learning by applying bandit theory to minimize long-run maintenance costs across fleets of identical machines. This approach matters for infrastructure operators managing large-scale systems where downtime is costly and failure patterns are empirically learned rather than theoretically specified, connecting optimization theory to practical deployment challenges in industrial ML.

arXiv cs.LG·2d ago

42

Research Models & Releases

Genetic algorithm evolves asynchronous neural networks without backpropagation

NeuronSoup introduces a neural architecture that abandons layer-wise synchronous computation in favor of asynchronous signal routing through a shared neuron pool, where interference patterns emerge from timing and polarity interactions. The entire system, from topology to delays, is co-evolved by a genetic algorithm rather than trained with gradient descent. This is a departure from backpropagation-dependent learning and challenges conventional assumptions about how neural computation must be organized, potentially opening new directions for neuromorphic and evolutionary approaches to architecture search.

arXiv cs.LG·2d ago

58

Unadjusted samplers can scale without Metropolis correction under weak interactions

Researchers have extended a theoretical result showing that unadjusted sampling algorithms can operate efficiently without explicit bias correction, provided they take enough integration steps relative to problem dimensionality. This work addresses a core computational bottleneck in Bayesian inference and generative modeling: Metropolis-Hastings corrections typically require tiny step sizes that multiply iteration costs. By proving that bias naturally disperses across high-dimensional marginals under weak interactions, the finding suggests practitioners may trade acceptance-rate tuning for step-count scaling, potentially accelerating sampling-based inference in large-scale probabilistic models and variational methods that rely on these algorithms.

arXiv cs.LG·2d ago

52

Researchers expose misalignment attacks on embodied AI world models

Researchers have identified a fundamental vulnerability in world-action models, a class of embodied AI systems designed to couple action generation with future-state prediction. The BadWAM framework demonstrates that small visual perturbations can desynchronize what these models imagine will happen from what they actually execute, undermining a core safety assumption: that robots can validate actions against their own predictions. This attack surface exposes a gap between the theoretical robustness narrative around WAMs and their practical fragility, forcing a recalibration of how embodied AI safety is evaluated.

arXiv cs.LG·2d ago

62

Unified framework derives uncertainty measures from loss functions

Researchers propose a unified theoretical framework for uncertainty quantification that derives epistemic and aleatoric uncertainty measures from subjective risk decomposition rather than treating them as independent primitives. By grounding uncertainty in strictly proper loss functions, the work reconciles disparate UQ methods across the literature under a single mathematical foundation. This shift from axiom-driven to consequence-driven uncertainty has immediate implications for practitioners: model builders can now systematically induce appropriate uncertainty estimates directly from their choice of loss function and modeling objective, potentially streamlining how production systems calibrate confidence estimates and handle out-of-distribution scenarios.

arXiv cs.LG·2d ago

58

Research Models & Releases

Neural networks fill gaps in physics-based differential equation models

Researchers propose a hybrid framework that embeds neural networks into physics-based differential equation models, allowing systems to learn missing dynamics while preserving known physics. The approach alternates between state inference via Rauch-Tung-Striebel smoothing and parameter optimization, addressing a core challenge in scientific machine learning: incomplete observability and unknown system components. This technique bridges symbolic and learned representations, relevant to domains from biology to materials science where partial mechanistic knowledge exists but measurement gaps remain.

arXiv cs.LG·2d ago

58

Delta distillation transfers reasoning gains without reward models

Researchers propose delta distillation, a refinement to on-policy reinforcement learning that sidesteps reward model bottlenecks by extracting reasoning gains directly from teacher models. Rather than copying output distributions, the method captures the delta between a tuned model and its pre-instruction baseline, isolating learned reasoning patterns for transfer. This addresses a real friction point in post-training: reward models often constrain signal quality. The approach matters for teams scaling reasoning-focused LLMs, as it offers a more granular supervision path that could improve efficiency in capability transfer without external reward annotation.

arXiv cs.LG·2d ago

58

Five leading world models fail basic visual consistency tests in Pong

A systematic evaluation of five leading world models reveals fundamental gaps in how these components learn visual dynamics, even when integrated into high-performing reinforcement learning agents. By freezing trained models and stress-testing them with independent policies, researchers uncovered consistent failure modes: vanishing objects, physically implausible motion, and broken interaction semantics. This work matters because world models are treated as black-box components within larger MBRL systems, obscuring whether performance gains come from accurate environment understanding or agent-level compensation. The findings suggest current visual world models lack the spatial reasoning needed for reliable long-horizon planning, a critical bottleneck for scaling model-based RL beyond narrow domains.

arXiv cs.LG·2d ago

62

Research Tools & Code

AlphaWiSE interpolates multimodal checkpoints to balance continual learning tradeoffs

Continual learning in multimodal systems faces a fundamental tension: adapting to new data often erodes the cross-modal alignment learned earlier. AlphaWiSE addresses this by interpolating between frozen checkpoints in weight space, allowing practitioners to dial the stability-plasticity tradeoff per parameter rather than committing globally. The method fits scalar coefficients on a small exemplar buffer, producing a single deployable model without architectural overhead. This matters for production CLIP-like systems that must absorb streaming data without catastrophic forgetting or expensive retraining cycles.
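
As a rough illustration of weight-space interpolation with per-parameter coefficients, the sketch below blends two checkpoints tensor by tensor. It assumes one scalar per named tensor, and the coefficients are set by hand; the summary says AlphaWiSE fits them on a small exemplar buffer, and that fitting step is not shown. The function name, tensor names, and values are all hypothetical.

```python
import numpy as np

def interpolate_checkpoints(old, new, alphas):
    """Blend two checkpoints tensor by tensor.

    old, new: dicts mapping parameter names to arrays of equal shape.
    alphas:   dict mapping parameter names to a scalar in [0, 1];
              0 keeps the old weights (stability), 1 takes the new ones (plasticity).
    """
    merged = {}
    for name, w_old in old.items():
        a = alphas[name]
        merged[name] = (1.0 - a) * w_old + a * new[name]
    return merged

# Hypothetical two-tensor "model", for illustration only.
old = {"vision.proj": np.zeros((2, 2)), "text.proj": np.zeros((2, 2))}
new = {"vision.proj": np.ones((2, 2)), "text.proj": np.ones((2, 2))}
alphas = {"vision.proj": 0.25, "text.proj": 0.75}  # hand-set, not fitted

merged = interpolate_checkpoints(old, new, alphas)
```

The result is a single set of weights with the same shapes as the inputs, which is consistent with the summary's point that no architectural overhead is added at deployment.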

arXiv cs.LG·2d ago

58

Offline RL treatment studies fail covariate balance checks

Researchers have identified a critical methodological gap in offline reinforcement learning applications for clinical treatment optimization. By applying covariate balance diagnostics, the work reveals that existing studies either harbor substantial bias risk or rely on inadequate validation metrics. This finding challenges the statistical credibility of deployed offline RL systems in healthcare and signals that the field lacks robust frameworks for detecting hidden confounding in long-horizon decision processes. The implications extend beyond medicine to any domain where offline RL informs high-stakes sequential decisions.
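
The summary does not say which balance diagnostics the authors apply. One widely used diagnostic is the standardized mean difference (SMD) of a covariate between two groups, sketched below as an assumption about the kind of check involved. The data and variable names are made up.

```python
import numpy as np

def standardized_mean_difference(x_treated, x_control):
    """Absolute standardized mean difference for one covariate."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    # Pool the two sample variances with equal weight.
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2.0)
    return abs(x_treated.mean() - x_control.mean()) / pooled_sd

# Made-up patient ages for two action groups.
age_treated = [60.0, 62.0, 64.0]
age_control = [50.0, 52.0, 54.0]
smd = standardized_mean_difference(age_treated, age_control)
```

An SMD above roughly 0.1 is commonly read as a sign of imbalance, so a value this large would indicate that the two groups differ systematically in age.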

arXiv cs.LG·2d ago

58

Research Tools & Code

Sparse equation recovery method scales to practical engineering problems

SINDy (sparse identification of nonlinear dynamics) is a counterweight to the data-hungry neural network paradigm that dominates surrogate modeling in engineering. It recovers sparse, interpretable equations from small datasets by regressing over a library of nonlinear terms, which addresses a persistent friction point: practitioners often lack the massive labeled datasets required for deep learning, yet need models that expose their underlying physics rather than acting as black boxes. This tutorial bridges the gap between theoretical validation on toy problems and real-world deployment, making symbolic regression techniques more accessible to domain experts who prioritize explainability and sample efficiency over raw predictive power.
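
The regression over a term library described above can be sketched with sequentially thresholded least squares, the sparsifying step usually associated with SINDy. This is a minimal illustration on synthetic data, not the tutorial's code; the threshold, iteration count, and library of terms are arbitrary choices.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_terms) library of candidate terms evaluated on the data.
    dxdt:  (n_samples,) time derivative to explain.
    Returns a sparse coefficient vector of length n_terms.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0           # drop terms with negligible coefficients
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Synthetic data from dx/dt = -2x, using the exact derivative for simplicity.
x = np.linspace(-1.0, 1.0, 50)
dxdt = -2.0 * x
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])  # library: 1, x, x^2, x^3
xi = stlsq(theta, dxdt)
```

Here `xi` keeps only the coefficient on the `x` term, so the recovered model reads directly as an equation. With measured data the derivative would have to be estimated numerically, which is where noise handling becomes the main practical concern.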

arXiv cs.LG·2d ago

52

New off-policy evaluation method handles behavior policy misspecification in bandits

Researchers have developed Kernel-WIS, an off-policy evaluation method that addresses a critical bottleneck in contextual bandit deployment. The technique combines importance sampling's theoretical guarantees with kernel-based variance reduction, enabling practitioners to assess policy performance using only historical data without live experimentation. This matters because behavior policy misspecification, a common real-world failure mode, typically degrades standard estimators, but Kernel-WIS maintains consistency under these conditions. The advance reduces friction in production bandit systems where offline validation before deployment is essential.
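
For context, the importance-sampling half of the method can be illustrated with the standard self-normalized estimator, assuming "WIS" here stands for weighted importance sampling. The kernel-based variance reduction is not described in the summary and is not shown. The logged data below is made up.

```python
import numpy as np

def weighted_importance_sampling(rewards, target_probs, behavior_probs):
    """Self-normalized (weighted) importance sampling estimate of a target policy's value.

    Each array has one entry per logged interaction; the probabilities are those
    of the logged action under the target and behavior policies.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    # Dividing by the sum of weights (not the sample count) is what makes this "weighted" IS.
    return float(np.sum(weights * rewards) / np.sum(weights))

# Made-up log of four interactions.
rewards = [1.0, 0.0, 1.0, 0.0]
target_probs = [0.8, 0.2, 0.8, 0.2]
behavior_probs = [0.5, 0.5, 0.5, 0.5]
value = weighted_importance_sampling(rewards, target_probs, behavior_probs)
```

The estimate depends on `behavior_probs` being right, which is why a misspecified behavior policy degrades standard estimators of this kind.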

arXiv cs.LG·2d ago

58

Research Models & Releases

DriftWorld accelerates robot planning by replacing iterative diffusion with single-pass generation

World models trained via diffusion face a critical inference bottleneck: generating robot action rollouts requires iterative denoising, making large-scale planning prohibitively slow. DriftWorld sidesteps this by learning action-conditioned drift trajectories during training, enabling single-pass frame generation at 30+ fps, roughly 17 times faster than diffusion alternatives. This speed gain directly unlocks real-time action search for robotic control, addressing a known constraint that has limited diffusion-based planning in practice. The work signals a shift toward inference-efficient generative models for embodied AI, where latency directly impacts task performance.

arXiv cs.LG·2d ago

62

Research Models & Releases

Prompt tuning cuts medical AI parameters while preserving interpretability

Researchers demonstrate a parameter-efficient adaptation strategy for vision foundation models applied to early dementia screening, reducing trainable parameters to 1.19 million through prompt tuning on a frozen DINOv2-Small backbone. The work addresses a persistent tension in medical AI: balancing model performance against computational efficiency and interpretability. By embedding explainability as an intrinsic property rather than post-hoc overlay, this approach signals growing maturity in deploying foundation models to resource-constrained clinical settings where both accuracy and auditability matter. The technique exemplifies how prompt-based adaptation can unlock specialized applications without full retraining.

arXiv cs.LG·2d ago

58

Research Tools & Code

New visualization framework tackles interpretability gap in categorical machine learning

Researchers introduce cGAP, a visualization framework that addresses a persistent gap in machine learning tooling: interpretable exploration of high-dimensional categorical data. Unlike existing methods that either collapse to low-dimensional projections or sacrifice readability for predictive power, cGAP preserves the original data matrix while embedding subjects and category levels in three-dimensional space mapped to RGB coordinates. The work targets domains where categorical structure dominates (genetics, biomedicine, social science) and reflects growing recognition that interpretability infrastructure lags behind model capability, particularly for non-continuous modalities that remain common in real-world applications.

arXiv cs.LG·2d ago

52

Research Tools & Code

Simulation method synthesizes formally verified control policies

Formal verification remains a critical bottleneck for deploying learned controllers in safety-critical systems. SMC-ES addresses this by combining simulation-based policy synthesis with probabilistic guarantees on safety, robustness, and performance, eliminating the traditional trade-off between learning flexibility and provable correctness. This bridges reinforcement learning's scalability with the formal assurance requirements of autonomous vehicles, robotics, and industrial control, potentially unlocking deployment pathways currently blocked by certification demands.

arXiv cs.LG·2d ago

58

Contrastive learning framework tackles false negatives in medical imaging

Researchers propose MseaCL, a contrastive learning framework that addresses a fundamental flaw in multimodal medical AI: standard approaches treat all unpaired samples as negatives, even when they share clinically relevant semantic properties. This false negative problem degrades representation quality in healthcare settings where subtle anatomical or pathological similarities matter. The work, trained on pediatric 3D brain imaging, signals growing sophistication in how the field handles domain-specific constraints in self-supervised learning. For practitioners building medical AI systems, this represents a practical refinement that could improve downstream diagnostic accuracy without requiring labeled data.

arXiv cs.LG·2d ago

52

Reinforcement learning tackles diversity collapse in image generation

Researchers propose multi-axis max@K, a reinforcement learning method that addresses a critical limitation in text-to-image diffusion models: mode collapse. When prompted to generate diverse outputs, T2I systems often produce visually similar results, particularly problematic for person-centric prompts where this can entrench demographic bias. The technique uses group-based credit assignment to reward samples that collectively cover predefined semantic categories, pushing models toward broader representational coverage. This work bridges fairness and generative quality, directly impacting how production T2I systems should balance prompt fidelity against demographic equity.

arXiv cs.LG·2d ago

62

Research Tools & Code

LongStraw enables million-token RL training on fixed GPU budgets

LongStraw addresses a critical bottleneck in AI agent development: the ability to run reinforcement learning post-training on million-token contexts within fixed GPU budgets. Current RL systems plateau at 256K tokens, forcing length generalization at deployment time, which undermines agents that accumulate observations and tool outputs over extended trajectories. This architecture-aware execution stack uses Group Relative Policy Optimization to eliminate redundant autograd computation, cache only token-specific state, and replay response branches sequentially, trading compute time for memory efficiency. The work signals growing recognition that agent capability depends on training-time context depth, not just inference-time window size.
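
For readers unfamiliar with Group Relative Policy Optimization (GRPO), its defining step is scoring each sampled response against the other responses to the same prompt. The sketch below shows only that standard advantage computation. How LongStraw exploits the group structure to save memory is not detailed in the summary, and the rewards here are made up.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    rewards = np.asarray(rewards, dtype=float)
    # eps guards against division by zero when all rewards are equal.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Made-up rewards for four sampled responses to a single prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because every response in a group shares the same prompt, the prompt-side computation is common to the whole group, which is at least consistent with the summary's description of caching only token-specific state.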

arXiv cs.LG·2d ago

62

Rectified flow self-distillation gets closed-form optimal mixing rules

Researchers have derived closed-form solutions for optimal self-distillation in rectified flow models, a critical problem as generative systems increasingly train on their own outputs. The work proves when and how much a student model should blend real training signals with teacher-generated ones to avoid collapse while improving performance. A sign rule determines the mixing strategy: positive coefficients fix under-regularized teachers, negative ones correct over-regularization. This theoretical foundation matters because self-distillation is becoming standard practice in scaling generative models, yet the conditions for safe improvement remain poorly understood. The result provides practitioners with provable guidance for a technique that could otherwise amplify errors across training iterations.

arXiv cs.LG·2d ago

58

Mechanistic interpretability steers world models toward robustness

Researchers have identified a critical brittleness in World Action Models under distribution shift and developed mechanistic interpretability techniques to address it. By analyzing activation patterns across successful and failed rollouts, they discovered that some WAM architectures encode robustness-critical features in low-dimensional linear subspaces, enabling training-free steering via contrastive directions. They further leveraged local linearity in activation dynamics to construct WA-LQR, a lightweight optimal control framework that improves robustness without retraining. This work bridges interpretability and control theory, offering a practical pathway for hardening embodied AI systems against real-world variability.
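
Training-free steering via a contrastive direction is commonly implemented as a difference of mean activations. The sketch below assumes that form; the paper's exact construction, the layer it targets, and the WA-LQR controller are not covered here. The activations and the strength value are made up.

```python
import numpy as np

def contrastive_direction(success_acts, failure_acts):
    """Unit vector from the mean failure activation to the mean success activation."""
    diff = np.mean(success_acts, axis=0) - np.mean(failure_acts, axis=0)
    return diff / np.linalg.norm(diff)

def steer(activation, direction, strength):
    """Shift one activation vector along the steering direction."""
    return activation + strength * direction

# Made-up 3-dimensional activations; each row is one rollout.
success_acts = np.array([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]])
failure_acts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]])

direction = contrastive_direction(success_acts, failure_acts)
steered = steer(np.array([0.0, 1.0, 2.0]), direction, strength=0.5)
```

No weights change in this procedure; only the activation is shifted at inference time, which is what makes the approach training-free.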

arXiv cs.LG·2d ago

62

Causal inference method handles networked outcomes and hidden confounders

Researchers tackle a foundational challenge in causal inference: learning treatment effects when outcomes influence each other across time and units, while hidden confounders distort the signal. The work combines Ising models for outcome dependencies with low-rank latent structures, solved via maximum pseudo-likelihood estimation. This matters because real-world observational data from networks, marketplaces, and social systems violates the independence assumption baked into most causal methods. The approach opens pathways for practitioners to extract valid causal insights from complex, interdependent systems without randomized trials.

arXiv cs.LG·2d ago

52

Research Models & Releases

Minimal dynamical systems model matches complex foundation model performance

Researchers have distilled a state-of-the-art foundation model for dynamical systems forecasting into an interpretable two-parameter architecture called DynaBase. By systematically reducing DynaMix, they discovered that in-context learning for time-series prediction can operate through a simple linear interpolation between current latent states and nearest neighbors. This finding challenges the assumption that complex foundation models require architectural bloat, suggesting minimal mechanisms suffice for strong zero-shot generalization. The work matters for practitioners seeking efficiency gains and for theorists understanding what makes in-context learning tick across domains beyond language.

arXiv cs.LG·2d ago

62

Post-hoc defense method targets query-efficient adversarial attacks on neural networks

Adversarial robustness remains a critical bottleneck for deploying ML systems in high-stakes domains. This work introduces Random Logit Scaling, a post-hoc defense mechanism targeting black-box score-based attacks, where adversaries query a model's confidence scores to craft perturbations without accessing internal weights. The contribution matters because it offers practitioners a low-friction, model-agnostic layer that can wrap existing deployments. As production systems face increasingly sophisticated query-based attacks, defenses that don't require retraining or architectural changes become strategically valuable for enterprises balancing security and operational continuity.
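
Going by the method's name, a post-hoc wrapper of this kind could multiply the logits by a random positive factor before the softmax, so that returned confidence scores vary between queries while the predicted class stays the same. The sketch below is written under that assumption. The sampling distribution and its range are hypothetical, since the summary does not specify them.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def randomly_scaled_scores(logits, rng, low=0.5, high=2.0):
    """Return softmax scores after multiplying the logits by a random positive factor.

    low and high are hypothetical bounds chosen for illustration.
    """
    scale = rng.uniform(low, high)
    return softmax(scale * np.asarray(logits, dtype=float))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, -1.0])
clean = softmax(logits)
noisy = randomly_scaled_scores(logits, rng)
```

A positive scale preserves the ordering of the logits, so accuracy is unchanged, while the scores an attacker observes no longer track small input perturbations consistently.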

arXiv cs.LG·2d ago

52