Business & Funding · Products & Apps
We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks
Google DeepMind is establishing a regional accelerator program across Asia Pacific focused on deploying AI to address environmental challenges. This move signals DeepMind's pivot toward applied climate and sustainability work beyond pure research, positioning the lab as a direct competitor to other AI labs' climate initiatives while expanding its footprint in a strategically critical region. The program likely combines model deployment, compute access, and partnership infrastructure to help local organizations scale environmental AI applications, reflecting broader industry momentum around AI-for-good initiatives and geographic diversification of AI capability centers.
Google DeepMind · May 21 · 75

Business & Funding · Policy & Regulation
Spotify and Universal Music strike deal allowing fan-made AI covers and remixes
Universal Music Group and Spotify are formalizing a revenue-sharing framework for generative audio, legitimizing AI-assisted music creation as a licensed product category rather than a copyright gray zone. This partnership signals that major rights holders are moving from litigation posture to commercialization, embedding artist compensation into the generative workflow itself. The model matters: rather than policing AI covers post-hoc, the deal bakes consent and payment into the platform, potentially reshaping how the industry handles synthetic media and setting a precedent for other entertainment verticals facing similar pressures.
TechCrunch - AI · May 21 · 81

Products & Apps · Business & Funding
Six search engines worth trying now that Google isn’t really Google anymore
Google's search interface is undergoing significant transformation driven by AI integration, particularly through expanded AI Overview features that are reshaping how results are presented to users. This shift signals a broader industry pivot where traditional search ranking and link-based discovery are being displaced by AI-generated summaries and direct answers. The emergence of viable alternatives reflects growing user friction with AI-first search, creating an opening for competitors to capture dissatisfied users. For the AI ecosystem, this represents a critical inflection point: search monetization models, training data sourcing, and user behavior patterns are all in flux as the dominant search paradigm transitions from indexing to generation.
TechCrunch - AI · May 21 · 69

Opinion & Analysis
Scaling creativity in the age of AI
MIT Technology Review examines how AI is reshaping creative expression and storytelling across media. The piece traces humanity's long history of technological innovation in narrative forms, from pigment-based cave art through photography, and positions generative AI as the latest inflection point in how stories are authored, distributed, and consumed. The strategic angle centers on whether AI tools democratize creative capacity or concentrate it, and how creators navigate authenticity when machines can generate narrative at scale. This matters to the AI landscape because it reframes the cultural stakes of generative models beyond productivity metrics into questions of artistic agency and human meaning-making.
MIT Technology Review - AI · May 21 · 72

Products & Apps · Tools & Code
Share Codex plugins with your team
OpenAI has expanded Codex's plugin ecosystem to enable team-level distribution and governance, allowing organizations to standardize internal tool access across workspaces. This shift from individual to collaborative plugin management reflects a broader maturation of AI development platforms toward enterprise workflows, where plugin curation and access control become operational necessities. The move signals OpenAI's positioning of Codex as infrastructure for scaled, multi-user AI development rather than isolated experimentation, in direct competition with the team collaboration features of rival LLM platforms.
OpenAI (YouTube) · May 21 · 65

Tools & Code · Policy & Regulation
Google checks websites for llms.txt in new agentic browsing audit
Google is expanding Lighthouse, its web performance audit tool, to measure how well websites accommodate AI agents through a new 'Agentic Browsing' category that checks for llms.txt compliance. This signals a structural shift in how the web is being optimized: rather than just human visitors, sites must now account for machine agents crawling and interacting with their content. The move reflects growing pressure on publishers and platforms to establish machine-readable protocols for AI access, effectively standardizing agent behavior expectations across the internet. For developers and site owners, this represents a new compliance surface alongside SEO and accessibility.
The Decoder · May 21 · 73

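To make the llms.txt check concrete, here is a minimal sketch of what such a check might look at. It assumes the conventions of the public llms.txt proposal (a Markdown file served from the site root at `/llms.txt` that opens with an H1 title). The function names are hypothetical, and this is not Lighthouse's actual audit logic, which the summary does not describe.

```python
from urllib.parse import urljoin


def llms_txt_url(site_url: str) -> str:
    """Return the conventional root-level llms.txt location for a site."""
    # A leading slash makes urljoin discard any path in site_url.
    return urljoin(site_url, "/llms.txt")


def looks_like_llms_txt(body: str) -> bool:
    """Heuristic: the first non-empty line should be a Markdown H1 title."""
    for line in body.splitlines():
        if line.strip():
            return line.startswith("# ")
    return False  # an empty body is not a valid llms.txt
```

A real audit would also fetch the URL and inspect the HTTP status; that step is left out here so the sketch stays free of network access.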
Products & Apps · Tools & Code
Introducing Appshots in Codex
OpenAI has integrated Appshots into Codex, enabling developers to anchor coding assistance to live application context. The feature captures both visual and non-visible window content via a Mac keyboard shortcut, allowing the LLM to reason over real-time UI state rather than abstract code snippets alone. This represents a meaningful shift in how code generation models consume context, moving beyond static files toward dynamic runtime environments. The rollout across consumer and enterprise tiers signals OpenAI's push to deepen Codex's integration into developer workflows, competing directly with IDE-native AI assistants that lack this contextual richness.
OpenAI (YouTube) · May 21 · 69

Tools & Code · Products & Apps
datasette-agent-sprites 0.1a0
Simon Willison released datasette-agent-sprites, a plugin enabling Datasette agents to execute commands within Fly Sprites sandboxes. This bridges agentic AI tooling with containerized execution environments, addressing a core infrastructure gap for safely running agent-generated code. The move signals growing maturity in the agent framework ecosystem, where isolation and controlled execution are becoming table stakes for production deployments. For teams building on Datasette or exploring agent architectures, this unlocks safer patterns for delegating computational tasks to LLM-driven systems.
Simon Willison · May 21 · 64

Research · Tools & Code
Tokenisation via Convex Relaxations
Researchers have reframed tokenisation, a foundational NLP preprocessing step, as a convex optimisation problem rather than a greedy search. ConvexTok outperforms standard methods like BPE by constructing vocabularies that minimise bits-per-byte across language models while providing formal optimality guarantees. The work matters because tokeniser design directly affects model efficiency and downstream performance, yet has remained largely heuristic. This shift toward principled, certifiable tokenisation could reshape how practitioners approach vocabulary construction, particularly for resource-constrained deployments where compression gains compound across inference.
arXiv cs.LG · May 21 · 62

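For readers unfamiliar with the bits-per-byte metric mentioned above, the sketch below shows the usual way it is computed: the model's total negative log-likelihood is converted from nats to bits and divided by the UTF-8 byte length of the text. The function name and inputs are illustrative assumptions, not code from the paper.

```python
import math


def bits_per_byte(token_nll_nats, text: str) -> float:
    """Total model loss in bits divided by the UTF-8 byte length of the text.

    token_nll_nats: per-token negative log-likelihoods in nats (hypothetical input).
    """
    n_bytes = len(text.encode("utf-8"))
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / n_bytes
```

Because the denominator counts bytes rather than tokens, the value stays comparable across tokenisers that split the same text into different numbers of tokens.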
Research · Models & Releases
Integrable Elasticity via Neural Demand Potentials
Researchers introduce ICDN, a neural architecture that models multiproduct demand by learning smooth, price-conditioned log-demand surfaces from which elasticities can be derived analytically. This work bridges econometrics and deep learning by enforcing economic structure (integrability constraints) directly into the model, improving both generalization and interpretability of cross-price effects on retail datasets. The approach signals growing interest in embedding domain knowledge and causal reasoning into neural systems, particularly where model outputs must satisfy real-world economic constraints rather than optimize purely for prediction accuracy.
arXiv cs.LG · May 21 · 52

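As a rough illustration of how elasticities fall out of a log-demand surface, the sketch below uses the standard definition: the elasticity of product i with respect to price j is the derivative of log q_i with respect to log p_j. The toy log-linear demand function and all names are assumptions for illustration, and central finite differences stand in for the analytic derivatives the summary describes; none of this is the ICDN architecture.

```python
def log_demand(log_p):
    """Toy log-linear demand for two products (illustrative coefficients)."""
    lp0, lp1 = log_p
    return [1.0 - 1.5 * lp0 + 0.3 * lp1,
            0.5 + 0.2 * lp0 - 2.0 * lp1]


def elasticities(f, log_p, h=1e-5):
    """Matrix E[i][j] = d log q_i / d log p_j via central differences."""
    n_prices = len(log_p)
    n_goods = len(f(log_p))
    E = [[0.0] * n_prices for _ in range(n_goods)]
    for j in range(n_prices):
        up = list(log_p)
        dn = list(log_p)
        up[j] += h
        dn[j] -= h
        f_up, f_dn = f(up), f(dn)
        for i in range(n_goods):
            E[i][j] = (f_up[i] - f_dn[i]) / (2 * h)
    return E
```

For the toy function, the diagonal entries are own-price elasticities and the off-diagonal entries are the cross-price effects that the summary says the model makes interpretable.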
Research · Models & Releases
Vector Policy Optimization: Training for Diversity Improves Test-Time Search
Vector Policy Optimization addresses a fundamental mismatch in LLM training: models optimized for single scalar rewards produce low-entropy outputs that fail when deployed in inference-time search systems like AlphaEvolve, which require diverse candidate solutions across multiple task-specific objectives. VPO reframes post-training to anticipate vector-valued rewards, training policies to generate varied outputs that better serve downstream selection procedures. This shift matters because it decouples training objectives from deployment constraints, potentially unlocking better performance in test-time compute scaling without retraining. The work signals growing recognition that LLM generalization now depends on output diversity as a first-class training goal.
arXiv cs.LG · May 21 · 62

Research
Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration
Curiosity-driven reinforcement learning has struggled to scale to photorealistic 3D environments because agents get stuck revisiting forgotten states without genuine exploration progress. This work identifies the root cause: agents lack both persistent world models that update continuously and episodic memory of their own trajectories. The fix addresses a fundamental bottleneck in sparse-reward learning, where intrinsic motivation signals degrade in complex visual domains. Success here unlocks more efficient training for embodied AI systems and long-horizon tasks, directly impacting how agents learn to navigate and act in realistic simulations before deployment.
arXiv cs.LG · May 21 · 58

Research
The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning
A new theoretical framework unifies disparate robustness techniques across computer vision and deep learning under a single statistical principle: controlling encoder sensitivity to label-preserving nuisance variation. The work reinterprets adversarial training, domain adaptation, data augmentation, and alignment constraints as different estimators of the same underlying covariance structure, with closed-form optimality proofs in the linear-Gaussian case. This conceptual consolidation matters for practitioners because it suggests that seemingly orthogonal robustness methods share fundamental machinery, potentially enabling more principled design of invariant representations and clearer trade-offs between competing robustness objectives.
arXiv cs.LG · May 21 · 62

Research
Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
Researchers have formalized convergence guarantees for a new class of generative models that use kernel density estimation to enforce conservative (gradient-based) drift dynamics. The work addresses a fundamental theoretical gap in one-step generation methods by proving finite-particle bounds and quantifying how estimation error from limited samples affects model quality. This matters for practitioners building efficient samplers: it provides the mathematical scaffolding to predict when and why KDE-based approaches outperform displacement methods, and establishes concrete rates for scaling kernel bandwidth and particle count in production systems.
arXiv cs.LG · May 21 · 52

Research · Tools & Code
MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems
Researchers propose MOSS, a framework enabling autonomous agents to modify their own source code rather than just prompt configurations or skill files. Current self-evolving systems are constrained to text-layer changes, leaving structural failures in routing logic, state management, and dispatch mechanisms unreachable. By treating the agent harness itself as mutable, MOSS expands the adaptation surface to Turing-complete scope, potentially closing a critical gap between what agents can learn and what they can actually fix. This shifts the self-improvement paradigm from configuration tuning toward genuine architectural adaptation.
arXiv cs.LG · May 21 · 62

Research · Tools & Code
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
Multi-agent LLM systems are increasingly adopting latent communication through transformer key-value caches to boost coordination efficiency, but this opaque channel risks leaking sensitive context and reasoning states across agents without explicit oversight. LCGuard addresses this emerging security gap by treating shared KV caches as a controlled communication layer, enabling safer information flow in systems where agents coordinate on complex tasks. This work signals growing tension between performance gains from direct latent sharing and the need for transparency and control in agent-to-agent data propagation, a critical concern as production multi-agent deployments scale.
arXiv cs.LG · May 21 · 58

Research · Models & Releases
Evaluating Commercial AI Chatbots as News Intermediaries
A systematic evaluation of six major AI chatbots reveals a critical gap between multiple-choice and real-world performance on news comprehension. When tested on same-day BBC reporting across six languages and regions, top performers like Gemini and Claude maintained over 90% accuracy in constrained settings but dropped 11-17% when forced to generate free-form answers. This benchmarking work exposes how proprietary search and retrieval pipelines mask brittleness in factual grounding, raising questions about whether current systems are reliable enough for news intermediation at scale.
arXiv cs.CL · May 21 · 62

Research · Tools & Code
FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection
Production log anomaly detection has long suffered from coarse-grained alerts that force operators to sift through routine messages. FAME introduces a mixture-of-experts architecture that pinpoints individual anomalous log lines rather than flagging entire sessions, addressing a critical operational bottleneck. By combining label-efficient training with selective LLM reasoning, the framework sidesteps the prohibitive cost of running language models on every log line in continuous systems. This work signals growing momentum in applying structured ML to observability infrastructure, where fine-grained anomaly localization directly reduces mean-time-to-resolution for production incidents.
arXiv cs.LG · May 21 · 58

Research · Models & Releases
SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis
Researchers introduce SDPM, a diffusion-based generative model that reformulates survival analysis as a continuous-time problem without imposing restrictive hazard assumptions or discretizing time. By modeling censored time-to-event distributions directly through denoising diffusion, the approach sidesteps approximation errors endemic to traditional Cox models and discrete-time methods. This represents a methodological shift in how generative models tackle structured prediction tasks with incomplete data, relevant to healthcare ML and any domain where censoring complicates ground truth.
arXiv cs.LG · May 21 · 58

Research · Models & Releases
MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data
MambaGaze demonstrates how state-space models can solve a persistent real-world constraint in human-computer interaction: eye-tracking data is inherently noisy and incomplete due to blinks and sensor failures. By combining explicit uncertainty encoding with bidirectional Mamba-2's linear-time architecture, the framework achieves meaningful accuracy gains on cognitive load benchmarks. This matters because adaptive safety systems (pilot assistance, driver monitoring) depend on reliable signal processing at scale, and the technique's efficiency opens deployment paths where transformer-based alternatives would be computationally prohibitive. The work signals growing maturity in applying modern sequence models to embodied AI applications beyond language.
arXiv cs.LG · May 21 · 58

Research · Tools & Code
CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation
CogAdapt demonstrates a practical transfer-learning pattern for repurposing large foundation models across hardware and task boundaries. By bridging the gap between clinical-grade 12-lead ECG systems and consumer wearables via learnable adapters, the work addresses a recurring infrastructure challenge in applied ML: how to extract value from expensive pre-training when deployment constraints differ fundamentally. The progressive fine-tuning strategy to avoid catastrophic forgetting is a known technique, but its application to cross-domain sensor adaptation signals growing maturity in foundation model deployment workflows. This matters for teams building real-time biometric systems where labeled wearable data remains scarce.
arXiv cs.LG · May 21 · 58

Research · Models & Releases
Reducing Political Manipulation with Consistency Training
Researchers have identified systematic political asymmetry in how large language models respond to paired prompts from opposing ideological perspectives, termed covert political bias. The work introduces Political Consistency Training, a reinforcement learning approach that enforces symmetric sentiment and engagement depth across politically sensitive topics. This addresses a critical alignment challenge for deployed LLMs: models can appear balanced on surface metrics while subtly privileging one political framing over another. The technique preserves overall model helpfulness while reducing bias, making it relevant for organizations deploying LLMs in high-stakes contexts where perceived neutrality matters.
arXiv cs.CL · May 21 · 62

Research · Models & Releases
Understanding Data Temporality Impact on Large Language Models Pre-training
Researchers challenge a foundational assumption in LLM training by studying how data ordering affects temporal knowledge acquisition. Using a new 7,000-question benchmark grounded in time-sensitive facts, they pretrained 6B-parameter models on chronologically ordered Common Crawl snapshots versus standard shuffled corpora. The finding that sequential training matches or outperforms shuffled baselines suggests that temporal coherence during pretraining may improve factual grounding and time-aware reasoning, with implications for how practitioners should curate and structure training data for knowledge-intensive applications.
arXiv cs.CL · May 21 · 62

Policy & Regulation
Trump delays AI security executive order: ‘I don’t want to get in the way of that leading’
The Trump administration shelved a planned executive order mandating pre-release security reviews of AI models, signaling a regulatory pullback at a critical juncture for frontier AI development. The decision reflects tension between safety governance and competitive velocity: officials cited concerns that mandatory government vetting could slow innovation and cede advantage to international competitors. This reversal reshapes the near-term policy landscape for model deployment, removing a potential friction point for labs but leaving the U.S. without formal pre-release security guardrails as capabilities scale.
TechCrunch - AI · May 21 · 76

Research
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation
Researchers have identified a fundamental mismatch in how Uniform Diffusion Models train versus how they're parameterized for inference. The standard approach optimizes a leave-one-out posterior rather than the stated denoising objective, creating a gap between theory and practice. This work provides exact mathematical conversions between different formulations, enabling practitioners to align training and deployment strategies. The finding matters for anyone scaling discrete diffusion to language and vision tasks, as it clarifies which architectural choices actually match their training signal.
arXiv cs.LG · May 21 · 58

Research
Lumberjack: Better Differentially Private Random Forests through Heavy Hitter Detection in Trees
Differential privacy remains a critical bottleneck for deploying machine learning on sensitive datasets, and random forests have been particularly vulnerable to privacy-utility tradeoffs that render them unusable in practice. Lumberjack addresses this by combining deep tree construction with privacy-aware pruning, anchored on a novel heavy hitter detection algorithm that scales favorably with tree depth. The theoretical contribution, a hierarchical DP algorithm with O(sqrt(log h)) error, unlocks substantially deeper trees than prior work and signals a meaningful shift in how practitioners might balance privacy guarantees against model performance on tabular data in healthcare, finance, and other regulated domains.
arXiv cs.LG · May 21 · 62

Research · Tools & Code
Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization
Researchers are applying genetic-algorithm-driven feature selection to distinguish cyber attacks from natural faults in power grid sensor networks. The work addresses a critical infrastructure vulnerability: as smart grids densify their measurement and control systems, operators face mounting difficulty separating malicious false-data injection from legitimate equipment failures. By reducing the dimensionality of PMU and IED telemetry while maintaining detection reliability, this approach signals growing ML adoption in operational technology security, where model interpretability and physical grounding matter as much as accuracy.
arXiv cs.LG · May 21 · 52

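A generic genetic-algorithm feature selector can be sketched in a few lines, which may help clarify what "genetic-algorithm-driven feature selection" involves. Everything here is an assumption for illustration: the function name, the truncation selection, the one-point crossover, and the caller-supplied `fitness` function are a textbook design, not the paper's method or settings.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mut_rate=0.1, seed=0):
    """Evolve boolean feature masks; return the fittest mask found.

    fitness: hypothetical callable scoring a mask (higher is better),
    e.g. validation accuracy minus a penalty per selected feature.
    Requires n_features >= 2 so a crossover point exists.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mut_rate
            child = [(not g) if rng.random() < mut_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In a setting like the one described, `fitness` would typically train and score a detector on the selected telemetry columns, so each evaluation is far more expensive than in this toy form.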
Research
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning is emerging as a critical paradigm shift for autonomous systems operating in shared, dynamic environments. This arXiv paper demonstrates that single-agent approaches, which dominate current physical AI deployments, fail catastrophically when multiple actors interact. Using high-speed quadrotor racing as a stress test, researchers trained agents through league-based self-play to develop anticipatory behaviors like collision avoidance and strategic maneuvering. The work signals that real-world robustness for autonomous systems may require fundamentally rethinking coordination and safety as multi-agent problems rather than isolated control challenges.
arXiv cs.LG · May 21 · 62

Research · Tools & Code
Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier
Researchers propose a computationally tractable approximation to Evidential Deep Learning, a framework for uncertainty quantification in neural networks. By replacing complex Dirichlet objectives with simpler plug-in losses evaluated at the distribution mean, the work reduces implementation friction while maintaining theoretical guarantees on approximation error. This matters for practitioners building safety-critical systems in robotics and autonomous vehicles that depend on reliable confidence estimates without prohibitive computational overhead.
arXiv cs.LG · May 21 · 52

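The idea of a loss "evaluated at the distribution mean" can be illustrated with a small sketch. A Dirichlet with concentration parameters alpha has mean alpha_k / sum(alpha), and a plug-in cross-entropy simply scores that mean vector against the label. The function names are hypothetical, and the paper's exact losses and parameterization may differ from this reading of the summary.

```python
import math


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def plugin_cross_entropy(alpha, label):
    """Cross-entropy computed at the Dirichlet mean, not averaged over it."""
    return -math.log(dirichlet_mean(alpha)[label])
```

One consequence worth noting: if the concentrations are taken to be `exp(logit)`, the Dirichlet mean equals the softmax of the logits, so this plug-in loss reduces to ordinary softmax cross-entropy. Whether that is the exact construction behind the title's reference to the softmax classifier is an assumption here.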
Research · Models & Releases
SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation
SeqLoRA tackles a core bottleneck in personalized image generation: composing multiple custom concepts without representation collapse. The work uses bilevel optimization to jointly refine LoRA adapter factors while maintaining orthogonality constraints, backed by convergence proofs and catastrophic forgetting bounds. This matters because parameter-efficient fine-tuning has become the standard path for fast model customization, but scaling to multi-concept workflows has remained fragile. The theoretical guarantees and data-driven basis learning signal a maturing approach to modular adaptation that could unlock more reliable commercial personalization pipelines.
arXiv cs.LG · May 21 · 58