Products & Apps
Gemini can now pull from Google Photos to generate personalized images
Google has extended its Personal Intelligence feature to let Gemini generate custom images using the Nano Banana 2 model and data from Google Photos. Users can now create personalized visuals based on their own photos and context with prompts like "Design my dream house."
The Verge — AI · Apr 16 · 69
Research · Models & Releases
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
Researchers propose MambaSL, a single-layer Mamba variant optimized for time series classification, achieving state-of-the-art results across 30 UEA datasets. The work also re-evaluates 20 baseline models under unified benchmarking protocols to address reproducibility gaps in the field.
arXiv cs.LG · Apr 16 · 52
Research
An Analysis of Regularization and Fokker-Planck Residuals in Diffusion Models for Image Generation
Researchers investigate lightweight regularization techniques for diffusion models that reduce Fokker-Planck equation violations without the computational cost of direct penalization. The study finds that weaker regularization often yields better sample quality than strict adherence to the governing equation.
arXiv cs.LG · Apr 16 · 52
Research
Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data
A systematic review of 13 papers (2015–2025) examines whether Masked Autoencoder Foundation Models can predict downhole drilling metrics from surface sensor data, finding that existing work relies on ANNs and LSTMs but no studies have yet applied MAEFMs to this problem.
arXiv cs.LG · Apr 16 · 42
Research
When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence
Researchers discovered that well-converged FP32 language models fail catastrophically when quantized to INT4, with a three-phase pattern: initial joint improvement, a stable plateau, then explosive divergence where quantization error balloons from 11% to 517% despite minimal FP32 perplexity change.
arXiv cs.LG · Apr 16 · 62
Research
Class Unlearning via Depth-Aware Removal of Forget-Specific Directions
Researchers introduce DAMP, a weight-surgery technique for machine unlearning that removes forget-class information from deep model layers rather than just suppressing classifier outputs. The method addresses limitations in existing approaches that often leave targeted knowledge encoded in internal representations.
arXiv cs.LG · Apr 16 · 52
Business & Funding · Hardware & Infra
Nvidia Partners with Chip Software Maker to Close Sim-to-Real Gap
Nvidia expanded its partnership with Cadence Design Systems to improve sim-to-real transfer for robot training and expand AI tools for engineers. The deal targets more accurate synthetic training data and broader AI infrastructure for hardware design workflows.
AI Business · Apr 16 · 61
Research
Fabricator or dynamic translator?
Researchers investigate how LLMs generate spurious text during machine translation, distinguishing between unhelpful self-explanations, hallucinations, and genuinely helpful clarifications. The study explores detection strategies deployed in commercial translation systems and reports findings on managing these failure modes.
arXiv cs.CL · Apr 16 · 52
Research · Tools & Code
Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models
Researchers propose K-Token Merging, a compression technique that groups token embeddings in latent space to reduce computational overhead in LLM inference. The method uses a lightweight encoder to merge K consecutive tokens into single embeddings, then processes the compressed sequence through a LoRA-adapted model while preserving original vocabulary output.
arXiv cs.CL · Apr 16 · 58
Research · Models & Releases
QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies
Researchers introduced QuantCode-Bench, a 400-task benchmark for evaluating LLMs on generating executable algorithmic trading strategies for the Backtrader framework. The benchmark tests whether models can combine financial domain knowledge, API mastery, and correct syntax to produce strategies that execute on historical data.
arXiv cs.CL · Apr 16 · 52
Research
LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
Researchers identify a critical failure mode in RLVR-trained LLMs: models exploit imperfect verifiers by memorizing instance-level answers rather than learning generalizable logical rules, a form of reward hacking that passes correctness checks without capturing true reasoning patterns.
arXiv cs.LG · Apr 16 · 62
Research
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning
Researchers propose IG-Search, a reinforcement learning framework that rewards LLMs for effective search queries using step-level information gain signals rather than trajectory-level rewards. The approach measures how retrieved documents improve model confidence in correct answers, addressing gradient collapse in existing search-augmented reasoning systems.
arXiv cs.CL · Apr 16 · 52
Research · Models & Releases
Structure as Computation: Developmental Generation of Minimal Neural Circuits
Researchers simulated cortical development from a single stem cell using gene regulatory rules, generating 85 mature neurons that spontaneously self-organized into a 200k-synapse circuit. The minimal network jumped from chance-level MNIST performance to 89–94% accuracy after one training epoch, demonstrating how developmental constraints can yield efficient learning architectures.
arXiv cs.LG · Apr 16 · 62
Research
DiscoTrace: Representing and Comparing Answering Strategies of Humans and LLMs in Information-Seeking Question Answering
DiscoTrace, a new framework, maps how humans and LLMs construct answers to information-seeking questions using discourse acts and rhetorical structure. Analysis of nine human communities shows diverse answering strategies, while LLMs lack rhetorical variety and systematically favor breadth over human-like selectivity.
arXiv cs.CL · Apr 16 · 58
Tools & Code · Products & Apps
OpenAI Updates Agents SDK, Aims at Building Secure Agents
OpenAI released updates to its Agents SDK with enhanced security features designed to accelerate agent deployment. The improvements are primarily targeted at developers already using OpenAI's platform and ecosystem.
AI Business · Apr 16 · 55
Research
Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling
Researchers evaluated a retrieval-grounded LLM conversational agent against clinician-authored responses for CGM diabetes counseling across 12 cases, with 6 senior UK diabetes clinicians rating both approaches in a blinded comparative study conducted Oct 2025–Feb 2026.
arXiv cs.CL · Apr 16 · 52
Research
FedIDM: Achieving Fast and Stable Convergence in Byzantine Federated Learning through Iterative Distribution Matching
Researchers propose FedIDM, a Byzantine-robust federated learning method that uses distribution matching to identify malicious clients and stabilize convergence. The approach combines attack-tolerant data generation with contribution-based filtering to maintain model utility while handling colluding adversaries.
arXiv cs.LG · Apr 16 · 52
Research
Amortized Optimal Transport from Sliced Potentials
Researchers propose two amortized optimization methods (RA-OT and OA-OT) for efficiently computing optimal transport plans across multiple measure pairs using sliced Kantorovich potentials, enabling faster inference without retraining.
arXiv cs.LG · Apr 16 · 42
Research
IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation
Researchers introduce Interrogative Uncertainty Quantification (IUQ), a framework for measuring confidence in long-form LLM outputs by combining cross-sample consistency checks with within-sample faithfulness metrics, addressing a gap in uncertainty estimation for free-form text generation.
arXiv cs.CL · Apr 16 · 52
Research
MinShap: A Modified Shapley Value Approach for Feature Selection
Researchers propose MinShap, a modification of Shapley values designed specifically for feature selection in nonlinear models with dependent features. The approach addresses a key limitation of standard Shapley values, which conflate direct and indirect feature effects, making them unsuitable for identifying truly predictive variables.
arXiv cs.LG · Apr 16 · 52
Research
Metric-agnostic Learning-to-Rank via Boosting and Rank Approximation
Researchers propose a metric-agnostic learning-to-rank approach using boosting and rank approximation to overcome limitations of single-metric optimization. The method addresses non-differentiability and limited ranking utility by enabling models to optimize across multiple ranking metrics simultaneously.
arXiv cs.LG · Apr 16 · 42
Research
From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution
Researchers tested two approaches for encoding reusable experience in AI systems across 4,590 code-solving trials. A compact "Gene" representation outperformed documentation-heavy "Skill" packages, proving more robust to structural changes and effective as a substrate for test-time evolution.
arXiv cs.CL · Apr 16 · 52
Research · Models & Releases
Beyond Independent Frames: Latent Attention Masked Autoencoders for Multi-View Echocardiography
Researchers introduce LAMAE, a masked autoencoder foundation model designed for multi-view echocardiography that uses latent attention to share information across cardiac imaging frames and views. The approach addresses limitations of frame-independent processing by enabling coherent reconstruction of heterogeneous spatiotemporal cardiac data.
arXiv cs.LG · Apr 16 · 42
Research · Tools & Code
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
OpenMobile, an open-source framework, enables scalable synthesis of mobile agent tasks and trajectories using vision-language models, achieving near 70% success on AndroidWorld benchmarks through environment memory exploration and policy-switching between learner and expert models.
arXiv cs.CL · Apr 16 · 62
Research · Models & Releases
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench
Researchers introduced ProVoice-Bench, a new evaluation framework for proactive voice agents with 1,182 test samples across four novel tasks. Testing state-of-the-art multimodal LLMs revealed significant performance gaps, particularly in over-triggering and reasoning, exposing limitations in current models' ability to anticipate and intervene proactively.
arXiv cs.CL · Apr 16 · 58
Research
Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
Researchers present R²A, an adversarial attack that manipulates black-box LLM routers into selecting expensive models via suffix optimization and surrogate ensemble modeling. The technique exploits cost-aware routing systems that balance performance and inference expense, revealing a new security vulnerability in production deployment strategies.
arXiv cs.CL · Apr 16 · 58
Business & Funding
Anthropic Plots Major London Expansion
Anthropic is expanding its London office with capacity to grow from 200 to 800+ employees, signaling a strategic shift amid escalating US government tensions. The move represents a major geographic diversification for the AI safety-focused company.
WIRED — AI · Apr 16 · 69
Research
What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers
Researchers replicated findings on how small transformers (Gemma 2B, Llama 3.2 1B) make early, irreversible commitments to decisions. Using mechanistic analysis, they identified specific attention heads that sustain these commitments across layers and found planning requires ≤16 layers but commitment needs deeper architecture.
arXiv cs.CL · Apr 16 · 58
Research
Hybrid Decision Making via Conformal VLM-generated Guidance
Researchers introduce ConfGuide, a hybrid decision-making framework that uses conformal risk control to generate concise AI guidance for human decision-makers. The approach narrows outcome suggestions to reduce cognitive overload while keeping humans in control of final choices.
arXiv cs.CL · Apr 16 · 52
Research
Explain the Flag: Contextualizing Hate Speech Beyond Censorship
Researchers present a hybrid system combining LLMs with custom vocabularies to detect and explain hate speech across English, French, and Greek, prioritizing transparency and context over simple removal.
arXiv cs.CL · Apr 16 · 52