Products & Apps
Microsoft’s Edge Copilot update uses AI to pull information from across your tabs
Microsoft is expanding Edge Copilot's scope beyond single-page interactions by enabling the chatbot to synthesize information across a user's entire browser session. This represents a meaningful shift in how conversational AI integrates with everyday workflows: rather than treating each query in isolation, the system now operates as a cross-tab reasoning layer that can compare products, extract key points from multiple articles, and answer questions grounded in real-time browsing context. For product teams building AI assistants, this signals the competitive pressure to move beyond document-scoped or search-scoped models toward stateful, multi-source synthesis as table stakes.
The Verge - AI · 4d ago · 65

Products & Apps · Tools & Code
Notion just turned its workspace into a hub for AI agents
Notion is positioning itself as an orchestration layer for agentic workflows by opening its workspace to third-party AI agents, data connectors, and custom logic. This move signals a strategic pivot from document-centric productivity toward agent-native infrastructure, directly competing with platforms like Zapier and n8n while leveraging Notion's embedded user base. The shift matters because it transforms how teams will compose multi-agent systems without leaving their primary workspace, potentially reshaping the developer tool landscape for AI automation.
TechCrunch - AI · 4d ago · 69

Hardware & Infra · Policy & Regulation
Musk’s xAI is running nearly 50 gas turbines unchecked at its Mississippi data center
xAI's Colossus 2 data center in Mississippi faces legal scrutiny over its deployment of nearly 50 mobile gas turbines to power AI infrastructure. The lawsuit highlights a critical tension in scaling frontier AI compute: the energy demands of large language model training now rival industrial operations, forcing companies to adopt unconventional power solutions that bypass traditional utility frameworks. This case signals how infrastructure constraints and regulatory gaps are becoming material business risks for AI labs competing on compute scale.
TechCrunch - AI · 4d ago · 69

Products & Apps · Opinion & Analysis
Anthropic’s Cat Wu says that, in the future, AI will anticipate your needs before you know what they are
Anthropic's product leadership is signaling a strategic pivot toward proactive AI systems that anticipate user intent rather than merely responding to explicit requests. This represents a meaningful shift in how frontier labs are thinking about the next generation of AI assistants, moving beyond reactive chat interfaces toward systems that model user context and goals. For builders and enterprise adopters, this signals where Claude's roadmap is headed and raises questions about how other labs will compete on predictive capability and user modeling.
TechCrunch - AI · 4d ago · 65

Research · Opinion & Analysis
What It Will Take to Make AI Sustainable
The sustainability of AI infrastructure hinges on two overlooked gaps: transparent emissions accounting and visibility into actual deployment patterns. Researcher Sasha Luccioni's argument surfaces a critical blind spot in the industry's environmental narrative. Without granular data on how models consume energy across diverse use cases and geographies, claims about efficiency improvements remain unverifiable. This matters because infrastructure decisions made today lock in carbon footprints for years. For practitioners and procurement teams, the implication is stark: vendor sustainability claims need third-party validation, not marketing copy. The broader landscape shift is toward treating emissions as a compliance and competitive metric, not an afterthought.
WIRED - AI · 4d ago · 69

Products & Apps · Business & Funding
Anthropic Further Targets Legal With New Connectors
Anthropic is expanding its enterprise footprint by releasing connectors that integrate its LLMs into legal workflows, signaling a deliberate pivot beyond research and consumer applications toward vertical-specific business solutions. This move mirrors the industry-wide shift toward domain-tailored AI deployment, where foundation model providers compete not just on raw capability but on ease of integration into existing enterprise stacks. For legal tech vendors and enterprises evaluating LLM providers, Anthropic's connector strategy suggests a maturing go-to-market approach that prioritizes adoption friction reduction over raw model superiority.
AI Business · 4d ago · 55

Policy & Regulation · Hardware & Infra
DHS Plans Experiment Running ‘Reconnaissance’ Drones Along the US-Canada Border
The Department of Homeland Security is piloting autonomous surveillance systems along the US-Canada border this fall, deploying AI-driven drones and ground vehicles to transmit real-time tactical data via 5G infrastructure. This marks a significant expansion of autonomous decision-making in border security and signals growing government investment in edge AI systems for critical infrastructure. The bilateral experiment tests whether distributed autonomous agents can operate reliably in remote, high-stakes environments, a capability with implications for both public-sector AI adoption and the infrastructure demands of autonomous systems at scale.
WIRED - AI · 4d ago · 69

Business & Funding · Hardware & Infra
Tencent plans to ramp up AI spending as China's chip supply allegedly improves
Tencent's commitment to expand AI infrastructure investment signals confidence in China's domestic chip ecosystem as supply constraints ease. The timing matters: ramped spending in H2 2026 follows improved output from local chipmakers, reducing reliance on foreign semiconductors and reshaping competitive dynamics in large-scale model training. Concurrent stake negotiations with DeepSeek suggest Tencent is hedging across multiple frontier AI players while securing hardware independence, a strategic posture that could accelerate China's AI capability development and fragment the global inference market.
The Decoder · 4d ago · 73

Business & Funding
Anthropic overtakes OpenAI in B2B adoption for the first time according to Ramp spending data
Anthropic has surpassed OpenAI in B2B enterprise adoption for the first time, capturing 34.4 percent of US companies tracked by Ramp's spending index versus OpenAI's 32.3 percent. The shift reflects Anthropic's aggressive market penetration over the past year, though the lead remains fragile. The article identifies three structural vulnerabilities that could reverse this momentum, signaling that enterprise AI vendor consolidation remains unsettled and that market share gains among frontier labs are still volatile enough to reshape competitive positioning within quarters.
The Decoder · 4d ago · 85

Products & Apps · Policy & Regulation
AI chatbots are giving out people’s real phone numbers
Google's AI systems are leaking personal phone numbers to users who query them, creating a real-world harm vector that exposes the tension between retrieval-augmented generation and privacy. The incident reveals a critical gap in how LLM-powered search products handle personally identifiable information: without clear opt-out mechanisms, individuals face harassment campaigns triggered by AI-mediated disclosure. This surfaces a broader infrastructure problem for the industry: as AI systems increasingly synthesize and surface web-indexed data, the absence of privacy controls becomes a liability for both platforms and users, forcing a reckoning around data governance in production AI systems.
MIT Technology Review - AI · 4d ago · 84

Research · Models & Releases
WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data
WARDEN demonstrates a practical shift in how language models handle extreme data scarcity, splitting transcription and translation into separate pipelines rather than forcing end-to-end training on 6 hours of audio. This architectural choice reflects a broader trend in applied ML: when scale assumptions break down, decomposition and domain-specific techniques become competitive with unified models. The work matters beyond linguistics because it signals viable patterns for deploying AI in low-resource contexts where large-scale datasets will never exist, forcing the field to rethink whether monolithic architectures are actually necessary.
arXiv cs.CL · 4d ago · 58

Research · Tools & Code
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
EVA-Bench tackles a critical gap in voice AI evaluation by introducing the first end-to-end framework that both simulates realistic multi-turn spoken conversations and measures performance across voice-specific failure modes. The framework automates bot-to-bot dialogue generation with built-in validation to catch simulator errors, then applies composite metrics designed for voice agents rather than text-based systems. This addresses a pressing infrastructure need as enterprises deploy conversational AI at scale, where existing benchmarks fail to capture the full complexity of spoken interaction failures. For teams building or deploying voice systems, standardized evaluation methodology directly impacts production reliability and competitive positioning.
arXiv cs.CL · 4d ago · 62

Research
What is Learnable in Valiant's Theory of the Learnable?
A new characterization of Valiant's original 1984 learning model reveals that learnability hinges on adaptive query-compression schemes, not the PAC framework commonly attributed to that work. This theoretical refinement matters because it clarifies foundational assumptions in computational learning theory and reframes what 'learnable' means when a system can query an oracle and must avoid false positives. The result reshapes how researchers think about sample efficiency and the role of interaction in learning, with implications for understanding the limits of supervised learning systems that operate under strict correctness constraints.
arXiv cs.LG · 4d ago · 52

Research · Tools & Code
Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
Researchers propose TFlow, a weight-space communication protocol that lets multi-agent LLM systems bypass token serialization by directly compiling one agent's hidden states into transient weight perturbations for its peers. This sidesteps the computational drag of natural-language message passing, cutting prefill overhead and KV-cache memory while maintaining a fixed receiver architecture. The shift from token-based to activation-based inter-agent handoffs could reshape how production multi-agent systems balance interpretability against efficiency, particularly for latency-sensitive or resource-constrained deployments.
arXiv cs.CL · 4d ago · 62

Research · Tools & Code
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
R-DMesh tackles a practical bottleneck in video-driven 3D animation: mesh-to-video pose misalignment. The framework uses a novel VAE architecture to decouple geometry from motion, enabling high-fidelity 4D mesh generation that automatically rectifies initial pose mismatch without distortion. This addresses a real deployment friction point that has limited adoption of motion-transfer systems in production pipelines, making it relevant to studios and game developers integrating AI-assisted animation workflows.
arXiv cs.LG · 4d ago · 58

Research · Models & Releases
Topology-Preserving Neural Operator Learning via Hodge Decomposition
Researchers propose a neural operator framework that uses Hodge decomposition to separate learnable geometric dynamics from topological invariants in physical field equations. By decomposing solution operators into structure-preserving subspaces, the method reduces spectral interference and improves generalization on mesh-based problems. This addresses a fundamental challenge in physics-informed machine learning: operators trained on one geometry often fail on others. The Hodge Spectral Duality architecture combines discrete differential forms with auxiliary ambient spaces, offering a principled inductive bias for scientific computing models that must respect underlying mathematical structure.
arXiv cs.LG · 4d ago · 62

Research · Models & Releases
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling
Researchers propose QLAM, a hybrid quantum-classical architecture that applies quantum superposition principles to state-space modeling for long-sequence tasks. The work targets a core bottleneck in modern sequence models: transformers scale quadratically with context length while SSMs sacrifice expressiveness through linear state transitions. By encoding multiple token dependencies simultaneously in quantum states, QLAM attempts to achieve both linear-time efficiency and richer global pattern capture. This represents an early-stage exploration of quantum computing's practical role in foundation model infrastructure, though real-world viability remains unproven.
arXiv cs.LG · 4d ago · 58

Research · Tools & Code
Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach
Researchers have developed a formal method to quantify robustness vulnerabilities in decision tree ensembles, a class of models widely deployed in safety-critical applications. The work introduces an algorithmic framework that discretizes input space and identifies regions prone to misclassification under small feature perturbations, with certified error bounds. This advances the verification toolkit for production ML systems where adversarial sensitivity poses real operational risk, particularly relevant as enterprises scale tree-based models in regulated domains like finance and healthcare.
arXiv cs.LG · 4d ago · 58

Research
Negation Neglect: When models fail to learn negations in training
Researchers have identified a critical failure mode in large language model finetuning where models internalize false claims despite explicit negations in training data. When Qwen3.5-397B was finetuned on documents repeatedly flagging fabricated statements as false, belief rates jumped from 2.5% to 88.6%, suggesting models may conflate frequency of claim mention with truth regardless of negation markers. This finding exposes a fundamental gap between contextual understanding and training-time knowledge absorption. It has implications for how organizations deploy finetuned models in safety-critical applications, and it raises questions about whether current architectures can reliably distinguish negated from affirmed propositions during parameter updates.
arXiv cs.CL · 4d ago · 72

Research
Reducing cross-sample prediction churn in scientific machine learning
A new study exposes a critical blind spot in scientific machine learning: models trained on different data samples agree on overall accuracy but flip predictions on 8-22% of individual test cases. This 'cross-sample prediction churn' undermines confidence in reported benchmarks across chemistry applications. While standard uncertainty techniques (deep ensembles, MC dropout) fail to address it, two data-side methods show promise, with K-bootstrap bagging reducing churn 40-54% without sacrificing accuracy. The finding signals that aggregate metrics mask instability in real-world deployment, forcing practitioners to rethink how they validate and report model reliability.
arXiv cs.LG · 4d ago · 62

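To make the churn idea concrete, here is a minimal sketch in Python. It assumes churn means the fraction of test cases on which two models disagree, and that bagging aggregates K models by majority vote; both readings are assumptions based on the summary, not the paper's definitions. The labels and predictions are toy data invented for illustration.

```python
from collections import Counter


def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def churn(preds_a, preds_b):
    """Fraction of test cases on which two models give different predictions."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def majority_vote(pred_lists):
    """Aggregate several models' predictions per test case by majority vote
    (an assumed stand-in for bagging over K bootstrap-trained models)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*pred_lists)]


# Toy data (hypothetical): three models, each wrong on a different test case.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
model_a = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]  # wrong on index 3
model_b = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # wrong on index 5
model_c = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # wrong on index 9

acc_a = accuracy(model_a, labels)
acc_b = accuracy(model_b, labels)
churn_ab = churn(model_a, model_b)
bagged = majority_vote([model_a, model_b, model_c])

print(acc_a, acc_b, churn_ab)
```

In this toy case the two models report the same accuracy yet disagree on two of ten test cases, which is the kind of instability an aggregate metric hides.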
Policy & Regulation · Business & Funding
Altman forced to confront claims at OpenAI trial that he's a prolific liar
Sam Altman faces courtroom testimony over credibility claims during an OpenAI legal proceeding, with questioning focused on his account of losing operational control over the organization. The trial surfaces tensions around leadership accountability and governance disputes within one of AI's most influential institutions. For the broader sector, the case underscores how rapidly AI companies' internal power structures and founder narratives can become subject to legal scrutiny, potentially setting precedent for how disputes between founders, boards, and investors in high-stakes AI ventures are adjudicated.
Ars Technica - AI · 4d ago · 65

Products & Apps · Policy & Regulation
Meta AI gets a private mode where no conversation data is stored on servers
Meta is introducing Incognito Chat, a privacy-focused mode for its AI assistant across WhatsApp and the Meta AI app, where conversations are processed on isolated servers inaccessible even to Meta and automatically deleted post-session. The move signals a strategic pivot toward privacy-as-differentiator in consumer AI, positioning Meta against rivals in a landscape where data handling practices increasingly influence adoption. If the technical claims hold, this represents a meaningful shift in how major platforms balance AI utility with user privacy expectations, though verification of Meta's isolation architecture remains critical for credibility.
The Decoder · 4d ago · 73

Research · Tools & Code
Harnessing Agentic Evolution
Researchers propose a new framework that treats iterative AI improvement as an interactive environment rather than a fixed procedure or black-box agent. The key insight addresses a real tension in agentic systems: hand-designed evolution loops are rigid but stable, while general-purpose agents adapt flexibly but lose coherence over long horizons. By formalizing accumulated evolution context (candidates, feedback, traces, failures) as a persistent interface, this work enables both modularity and adaptive revision of the search mechanism itself. The approach matters for practitioners building self-improving systems and suggests a path toward more interpretable, steerable autonomous optimization loops.
arXiv cs.LG · 4d ago · 62

Research
Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion
Researchers have developed dual smartwatch-based frameworks for detecting psychotic relapse through continuous physiological monitoring, combining forecasting and multi-task learning to flag behavioral anomalies. The systems use Transformer encoders and uncertainty quantification via ensemble MLPs to handle real-world wearable sensor noise, outputting daily risk scores from cardiac, sleep, and motion data. This work exemplifies how digital phenotyping and uncertainty-aware deep learning can translate into clinical applications, pushing the boundary of passive health monitoring beyond fitness tracking into psychiatric intervention.
arXiv cs.LG · 4d ago · 58

Research · Tools & Code
Provable Quantization with Randomized Hadamard Transform
Researchers have cracked a long-standing efficiency problem in vector quantization by combining randomized Hadamard transforms with dithering, cutting computational cost from quadratic to near-linear while maintaining theoretical guarantees. This matters because quantization underpins critical ML infrastructure: similarity search at scale, federated learning privacy, and the KV cache compression that makes long-context LLMs feasible. The breakthrough bridges the gap between fast-but-loose empirical methods and slow-but-rigorous dense rotations, potentially unlocking tighter compression for production systems without sacrificing speed or accuracy.
arXiv cs.LG · 4d ago · 62

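The two ingredients named above can be sketched generically in Python. This is not the paper's algorithm, which the summary does not specify. It combines a textbook randomized Hadamard rotation (random sign flips followed by a fast Walsh-Hadamard transform) with subtractive dithered uniform quantization. The input vector, the `step` size, and the seed are all hypothetical choices.

```python
import math
import random


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n).
    The input length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def randomized_hadamard(x, signs):
    """Compute y = (1/sqrt(n)) * H * D * x, where D = diag(signs)."""
    scale = 1.0 / math.sqrt(len(x))
    return [scale * v for v in fwht([s * xi for s, xi in zip(signs, x)])]


def inverse_randomized_hadamard(y, signs):
    """Invert the rotation: x = D * (1/sqrt(n)) * H * y, since H * H = n * I."""
    scale = 1.0 / math.sqrt(len(y))
    return [s * scale * v for s, v in zip(signs, fwht(y))]


def dithered_quantize(y, step, rng):
    """Subtractive dither: add uniform noise before rounding to the grid, then
    subtract it again. A real system would store only the integer code and
    regenerate the noise from a shared seed (assumption for illustration)."""
    out = []
    for v in y:
        u = rng.uniform(-0.5, 0.5)
        out.append(step * (round(v / step + u) - u))
    return out


rng = random.Random(0)  # hypothetical seed
x = [3.0, -1.0, 0.5, 2.0, 0.0, 0.0, -4.0, 1.5]
signs = [rng.choice((-1.0, 1.0)) for _ in x]
step = 0.25  # hypothetical quantization step

y = randomized_hadamard(x, signs)
y_hat = dithered_quantize(y, step, rng)
x_hat = inverse_randomized_hadamard(y_hat, signs)
x_back = inverse_randomized_hadamard(y, signs)

print(max(abs(a - b) for a, b in zip(x, x_hat)))
```

The rotation is orthogonal, so it preserves the vector's norm. The transform costs O(n log n) rather than the O(n^2) of a dense rotation matrix, which matches the quadratic-to-near-linear cost reduction the summary describes.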
Research · Models & Releases
Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo
Researchers have overcome a long-standing scalability bottleneck in recurrent neural quantum states by applying parallel scan techniques to enable efficient training on quantum many-body problems. This work challenges the assumption that RNNs are inherently sequential and uncompetitive with transformer-based approaches in variational Monte Carlo simulations. The breakthrough matters because it expands the toolkit for neural-network quantum state research, potentially unlocking new applications in materials science and fundamental physics where autoregressive architectures offer interpretability advantages over attention-based alternatives.
arXiv cs.LG · 4d ago · 58

Research
Min-Max Optimization Requires Exponentially Many Queries
Theoretical computer science has established a fundamental barrier in min-max optimization: finding approximate stationary points in nonconvex-nonconcave settings requires query complexity that scales exponentially with precision or dimensionality. This result matters for AI because adversarial training, GANs, and multi-agent reinforcement learning all rely on min-max formulations. The finding suggests inherent computational limits that no algorithm can overcome, reshaping expectations around scalability and convergence guarantees in these domains. Practitioners building robust models through adversarial methods now have formal evidence that certain efficiency gains may be impossible, not just undiscovered.
arXiv cs.LG · 4d ago · 58

Products & Apps · Business & Funding
Anthropic launches Claude for Small Business to embed AI into the tools you forgot you pay for
Anthropic is moving beyond API access to verticalize Claude for small business operations, bundling 15 pre-built agent workflows tied directly to accounting, payments, and CRM platforms. The strategy signals a shift in how frontier labs monetize: rather than compete on model capability alone, Anthropic is packaging domain-specific automation that reduces friction for SMBs who already own these tools but lack the technical depth to integrate AI themselves. The accompanying training tour and free courses suggest a deliberate play for market share in the underserved small-business AI segment, where adoption barriers are organizational rather than technical.
The Decoder · 4d ago · 73

Research
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
A new study tackles a critical blind spot in AI evaluation: how annotator disagreement and bias corrupt reproducibility across model safety and utility assessments. The research models individual rater behavior across larger pools than typical practice, revealing that standard 3-5 annotation setups may systematically underestimate variance. This directly impacts how LLMs get certified for deployment, suggesting current benchmarks understate real-world evaluation uncertainty and that scaling annotator diversity could stabilize trustworthiness claims.
arXiv cs.LG · 4d ago · 62

Research
An LLM-Based System for Argument Reconstruction
Researchers have built an end-to-end LLM pipeline that converts natural language arguments into structured logical graphs, decomposing text into premises, conclusions, and their relationships (support, attack, undercut). This work bridges symbolic argumentation theory with neural language models, enabling machines to parse and represent human reasoning patterns at scale. The system's ability to extract logical structure from unstructured text has implications for fact-checking, debate analysis, and reasoning verification in downstream AI applications.
arXiv cs.CL · 4d ago · 52