Modelwire

A curated feed of what matters in AI. Independent, ad-supported, built in Denver, Colorado.

© 2026 Modelwire. All article links go to the original publishers. Summaries generated by Modelwire. We don’t republish full articles.

Earlier stories

The full Modelwire feed, ordered by publish time.

Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck?

Researchers found that GPT-2 Small models trained on web data struggle with specific grammatical constructions, but injecting just 1% synthetic data targeting those phenomena recovered performance on 8 of 9 failing linguistic benchmarks, suggesting that data scarcity rather than architectural limits drives formal linguistic gaps.

arXiv cs.CL·Apr 20

62

HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment

Researchers propose HEAL, a framework addressing entropy collapse in few-shot reinforcement learning for language models. The method combines general-domain data with entropy dynamics alignment to improve exploration and reasoning performance in low-resource settings.

arXiv cs.LG·Apr 20

52

Research Models & Releases

Prompting Foundation Models for Zero-Shot Ship Instance Segmentation in SAR Imagery

Researchers combined YOLOv11 ship detection with Segment Anything Model 2 to perform zero-shot instance segmentation on SAR maritime imagery without mask annotations. The approach uses spatial constraints from a SAR-trained detector to regularize foundation model predictions, sidestepping the need for fine-tuning or adapters.

arXiv cs.LG·Apr 20

52

Fisher Decorator: Refining Flow Policy via A Local Transport Map

Researchers propose Fisher Decorator, a geometric refinement to flow-based offline reinforcement learning that replaces isotropic L2 regularization with anisotropic policy-aware constraints. The method addresses a fundamental mismatch between behavioral policy structure and existing optimization approaches, potentially improving expressiveness and sample efficiency in offline RL.

arXiv cs.LG·Apr 20

52

Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought

Researchers propose Calibrated Attempt-Level GRPO, a reinforcement learning method that fixes gradient bias when training reasoning models to iteratively refine chain-of-thought solutions across multiple attempts. The technique enables models to learn from per-attempt feedback while maintaining low variance, improving performance on problems requiring successive reasoning steps.

arXiv cs.LG·Apr 20

58

Research Models & Releases

LoReC: Rethinking Large Language Models for Graph Data Analysis

Researchers propose LoReC, a method to fix a core limitation in graph-LLM systems: LLMs struggle to process and retain graph structure, underperforming traditional GNNs on graph tasks. The plug-and-play approach uses a three-stage look-remember-contrast pipeline to improve LLM comprehension of relational data.

arXiv cs.LG·Apr 20

58

Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study

Researchers tested whether explicitly encoding physical constraints like obstacle avoidance during training improves Vision-Language-Action robot policies. Adding geometry-grounded feasibility supervision to diffusion-based VLA models shows promise as structured guidance beyond what imitation learning alone can infer.

arXiv cs.LG·Apr 20

52

Research Tools & Code

Automatic Slide Updating with User-Defined Dynamic Templates and Natural Language Instructions

Researchers introduced DynaSlide, a 20K-example benchmark for automatically updating presentation slides via natural language commands on custom templates, plus SlideAgent, an agent framework combining multimodal parsing and language models to handle real-world business reporting decks.

arXiv cs.CL·Apr 20

52

Research Models & Releases

LEPO: Latent Reasoning Policy Optimization for Large Language Models

Researchers introduce LEPO, a reinforcement learning framework that applies policy optimization directly to continuous latent representations in LLMs by injecting controllable stochasticity via Gumbel-Softmax. The method restores exploration capacity lost in deterministic latent reasoning, enabling RL training on hidden model states rather than token sequences.

arXiv cs.LG·Apr 20

58

Research Tools & Code

Latent Preference Modeling for Cross-Session Personalized Tool Calling

Researchers introduced MPT, a 265-dialogue benchmark for personalized tool calling in LLM agents, and PRefine, a memory-augmented method that cuts token usage to 1.24% of full-history prompting while handling incomplete user requests across sessions.

arXiv cs.CL·Apr 20

58

Research Tools & Code

Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

Researchers propose replacing code as the primary engineering artifact with a typed property graph consensus layer, arguing that current AI-assisted development workflows collapse system complexity into opaque chat histories that obscure dependencies and make debugging regressions impossible.

arXiv cs.LG·Apr 20

58

Research Tools & Code

GraSP: Graph-Structured Skill Compositions for LLM Agents

Researchers propose GraSP, a skill graph architecture that solves a counterintuitive problem in LLM agents: more skills degrade performance. The system structures skills as directed acyclic graphs with explicit dependencies, enabling agents to select and compose only relevant capabilities rather than drowning in documentation.

arXiv cs.CL·Apr 20

62

Latent Abstraction for Retrieval-Augmented Generation

Researchers propose LAnR, a unified RAG framework where a single LLM performs retrieval and generation within its latent space rather than generating natural language queries. The approach eliminates architectural separation between retriever and generator, potentially reducing hallucinations while improving factuality.

arXiv cs.CL·Apr 20

58

Hardware & Infra Research

M100: An Orchestrated Dataflow Architecture Powering General AI Computing

Li Auto unveiled M100, a custom AI chip architecture designed to handle autonomous driving inference, LLM serving, and in-car AI interactions with better efficiency and cost than general-purpose GPUs. The dataflow-based design uses compiler-architecture co-optimization to balance performance across diverse automotive AI workloads.

arXiv cs.LG·Apr 20

58

On the Emergence of Syntax by Means of Local Interaction

Researchers trained a tiny 18K-parameter neural cellular automaton to parse arithmetic expressions using only a 1-bit boundary signal, and it spontaneously developed an internal structure resembling CKY parsing that generalizes beyond training data and aligns with grammatical structure (r≈0.71).

arXiv cs.CL·Apr 20

62

Research Tools & Code

QuickScope: Certifying Hard Questions in Dynamic LLM Benchmarks

Researchers propose QuickScope, a methodology for efficiently identifying weak spots in dynamic LLM benchmarks by adapting Bayesian optimization. The approach addresses the computational cost of evaluating models across template-generated question variants, offering practitioners a tool to pinpoint failure modes without exhaustive testing.

arXiv cs.CL·Apr 20

52

Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs

Researchers decomposed Mixture-of-Experts layers into separate control and content channels, revealing that routing signals encode abstract functions while content preserves surface features. The finding suggests MoE specialization emerges from low-bandwidth routing constraints, with implications for understanding and designing sparse models.

arXiv cs.CL·Apr 20

62

Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models

Researchers propose a framework where smaller language models learn to dynamically request help from larger ones during reasoning tasks, with results showing stronger SLMs become more independent while stronger LLMs enable sparser, higher-value interactions. The work addresses the efficiency-capability tradeoff by treating collaboration as a learned skill rather than a fixed pipeline.

arXiv cs.CL·Apr 20

58

Research Models & Releases

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

Researchers propose PDDL-Mind, a neuro-symbolic framework that grounds LLM theory-of-mind reasoning in explicit state representations using Planning Domain Definition Language. The approach decouples world state tracking from belief inference, addressing failures on benchmarks like MMToM-QA by replacing implicit reasoning with logically consistent symbolic states.

arXiv cs.CL·Apr 20

58

Navigating the Conceptual Multiverse

Researchers built an interactive system that exposes the hidden conceptual choices language models make when solving open-ended problems, letting users inspect and modify these decisions against domain-specific reasoning standards. The work adapts multiverse analysis from statistics to create verifiable decision structures that prevent models from obscuring their reasoning.

arXiv cs.CL·Apr 20

58

Research Hardware & Infra

Enabling AI ASICs for Zero Knowledge Proof

Researchers introduced MORPH, a framework that reformulates zero-knowledge proof computations to run efficiently on AI ASICs like TPUs by converting high-precision modular arithmetic into dense matrix operations. The work bridges cryptographic workloads and commodity ML hardware, potentially unlocking cheaper ZKP proving for blockchain and privacy applications.

arXiv cs.CL·Apr 20

58

Research Models & Releases

Bridging the Reasoning Gap in Vietnamese with Small Language Models via Test-Time Scaling

Researchers tackle the reasoning gap in small language models for Vietnamese by applying test-time scaling to Qwen3-1.7B. A new dataset (Vi-S1K) and benchmark (Vi-Elementary-Bench) reveal the base model has latent knowledge but struggles with output formatting, suggesting a path to deploy sophisticated reasoning on resource-constrained devices.

arXiv cs.CL·Apr 20

52

Research Hardware & Infra

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

Researchers propose DuQuant++, a fine-grained rotation technique that improves MXFP4 quantization for LLM inference by targeting activation outliers that degrade precision in Nvidia Blackwell's microscaling format. The method outperforms data-agnostic rotation approaches by adapting to where outliers concentrate within tensor blocks.

arXiv cs.CL·Apr 20

58

Claude Token Counter, now with model comparisons

Simon Willison upgraded his Claude Token Counter tool to compare tokenization across different Claude models. Claude Opus 4.7 introduced the first tokenizer change in the Claude family, making cross-model comparison newly relevant for developers optimizing API costs.

Simon Willison·Apr 20

64

Products & Apps Business & Funding

OpenAI helps Hyatt advance AI among colleagues

Hyatt is rolling out ChatGPT Enterprise across its global workforce, leveraging GPT-5.4 and Codex to streamline operations and guest-facing services. The deployment signals enterprise AI adoption at scale in hospitality, where labor efficiency and personalization directly impact margins.

OpenAI·Apr 20

68

Business & Funding Hardware & Infra

Anthropic and Amazon expand collaboration for up to 5 gigawatts of new compute

Anthropic and Amazon are deepening their partnership with a commitment of up to 5 gigawatts of new compute capacity, an infrastructure expansion that signals confidence in scaling frontier AI systems and underscores how capital-intensive competing at that level has become.

Anthropic·Apr 20

100

Opinion & Analysis Products & Apps

Headless everything for personal AI

Matt Webb argues headless APIs will proliferate as personal AI agents become the preferred interface to services, bypassing traditional GUIs. Salesforce's new Headless 360 product signals enterprise adoption of this architectural shift.

Simon Willison·Apr 19

77

Opinion & Analysis Business & Funding

The 12-month window

TechCrunch examines how many AI startups rely on gaps in foundation model coverage, betting they'll remain unfilled long enough to build defensible businesses—a window that won't stay open indefinitely as model capabilities expand.

TechCrunch — AI·Apr 19

58

Business & Funding

Anthropic's revenue surge reportedly fuels talk of trillion-dollar valuation

Anthropic has reportedly reached $30 billion in annualized revenue, surpassing OpenAI and reversing its prior losses. Investors are reportedly discussing valuations of up to $1 trillion, a dramatic shift in the company's financial trajectory.

The Decoder·Apr 19

92

Policy & Regulation

German court rules AI comic adaptation of copyrighted photo doesn't violate the original

A German Higher Regional Court determined that AI-generated comic adaptations of copyrighted photographs don't infringe copyright when only the subject matter is transformed, not the original image itself. The ruling clarifies the boundaries of permitted use for AI-driven creative transformations in EU jurisprudence.

The Decoder·Apr 19

73