Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs

Researchers decomposed Mixture-of-Experts layers into separate control and content channels, revealing that routing signals encode abstract functions while content preserves surface features. The finding suggests MoE specialization emerges from low-bandwidth routing constraints, with implications for understanding and designing sparse models.

Modelwire context

Explainer

The more precise claim here is that routing operates as a low-bandwidth bottleneck, and that constraint is what forces functional abstraction rather than surface-feature memorization. This reframes MoE specialization as an emergent consequence of architectural pressure, not a designed property.
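To get a feel for how narrow that bottleneck can be, the sketch below compares the bits carried by a discrete top-k expert choice with the raw size of a hidden vector. It is a back-of-the-envelope illustration, not the paper's measurement: the layer sizes (8 experts, top-2 routing, a 4096-wide hidden state at 16 bits per value) are hypothetical, and the bound covers only the expert selection, ignoring the continuous gate weights.

```python
import math


def routing_bits_per_token(num_experts: int, top_k: int) -> float:
    """Upper bound, in bits, on what one unordered top-k expert choice can encode."""
    return math.log2(math.comb(num_experts, top_k))


def content_bits_per_token(d_model: int, bits_per_value: int = 16) -> int:
    """Raw storage size, in bits, of one hidden vector (not its true information content)."""
    return d_model * bits_per_value


# Hypothetical layer sizes, chosen only for illustration.
routing_bits = routing_bits_per_token(num_experts=8, top_k=2)
content_bits = content_bits_per_token(d_model=4096)
print(f"routing: {routing_bits:.2f} bits, content: {content_bits} bits per token per layer")
```

With these assumed sizes there are 28 possible expert pairs, so the selection carries under 5 bits per token per layer, against 65,536 bits of raw hidden state. A channel that small cannot encode surface detail, which is the intuition behind the bottleneck argument.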

This connects most directly to the generalization work covered in 'Generalization in LLM Problem Solving: The Case of the Shortest Path' (April 16). That paper found models transfer spatial patterns but collapse under recursive depth, which is consistent with a picture where routing encodes abstract relational structure while content channels stay shallow. If routing really does carry the abstract load, it would help explain why models generalize across surface variation but fail when the task demands composing those abstractions across steps. More broadly, this paper belongs to a growing body of mechanistic work trying to locate where computation actually lives in transformer-family models, a conversation that benchmark-focused coverage largely ignores.

Watch whether interpretability teams at labs running large MoE deployments (Mistral, Google with Gemini) publish follow-up ablations that selectively perturb routing signals while holding content weights fixed. If functional performance degrades faster from routing perturbation than from equivalent content perturbation, this decomposition holds up as more than a descriptive frame.
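The shape of such an ablation can be sketched on a toy layer. Everything here is an assumption made for illustration: the layer is a minimal NumPy MoE with random, untrained weights, the function names are hypothetical, and "equivalent perturbation" is taken to mean Gaussian noise scaled to the standard deviation of whatever it perturbs. The numbers it prints say nothing about real models; only the experimental structure carries over.

```python
import numpy as np


def moe_forward(x, router_w, expert_w, top_k=2,
                router_noise=0.0, content_noise=0.0, rng=None):
    """Toy MoE layer: linear router, linear experts, softmax over the top-k logits."""
    if rng is None:
        rng = np.random.default_rng(0)
    logits = x @ router_w  # (tokens, experts)
    if router_noise > 0:
        # Perturb only the routing signal; expert weights stay untouched.
        logits = logits + router_noise * logits.std() * rng.standard_normal(logits.shape)
    experts = expert_w
    if content_noise > 0:
        # Perturb only the expert (content) weights; routing stays untouched.
        experts = expert_w + content_noise * expert_w.std() * rng.standard_normal(expert_w.shape)
    top = np.argsort(-logits, axis=1)[:, :top_k]  # indices of the chosen experts
    top_logits = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)
    out = np.zeros((x.shape[0], expert_w.shape[2]))
    for slot in range(top_k):
        w = experts[top[:, slot]]  # (tokens, d_in, d_out)
        out += gates[:, slot:slot + 1] * np.einsum("td,tdo->to", x, w)
    return out


def relative_error(clean, perturbed):
    """Norm of the output change, relative to the norm of the clean output."""
    return float(np.linalg.norm(perturbed - clean) / np.linalg.norm(clean))


def sweep(scales, tokens=256, d_model=32, n_experts=8, seed=0):
    """Return (scale, routing_error, content_error) for each noise scale."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((tokens, d_model))
    router_w = rng.standard_normal((d_model, n_experts))
    expert_w = rng.standard_normal((n_experts, d_model, d_model)) / np.sqrt(d_model)
    clean = moe_forward(x, router_w, expert_w)
    results = []
    for s in scales:
        routed = moe_forward(x, router_w, expert_w, router_noise=s,
                             rng=np.random.default_rng(seed + 1))
        content = moe_forward(x, router_w, expert_w, content_noise=s,
                              rng=np.random.default_rng(seed + 2))
        results.append((s, relative_error(clean, routed), relative_error(clean, content)))
    return results


for scale, routing_err, content_err in sweep((0.0, 0.1, 0.5, 1.0)):
    print(f"noise={scale:.1f}  routing_err={routing_err:.3f}  content_err={content_err:.3f}")
```

On a trained model the comparison would use task loss or accuracy rather than output distance, and the choice of noise scaling would itself need justification, since routing logits and expert weights live on different scales.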

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Mixture-of-Experts · LLM · residual stream

Read full story at arXiv cs.CL (arxiv.org)
