
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
Researchers propose S2ST-Omni 2, a multilingual speech-to-speech translation (S2ST) framework that replaces flat language embeddings with structured typological priors derived from linguistic theory. Rather than treating each language as an isolated label, the system exploits systematic cross-language patterns to improve data efficiency in low-resource translation scenarios. This shift from flat-label conditioning to linguistically informed structure refines how speech LLMs can scale to many language pairs, which is particularly relevant as compositional S2ST systems become production-ready.
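To make the contrast concrete, here is a minimal sketch of the difference between a flat language label and a typological feature vector. It is not the S2ST-Omni 2 implementation: the feature inventory, the `TYPOLOGY` table, and the function names are all hypothetical, and the feature values are deliberately coarse simplifications chosen for illustration.

```python
# Illustrative sketch only: the feature set and values below are assumptions,
# not taken from the S2ST-Omni 2 paper.

# Hypothetical, heavily simplified typological feature inventory.
FEATURES = (
    "order_svo",
    "order_sov",
    "tonal",
    "family_indo_european",
    "family_sino_tibetan",
    "family_turkic",
)

# Coarse textbook-level properties per language code (assumed for the example).
TYPOLOGY = {
    "en": {"order_svo", "family_indo_european"},
    "es": {"order_svo", "family_indo_european"},
    "zh": {"order_svo", "tonal", "family_sino_tibetan"},
    "tr": {"order_sov", "family_turkic"},
}


def flat_label(lang):
    """One-hot label: every language is equally unrelated to every other."""
    return [1.0 if code == lang else 0.0 for code in sorted(TYPOLOGY)]


def typological_vector(lang):
    """Binary feature vector: languages sharing properties share dimensions."""
    active = TYPOLOGY[lang]
    return [1.0 if feature in active else 0.0 for feature in FEATURES]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    for pair in (("en", "es"), ("en", "zh"), ("en", "tr")):
        flat = cosine(flat_label(pair[0]), flat_label(pair[1]))
        typo = cosine(typological_vector(pair[0]), typological_vector(pair[1]))
        print(pair, "flat:", round(flat, 3), "typological:", round(typo, 3))
```

Under flat labels, every pair of distinct languages has zero similarity, so nothing learned for one language transfers to another through the conditioning signal. Under the toy typological vectors, languages that share features overlap, which is the kind of structure a model could exploit for low-resource pairs. Note that this inventory is too coarse to tell English and Spanish apart, so a practical system would presumably need richer features or an additional language-specific component.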