Modelwire

Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

Researchers tested whether semantic relationships between text embeddings survive machine translation, using 2,800+ political manifestos across 28 languages translated via EU eTranslation. By measuring inter-model disagreement as a calibration baseline, they identified which languages preserve embedding structure through translation and which degrade it. The finding matters for practitioners deploying multilingual NLP systems: translation fidelity varies sharply by language pair and embedding model, suggesting that cross-lingual semantic search and similarity tasks require language-specific validation rather than assuming invariance.

Modelwire context

Explainer

The paper's key contribution isn't just that translation degrades embeddings (known), but that it does so inconsistently by language pair and model. This means you can't apply a single cross-lingual strategy; you need language-specific validation before deploying multilingual semantic search.
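What "language-specific validation" can look like in practice: embed the same documents before and after translation, then check whether the pairwise similarity structure survives. Below is a minimal numpy sketch of that idea, not the paper's actual metric; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def pairwise_cosine(E):
    # Row-normalize, then the Gram matrix gives cosine similarities.
    U = E / np.clip(np.linalg.norm(E, axis=1, keepdims=True), 1e-12, None)
    return U @ U.T

def invariance_score(E_orig, E_trans):
    # Correlate the off-diagonal similarity structure of the two
    # embedding sets; 1.0 means the relational structure between
    # documents is perfectly preserved through translation.
    iu = np.triu_indices(len(E_orig), k=1)
    s1 = pairwise_cosine(E_orig)[iu]
    s2 = pairwise_cosine(E_trans)[iu]
    return float(np.corrcoef(s1, s2)[0, 1])

# Sanity check with synthetic "embeddings": an orthogonal rotation of
# the space leaves all cosine similarities intact (score ~1.0), while
# additive noise -- a stand-in for translation drift -- lowers it.
rng = np.random.default_rng(0)
E = rng.normal(size=(8, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
print(round(invariance_score(E, E @ Q), 6))     # ~1.0
print(invariance_score(E, E + rng.normal(size=E.shape)) < 1.0)  # True
```

Running the same score per language pair and per embedding model, and comparing it against the inter-model disagreement baseline the paper describes, is one way to decide which pairs are safe to deploy.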

This connects directly to the ML-Bench&Guard work from May 1st, which flagged that existing multilingual systems rely on machine translation without validating whether semantic meaning survives the conversion. That paper focused on safety guardrails; this one measures the underlying embedding fidelity problem that makes cross-lingual safety enforcement unreliable in the first place. Together they suggest that practitioners building systems across language borders face a two-layer validation burden: first confirming that embeddings preserve meaning, then confirming that safety semantics transfer correctly.

If EU eTranslation or a major embedding provider (OpenAI, Cohere, Mistral) publishes language-pair-specific performance cards or recommends language-specific retraining by Q4 2026, that signals the industry is operationalizing this finding. If they don't, multilingual deployments will continue treating translation as a transparent layer, and failures will accumulate quietly in production.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: EU eTranslation · Manifesto Corpus


Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.

Related

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

arXiv cs.CL

ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models

arXiv cs.CL

MIT study explains why scaling language models works so reliably

The Decoder