XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics

Researchers introduce XQ-MEval, a benchmark dataset spanning nine language pairs, designed to expose cross-lingual scoring bias in machine translation metrics. The dataset uses semi-automatic error injection and native-speaker validation to ensure comparable translation quality across language pairs, addressing a gap in the systematic evaluation of multilingual systems.
Mentions: XQ-MEval · MQM
Read full story at arXiv cs.CL → (arxiv.org)
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.