XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics

Researchers introduce XQ-MEval, a benchmark dataset spanning nine language pairs, designed to expose cross-lingual scoring bias in machine translation metrics. The dataset uses semi-automatic error injection and native-speaker validation to ensure comparable translation quality across language pairs, addressing a gap in the systematic evaluation of multilingual systems.
Mentions: XQ-MEval · MQM
Read full story at arXiv cs.CL → (arxiv.org)
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.