Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task

Researchers benchmarked six multilingual embedding models (Potion, Gemma, BGE, Snow, Jina, E5) on hate speech detection in Lithuanian, Russian, and English, using a new Lithuanian corpus (LtHate) alongside existing datasets and comparing anomaly detection with classification approaches.
Mentions: Potion · Gemma · BGE · Snow · Jina · E5
Read full story at arXiv cs.CL (arxiv.org)