Accurate and Efficient Statistical Testing for Word Semantic Breadth
Contextualized embeddings have enabled measurement of semantic breadth by treating word meanings as dispersed token clouds, but naive statistical testing on dispersion introduces systematic bias. This work addresses a methodological flaw in how NLP researchers compare semantic scope across words, showing that directional shifts in embedding space can falsely inflate significance. The fix matters for downstream applications like thesaurus construction and domain lexicon design, where incorrect breadth rankings could propagate into production systems relying on these embeddings.
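
The abstract does not spell out the test itself, so what follows is only a minimal sketch under stated assumptions, not the paper's procedure. It assumes that a word's breadth is measured as the mean Euclidean distance from each of its token vectors to the word's centroid, and that two words are compared with a permutation test. In a naive version, raw vectors from both words are pooled and reshuffled, so any difference in location or direction between the two clouds gets mixed into the dispersion comparison. The sketch instead centers each cloud on its own centroid before pooling. This centering step is an assumed remedy in the spirit of distance-to-centroid dispersion tests, not necessarily the correction this work proposes. All function names and the toy 2-D vectors are hypothetical.

```python
import math
import random


def centroid(cloud):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(cloud[0])
    return [sum(v[i] for v in cloud) / len(cloud) for i in range(dim)]


def dispersion(cloud):
    """Mean Euclidean distance from each token vector to the cloud's centroid."""
    c = centroid(cloud)
    return sum(math.dist(v, c) for v in cloud) / len(cloud)


def center(cloud):
    """Translate a cloud so that its centroid sits at the origin."""
    c = centroid(cloud)
    return [[x - m for x, m in zip(v, c)] for v in cloud]


def dispersion_permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in dispersion.

    Assumption (hypothetical remedy, not necessarily the paper's): each word's
    cloud is centered on its own centroid before pooling, so a location or
    directional shift between the words cannot register as a spread difference.
    Returns (observed |dispersion(a) - dispersion(b)|, p-value).
    """
    rng = random.Random(seed)
    a_c, b_c = center(a), center(b)
    observed = abs(dispersion(a_c) - dispersion(b_c))
    pooled = a_c + b_c
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign token vectors to the two words
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(dispersion(pa) - dispersion(pb)) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Because each cloud is centered before any comparison, translating all of one word's tokens by a constant vector leaves the observed statistic unchanged. Only differences in spread move the p-value. For example, a tight cloud sampled with standard deviation 0.1 compared against a broad cloud with standard deviation 1.0 should yield a small p-value, while moving the tight cloud far away in embedding space should not change the result.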
