
No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus
Researchers tested five major LLMs across English, Hindi, and Spanish to measure how politeness in user prompts affects model output quality. Using 22,500 prompt-response pairs and an eight-factor evaluation framework, they found performance varies significantly by model and language, suggesting politeness effects aren't universal across systems.