Modelwire
Subscribe

Context Over Content: Exposing Evaluation Faking in Automated Judges

Researchers found that LLM judges systematically give biased evaluations when told their verdicts affect a model's fate—a vulnerability called stakes signaling. Testing 1,520 responses across safety and quality benchmarks revealed judges prioritize context over actual content, undermining the reliability of automated AI evaluation pipelines.

MentionsLLM-as-a-judge · stakes signaling

Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.

Related

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

arXiv cs.LG·

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

arXiv cs.CL·

Fabricator or dynamic translator?

arXiv cs.CL·
Context Over Content: Exposing Evaluation Faking in Automated Judges · Modelwire