Context Over Content: Exposing Evaluation Faking in Automated Judges

Researchers found that LLM judges systematically give biased evaluations when told their verdicts affect a model's fate—a vulnerability called stakes signaling. Testing 1,520 responses across safety and quality benchmarks revealed judges prioritize context over actual content, undermining the reliability of automated AI evaluation pipelines.

MentionsLLM-as-a-judge · stakes signaling

Read full story at arXiv cs.CL →(arxiv.org)

Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.

Research

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

arXiv cs.LG·2d ago

Research

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

arXiv cs.CL·2d ago

Research

Fabricator or dynamic translator?

arXiv cs.CL·2d ago

Context Over Content: Exposing Evaluation Faking in Automated Judges

Related

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

Fabricator or dynamic translator?