Modelwire

Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings


Researchers compared eight competing Shapley value variants across fraud detection and risk workflows, drawing on 3,735 reviews from professional analysts. They found that standard quantitative explainability metrics don't correlate with what actually helps humans make decisions in high-stakes settings.
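To make "don't correlate" concrete, one common way to check whether a benchmark metric tracks human judgment is a rank correlation across the methods being compared. The sketch below computes Spearman's rho between a per-variant metric score and a per-variant mean analyst rating. The summary does not say which statistic the authors used, and the eight scores and ratings here are invented toy values for hypothetical variants, not data from the study.

```python
from statistics import mean

# HYPOTHETICAL toy data: one entry per Shapley-value variant.
# These numbers are invented for illustration and are NOT from the study.
metric_scores = [0.91, 0.88, 0.85, 0.80, 0.76, 0.70, 0.65, 0.60]  # e.g. a fidelity-style metric
analyst_ratings = [3.4, 3.9, 3.1, 4.2, 3.3, 4.1, 3.6, 3.7]        # mean human usefulness rating


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


rho = spearman(metric_scores, analyst_ratings)
print(f"Spearman rho = {rho:.3f}")  # prints "Spearman rho = -0.190"
```

With these toy numbers, the variant that scores best on the metric is not the one analysts rate most useful, and rho comes out near zero (slightly negative). A benchmark that rewarded the metric alone would rank the variants in an order with little relation to what the hypothetical analysts preferred.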

Modelwire context

Explainer

The deeper finding here isn't just that metrics are imperfect: it's that the entire benchmarking infrastructure for Shapley-based XAI may be optimizing for the wrong objective from the start, measuring mathematical properties of explanations rather than whether those explanations change analyst behavior in useful ways.

This connects most directly to the 'White-Box Signal-Subspace Probe' paper from the same day, which also challenges opaque learned representations by substituting interpretable signal components. Both papers push toward a similar position: interpretability tools need to be validated against what practitioners actually do with them, not just against internal consistency measures. The other April 24 arXiv papers covered on this site focus mainly on efficiency and control problems, so this paper and the WG-SRC piece form a small but coherent thread about the gap between what models expose and what humans can use.

Watch whether fraud detection vendors or financial regulators cite this study when revisiting model explainability requirements. If a regulatory body references human-centered XAI evaluation criteria in updated guidance within the next 12 months, this line of research will have found real institutional traction.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.



Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.
