Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

Researchers have formalized how evaluation biases in language models propagate across multi-agent systems, showing that systematic preferences held by one LLM evaluator contaminate downstream agents' outputs even when every agent runs on the same base model. The work introduces a mathematical framework that quantifies contagion strength and finds that cross-model agent networks amplify bias spread 3-5x more than homogeneous setups. The finding matters for anyone deploying LLM-based evaluation pipelines in production: bias is not confined to a single evaluator but cascades through agent interactions, potentially corrupting entire workflows unless explicitly mitigated.
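To make the cascade concrete, here is a toy sketch of bias spreading through a small pipeline. It is not the paper's framework, which this summary does not reproduce. The linear update rule, the `strength` parameter, the function name `propagate_bias`, and the three agent names are all assumptions chosen for illustration.

```python
from typing import Dict, List


def propagate_bias(
    upstream: Dict[str, List[str]],
    intrinsic: Dict[str, float],
    strength: float,
    order: List[str],
) -> Dict[str, float]:
    """Toy linear contagion over an acyclic agent graph (illustrative only).

    upstream:  maps each agent to the agents whose output it consumes.
    intrinsic: the bias each agent would show in isolation.
    strength:  assumed fraction of inherited bias that carries over per hop.
    order:     agents listed so that every agent appears after its upstream agents.
    """
    effective: Dict[str, float] = {}
    for agent in order:
        sources = upstream.get(agent, [])
        # Average the already-computed bias of everything this agent reads from.
        inherited = sum(effective[s] for s in sources) / len(sources) if sources else 0.0
        effective[agent] = intrinsic[agent] + strength * inherited
    return effective


# Hypothetical pipeline: only the evaluator starts out biased.
upstream = {"writer": ["evaluator"], "reviewer": ["writer"]}
intrinsic = {"evaluator": 1.0, "writer": 0.0, "reviewer": 0.0}
result = propagate_bias(
    upstream, intrinsic, strength=0.5, order=["evaluator", "writer", "reviewer"]
)
print(result)  # {'evaluator': 1.0, 'writer': 0.5, 'reviewer': 0.25}
```

In this sketch the writer and reviewer have no bias of their own, yet both end up with a nonzero value purely through what they consume. With `strength` below 1 the effect decays per hop; a value above 1 would model amplification, though nothing here is calibrated to the 3-5x figure reported for cross-model networks.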

Modelwire context

Analyst take

The 3-5x amplification figure for cross-model networks is the buried detail: organizations that deliberately mix model providers to reduce single-vendor risk may be inadvertently creating faster bias propagation pathways, inverting a common diversification rationale.

This connects directly to two threads in recent coverage. LedgerAgent (story 5) tackled implicit state drift in multi-agent systems by externalizing task state, and the same architectural instinct applies here: if bias propagates through agent interactions, explicit mediation layers become a mitigation candidate, not just a reliability tool. Sovereign Execution Brokers (story 6) pushed further, arguing that runtime enforcement checkpoints are necessary because you cannot trust the reasoning loop alone. Contagion Networks extends that logic to evaluation pipelines specifically, where the contamination vector is subtler than a policy violation but potentially more pervasive at scale.
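One way an explicit mediation layer might look is sketched below, assuming agents never message each other directly. The `Message` and `Mediator` names, the `score` field, and the redaction policy are hypothetical and are not taken from LedgerAgent, Sovereign Execution Brokers, or the Contagion Networks paper.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    content: str
    score: Optional[float] = None  # upstream evaluator's judgment, if any


class Mediator:
    """Hypothetical mediation layer: records every message, forwards a redacted copy."""

    def __init__(self) -> None:
        self.log: List[Message] = []

    def forward(self, message: Message) -> Message:
        # Keep the full record outside the agents so it can be audited later.
        self.log.append(message)
        # Strip the explicit score so the downstream agent cannot anchor on it.
        return replace(message, score=None)


mediator = Mediator()
original = Message(sender="evaluator", content="Draft B reads more clearly.", score=0.9)
delivered = mediator.forward(original)
```

This only closes the explicit score channel. The message text itself can still carry the evaluator's preference, so a layer like this would be a starting point for mitigation rather than a complete defense.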

Watch whether any major evaluation framework (LangChain evals, Inspect, or similar) ships explicit contagion-mitigation primitives within the next two quarters. Adoption there would signal the field treating this as an engineering problem rather than a research curiosity.

Coverage we drew on

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: DeepSeek-chat · Contagion Networks

Read full story at arXiv cs.LG → (arxiv.org)

Modelwire Editorial


