FUSE: Ensembling Verifiers with Zero Labeled Data

Researchers propose FUSE, an unsupervised ensemble method that improves LLM verification without labeled ground truth data. The technique controls dependencies between verifiers using spectral algorithms, matching or beating semi-supervised baselines while eliminating costly annotation requirements.
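To make "verification without labels" concrete, here is a minimal sketch of a classical moment-based trick for the simplest case, in which verifiers err independently. It is not FUSE's algorithm, and the summary above says FUSE specifically handles the dependent case that this sketch ignores. The assumptions are that votes are coded as +1/-1, classes are balanced, each verifier is better than chance, and its accuracy is the same on correct and incorrect answers. Under those assumptions E[x_i x_j] = (2a_i - 1)(2a_j - 1), so three verifiers are enough to solve for each accuracy a_i from agreement statistics alone. The names `true_acc` and `estimate_accuracies` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
true_acc = np.array([0.9, 0.8, 0.7])  # hypothetical accuracies, hidden from the estimator
n = 50_000

# Simulate balanced +1/-1 ground truth and three verifiers with independent errors.
labels = rng.choice([-1, 1], size=n)
is_right = rng.random((n, 3)) < true_acc
votes = np.where(is_right, labels[:, None], -labels[:, None])

def estimate_accuracies(votes):
    """Estimate three verifier accuracies from votes alone (no labels).

    Assumes independent errors, balanced classes, better-than-chance verifiers.
    """
    m = votes.T @ votes / len(votes)          # second moments E[x_i x_j]
    c01, c02, c12 = m[0, 1], m[0, 2], m[1, 2]
    # With v_i = 2*a_i - 1, each off-diagonal moment is v_i * v_j,
    # so v_0^2 = c01 * c02 / c12, and similarly for the others.
    v = np.sqrt(np.array([c01 * c02 / c12, c01 * c12 / c02, c02 * c12 / c01]))
    return (1 + v) / 2

estimated = estimate_accuracies(votes)
print(np.round(estimated, 2))
```

The estimator never reads `labels`. Once verifiers share failure modes, the product structure it relies on no longer holds, which is the gap a dependency-aware method has to close.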

Modelwire context


The buried lede is the dependency problem: naive ensembles of verifiers assume their errors are independent, which inflates confidence when verifiers share the same failure modes. FUSE's spectral approach explicitly models those correlations, and that is what lets it work without ground-truth labels instead of relying on the errors to cancel out.
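The inflation effect is easy to see with a little arithmetic. The sketch below is an illustration, not anything taken from the paper: it assumes three hypothetical verifiers that are each 70% accurate (symmetrically, on correct and incorrect answers), a 50% prior, and a naive Bayes combination rule. If all three accept an answer and their errors really are independent, the combined confidence is about 93%. If the three are effectively copies of one verifier, the evidence is worth only a single vote, or 70%.

```python
def naive_posterior(accuracies, votes, prior=0.5):
    """P(answer is correct | votes), assuming verifiers err independently.

    votes[i] == 1 means verifier i accepts the answer, 0 means it rejects.
    """
    like_correct = prior
    like_wrong = 1.0 - prior
    for acc, vote in zip(accuracies, votes):
        like_correct *= acc if vote == 1 else 1.0 - acc
        like_wrong *= (1.0 - acc) if vote == 1 else acc
    return like_correct / (like_correct + like_wrong)

accs = [0.7, 0.7, 0.7]   # hypothetical verifier accuracies
votes = [1, 1, 1]        # all three accept the answer

# What the naive ensemble reports when it treats the votes as independent.
independent = naive_posterior(accs, votes)

# What the votes are actually worth if the verifiers are perfect copies:
# three identical votes carry the information of one.
duplicated = naive_posterior(accs[:1], votes[:1])

print(f"{independent:.3f}")  # 0.927
print(f"{duplicated:.3f}")   # 0.700
```

Real verifiers sit somewhere between these two extremes, which is why the degree of dependence has to be estimated rather than assumed.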

This connects directly to the LLM judge reliability work covered here in mid-April ('Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations'), which found that aggregate consistency scores mask per-instance logical failures in pairwise comparisons. FUSE is essentially attacking the same problem from the other direction: rather than diagnosing individual verifier unreliability after the fact, it tries to hedge against it structurally at ensemble construction time. The SpecGuard paper from the same period ('Verification-Aware Speculative Decoding') is also relevant context, since it treats verification as a first-class inference concern rather than a post-hoc check. Together these papers suggest verification is quietly becoming its own subfield within LLM infrastructure, distinct from the core modeling work.

Watch whether FUSE's gains hold when verifiers are drawn from the same base model family rather than architecturally diverse sources. If performance degrades significantly in that homogeneous setting, the spectral dependency correction may be doing less work than the diversity of the ensemble itself.
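A practitioner can get a rough read on how homogeneous a verifier pool is before running any ensemble method, again without labels. The sketch below is a simple diagnostic under stated assumptions, not part of FUSE: it takes a hypothetical 0/1 vote matrix (rows are candidate answers, columns are verifiers) and flags pairs whose votes are almost perfectly correlated. The function name `redundant_pairs` and the 0.9 threshold are illustrative choices.

```python
import numpy as np

def redundant_pairs(votes, threshold=0.9):
    """Return index pairs of verifiers whose votes correlate at or above threshold.

    Note: a verifier that always gives the same vote has zero variance,
    so its correlations are NaN and it is never flagged.
    """
    corr = np.corrcoef(votes, rowvar=False)  # columns are verifiers
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if corr[i, j] >= threshold]

# Hypothetical votes: verifiers 0 and 1 always agree, verifier 2 differs.
votes = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])

pairs = redundant_pairs(votes)
print(pairs)  # [(0, 1)]
```

High vote correlation can also come from two verifiers simply both being accurate, so a flag here is a prompt to look closer, not proof of shared failure modes.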

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: FUSE · LLMs

Read the full story at arXiv cs.CL (arxiv.org)


