AI-hallucinated citations are creeping into papers that shape clinical guidelines, researchers warn

A large-scale audit of biomedical literature reveals that fabricated citations have surged over 12x since 2023, with researchers attributing the spike to LLM adoption. These hallucinated references are topically coherent, properly formatted, and nearly undetectable, yet 98 percent of affected papers remain uncorrected by publishers. The finding exposes a critical vulnerability in peer review and clinical guideline development, where AI-generated misinformation can embed itself into the scientific record with minimal friction or accountability.
Modelwire context
Explainer
The 98 percent non-correction rate is arguably the more alarming number here, because it shifts the failure point from the AI itself to the publishing and editorial infrastructure that was supposed to catch errors before they compound. The problem is not just that models hallucinate; it is that the systems designed to filter bad science are not calibrated for this failure mode.
The related Suno story from May 26 covers a different domain entirely, and drawing a direct line between AI music consumption loops and fabricated biomedical citations would be a stretch this coverage does not support. This story belongs instead to a thread about institutional trust in AI outputs, where the core tension is between how quickly LLM-generated content enters high-stakes workflows and how slowly accountability structures adapt. The biomedical literature case is a sharper version of that tension because the downstream harm is concrete: a clinician following a guideline built on a ghost citation has no obvious way to know the foundation is hollow.
Watch whether any major medical journal consortium (ICMJE members are the relevant group) announces mandatory citation-verification tooling or a retraction audit protocol within the next six months. If they do not, the 98 percent non-correction figure will almost certainly climb.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Columbia University · Language models · Biomedical publishing
Modelwire Editorial
Modelwire summarizes; we don’t republish. The full content lives on the-decoder.com.