Research Models & Releases · arXiv cs.CL · 5d ago

LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking

LongBEL addresses a fundamental brittleness in biomedical NLP: entity linking systems that process mentions in isolation miss document-level coherence, which leads to contradictory predictions when the same concept appears under different names. This generative framework conditions its predictions on the full document context and on a memory of its prior linking decisions. It is trained on cross-validated predictions to avoid the train-test mismatch that typically causes errors to cascade in pipeline systems. The approach reflects a broader shift toward consistency-aware architectures in specialized domains, where coherence across a document matters as much as local accuracy. Validation across multiple languages and benchmarks suggests it could be practical for clinical and biomedical research workflows.

Modelwire context

Explainer

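LongBEL's core novelty isn't just adding document context (prior work has tried that) but training the linking system so that it avoids the train-test mismatch that causes errors to cascade when pipeline components are trained independently. The mechanism is the use of cross-validated predictions during training: the memory of prior decisions the model learns from then resembles what it will actually see at test time, errors included, rather than a clean set of gold labels. That is what keeps the model from learning shortcuts that vanish at test time.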
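The general technique behind a cross-validated training signal can be sketched in a few lines. This is a toy illustration under stated assumptions, not LongBEL's implementation: a hypothetical dictionary lookup (`train_dictionary_linker`) stands in for the generative linker, concept IDs such as `C_MI` are placeholders rather than real UMLS CUIs, and the `(mention, memory, gold)` example format is assumed. Each document's memory of prior decisions is filled with out-of-fold predictions from a linker trained on the other folds, so the memory carries the kinds of errors a model would face at test time.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# A document is a list of (mention_text, gold_concept_id) pairs in reading order.
Document = List[Tuple[str, str]]
# A training example: (mention, memory of prior (mention, predicted concept) pairs, gold concept).
Example = Tuple[str, List[Tuple[str, Optional[str]]], str]


def train_dictionary_linker(docs: List[Document]) -> Dict[str, str]:
    """Hypothetical stand-in for a real linker: map each lower-cased surface
    form to its most frequent gold concept in the training documents."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        for mention, concept in doc:
            counts[mention.lower()][concept] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def predict(linker: Dict[str, str], mention: str) -> Optional[str]:
    # Unseen surface forms get no prediction (None), i.e. a realistic miss.
    return linker.get(mention.lower())


def out_of_fold_predictions(docs: List[Document], k: int) -> List[List[Optional[str]]]:
    """Predict every document with a linker trained only on the other folds."""
    preds: List[List[Optional[str]]] = [[] for _ in docs]
    for fold in range(k):
        train = [d for i, d in enumerate(docs) if i % k != fold]
        linker = train_dictionary_linker(train)
        for i, doc in enumerate(docs):
            if i % k == fold:
                preds[i] = [predict(linker, mention) for mention, _ in doc]
    return preds


def build_memory_examples(docs: List[Document], k: int) -> List[Example]:
    """Pair each mention with a memory of the predicted (not gold) links of
    the mentions that precede it in the same document."""
    oof = out_of_fold_predictions(docs, k)
    examples: List[Example] = []
    for doc, doc_preds in zip(docs, oof):
        for j, (mention, gold) in enumerate(doc):
            memory = [(doc[t][0], doc_preds[t]) for t in range(j)]
            examples.append((mention, memory, gold))
    return examples


# Toy corpus with placeholder concept IDs (not real UMLS CUIs).
docs: List[Document] = [
    [("heart attack", "C_MI"), ("MI", "C_MI")],
    [("myocardial infarction", "C_MI"), ("heart attack", "C_MI")],
    [("MI", "C_MI"), ("aspirin", "C_ASA")],
]
examples = build_memory_examples(docs, k=3)
for mention, memory, gold in examples:
    print(mention, memory, gold)
```

In this toy run, the memory attached to "heart attack" in the second document records `None` for the earlier mention "myocardial infarction", a miss that a gold-label memory would have hidden. Training on memories like this is one way a model can learn to stay consistent despite its own earlier mistakes instead of relying on perfect prior decisions.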

LongBEL connects directly to the RealICU benchmark, published the same day, which flags a fundamental evaluation gap in clinical AI: systems that look accurate on standard metrics often fail when reasoning about incomplete or evolving information. LongBEL tackles a parallel problem at the entity level. Where RealICU asks 'are LLMs actually reasoning about medical trajectories, or imitating suboptimal decisions?', LongBEL asks 'can entity linking systems resolve concepts coherently across a full document rather than contradicting themselves?' Both papers suggest that evaluating and training domain-specific AI for healthcare requires richer consistency checks than single-instance accuracy allows.

If the gains from LongBEL's multilingual validation hold up when applied to the UMLS subset used in the RealICU benchmark (if such a subset exists), that would be evidence that the consistency gains transfer to real clinical workflows. If performance degrades on out-of-domain biomedical corpora not seen during cross-validation, that would suggest the approach is overfitting to benchmark structure rather than solving the underlying coherence problem.

Coverage we drew on

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: LongBEL · UMLS · SNOMED CT

Full story: arXiv cs.CL (arxiv.org)

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.