Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study
Biomedical RAG systems face a critical gap: there is no rigorous head-to-head comparison of retrieval strategies in high-stakes settings. This paper addresses that gap by isolating retrieval performance across five approaches (dense search, hybrid BM25 retrieval, cross-encoder reranking, multi-query expansion, and maximal marginal relevance, MMR) while holding the generation model and the embeddings constant. The controlled design matters because retrieval quality directly affects LLM reliability in medicine, where a hallucinated answer can harm patients. The results will indicate whether practitioners should invest in more sophisticated retrieval or rely on simpler baselines, and so inform how biomedical AI systems are built at scale.
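
To make one of the five strategies concrete, the following is a minimal sketch of MMR selection in its standard textbook form. It is not taken from the paper, which does not state its exact formulation here; the function names `cosine` and `mmr_select`, the parameter `lambda_mult`, and the toy vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,  # hypothetical name; 1.0 = pure relevance ranking
) -> List[int]:
    """Greedily pick up to k document indices, trading relevance against redundancy."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []

    while candidates and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy embeddings (illustrative only): documents 0 and 1 are near-duplicates.
toy_query = [1.0, 0.0]
toy_docs = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]
diverse = mmr_select(toy_query, toy_docs, k=2, lambda_mult=0.5)
relevance_only = mmr_select(toy_query, toy_docs, k=2, lambda_mult=1.0)
```

On these toy vectors, `lambda_mult=0.5` skips the near-duplicate and selects indices 0 and 2, while `lambda_mult=1.0` reduces to plain relevance ranking and selects 0 and 1. That trade-off between relevance and diversity is the behavior a controlled comparison against plain dense search would be measuring.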
















