PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature
PubMed-Ophtha addresses a critical bottleneck in medical AI: the scarcity of large, high-quality, domain-specific vision-language datasets. This 102K image-caption corpus, extracted from open-access ophthalmology literature, reflects a shift toward structured, modality-aware training data rather than generic image collections. Figures are decomposed hierarchically into panels and then into individual images, and these are paired with imaging-type annotations. This gives specialized clinical models a foundation that is grounded in peer-reviewed context. For practitioners building medical AI, the resource suggests that dataset curation tailored to a narrow specialty is both feasible and necessary. Because the source articles are open access, it may also allow faster iteration on domain models without licensing friction.
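To make the figure → panel → image hierarchy concrete, here is a minimal sketch of how such records might be modeled and flattened into training pairs. The class names, field names, imaging-type labels, and the rule that a panel sub-caption takes precedence over the full figure caption are all hypothetical assumptions for illustration. They are not the dataset's published schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ImageRecord:
    # Hypothetical: one individual image cropped out of a figure panel.
    image_id: str
    imaging_type: str  # illustrative modality label, e.g. "OCT"


@dataclass
class PanelRecord:
    # Hypothetical: one lettered panel of a figure.
    panel_label: str
    caption: Optional[str]  # panel-level sub-caption, if one was extracted
    images: List[ImageRecord] = field(default_factory=list)


@dataclass
class FigureRecord:
    # Hypothetical: a whole figure with its full caption from the article.
    figure_id: str
    caption: str
    panels: List[PanelRecord] = field(default_factory=list)


def to_pairs(fig: FigureRecord) -> List[Tuple[str, str, str]]:
    """Flatten one figure into (image_id, imaging_type, caption) triples.

    Assumed rule: use the panel's own sub-caption when present,
    otherwise fall back to the full figure caption.
    """
    pairs: List[Tuple[str, str, str]] = []
    for panel in fig.panels:
        text = panel.caption if panel.caption else fig.caption
        for img in panel.images:
            pairs.append((img.image_id, img.imaging_type, text))
    return pairs
```

For example, a figure with one captioned fundus panel and one uncaptioned panel holding two OCT images would flatten to three triples. The two OCT images would inherit the figure-level caption. Keeping the hierarchy until this final step means the modality labels stay attached to each image, and a different caption-selection rule can be swapped in without re-extracting anything.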










