Research Models & Releases · arXiv cs.LG · 1d ago

CoralBay: A Self-Supervised CT Foundation Model

CoralBay addresses a structural gap in medical AI by applying self-supervised learning to volumetric CT data, where 2D pre-training paradigms fail to capture spatial continuity and tissue-specific properties like Hounsfield Units. The framework extends DINO with a hierarchical 3D Swin backbone and multi-scale feature distillation, enabling efficient foundation model training on unlabeled medical imaging at scale. This work signals growing recognition that domain-specific self-supervision, not just scaling natural-image methods, unlocks transfer learning in specialized modalities. Success here could reshape how medical AI teams approach pre-training and reduce annotation burden across radiology workflows.
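For readers unfamiliar with DINO-style training, the sketch below shows the general shape of a self-distillation objective: a student network is trained to match a centered, sharpened teacher distribution, and the teacher is updated as an exponential moving average of the student. This is a minimal illustration under stated assumptions, not CoralBay's implementation. The function names, the temperatures and momentum, and the choice to average the loss uniformly over scales are all hypothetical; the summary does not say how CoralBay combines its multi-scale features.

```python
import numpy as np

def softmax(logits, temperature):
    # Numerically stable softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(logits, temperature):
    # Numerically stable log-softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    # Cross-entropy between the centered, sharpened teacher distribution
    # and the student distribution. Temperatures are illustrative.
    teacher_probs = softmax(teacher_logits - center, teacher_temp)
    student_logp = log_softmax(student_logits, student_temp)
    return float(-(teacher_probs * student_logp).sum(axis=-1).mean())

def multi_scale_loss(student_by_scale, teacher_by_scale, centers):
    # ASSUMPTION: one projection output per backbone stage, averaged
    # uniformly. The weighting used by CoralBay is not given in the summary.
    losses = [dino_loss(s, t, c)
              for s, t, c in zip(student_by_scale, teacher_by_scale, centers)]
    return sum(losses) / len(losses)

def ema_update(teacher_params, student_params, momentum=0.996):
    # Teacher weights track the student as an exponential moving average.
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In a real pipeline the logits would come from projection heads on a 3D backbone fed with different crops of the same CT volume, and no gradient would flow through the teacher.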

Modelwire context

Explainer

CoralBay's contribution is narrower than the summary suggests: it's not that self-supervision works on CT data (it does), but that hierarchical 3D architectures with multi-scale distillation are necessary to preserve the spatial and physical properties (like Hounsfield Units) that 2D methods discard. The actual novelty is architectural, not conceptual.
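To make the Hounsfield Unit point concrete, here is a small sketch of why CT intensities differ from natural-image pixels. HU values are calibrated (water is 0, air is about -1000), so a fixed intensity window keeps the same tissue at the same value across scans, while the per-image min-max scaling common in natural-image pipelines does not. The helper names, the rescale intercept, and the soft-tissue window are illustrative assumptions, not CoralBay's preprocessing.

```python
import numpy as np

def to_hounsfield(raw, slope=1.0, intercept=-1024.0):
    # Linear rescale from stored values to HU, as described by the DICOM
    # RescaleSlope / RescaleIntercept attributes. Defaults are illustrative;
    # real scans carry their own values in the header.
    return raw.astype(np.float32) * slope + intercept

def window(hu, center, width):
    # Clip to a fixed HU window and scale to [0, 1].
    lo = center - width / 2.0
    hi = center + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

def min_max(x):
    # Per-image scaling: discards the absolute HU scale.
    return (x - x.min()) / (x.max() - x.min())

# Two toy "scans" as [air, water, densest voxel]; the second contains
# a much denser object, e.g. a metal implant.
scan_a = to_hounsfield(np.array([24, 1024, 2024]))   # -1000, 0, 1000 HU
scan_b = to_hounsfield(np.array([24, 1024, 4024]))   # -1000, 0, 3000 HU

# A fixed soft-tissue-style window maps water to the same value in both scans.
water_windowed = (window(scan_a, 40, 400)[1], window(scan_b, 40, 400)[1])

# Min-max scaling maps water to different values depending on scan contents.
water_minmax = (min_max(scan_a)[1], min_max(scan_b)[1])
```

With a fixed window, water lands at 0.4 in both scans; under min-max scaling it lands at 0.5 in the first and 0.25 in the second, so the physical meaning of an intensity is lost.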

This connects directly to the pattern established in recent clinical AI work: domain-specific adaptation of foundation models beats generic scaling. The Llama-3 fine-tuning for clinical provenance categorization (June 1st) and the self-harm surveillance pipeline both showed that healthcare AI requires task-aware pre-training or adaptation to work reliably. CoralBay extends that logic to imaging modalities, where the domain constraint is geometric rather than linguistic. Unlike the multimodal continual learning papers (CRAM, ProtoAda), which solve routing and forgetting problems, CoralBay solves a prior problem: ensuring the foundation model itself captures domain-specific structure before any downstream task is defined.

If CoralBay's pre-trained weights are released and downstream fine-tuning on standard CT benchmarks (e.g., LIDC-IDRI, the CT tasks of the Medical Segmentation Decathlon) shows consistent gains over 3D models initialized from ImageNet weights at the same compute budget, that would support the 3D self-supervision hypothesis. Chest X-ray benchmarks such as CheXpert and RSNA Pneumonia are 2D and would not test it. If the gains disappear on tasks where Hounsfield Unit semantics don't matter (e.g., synthetic or non-medical volumetric data), that would suggest the contribution is domain-specific rather than architectural.

Coverage we drew on

Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: CoralBay · DINO · Swin · CT imaging


Modelwire Editorial


