arXiv cs.LG · 1d ago

Foundation Models vs. Radiomics for Lung Computed Tomography: A Benchmark of Feature Extractors, Classification Heads, and Segmentation Choices


A systematic benchmark comparing foundation models against classical radiomics on lung cancer CT shows how feature extraction, classifier choice, and segmentation strategy each shape real-world performance. The researchers tested five feature extractors (including DINOv3 and Curia variants) with seven classifiers on survival prediction, histology, and staging tasks. They prioritized cross-cohort robustness over in-distribution accuracy, exposing which combinations generalize beyond the hospital whose data they were trained on. This matters because medical AI deployment hinges on worst-case external validity rather than benchmark leaderboards, and isolating each component's contribution helps practitioners avoid overconfidence driven by foundation-model hype.
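The robustness-first framing can be made concrete with a small selection rule. The following is illustrative only: the data layout, the function names, and the use of a single metric per cohort are assumptions, not the paper's evaluation code. It ranks extractor and classifier pairs by their weakest external cohort and, for contrast, by their mean score across cohorts.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (not the paper's code): scores[(extractor, classifier)]
# maps each external cohort name to a metric such as AUC, measured on data
# the pipeline was not trained on.
Scores = Dict[Tuple[str, str], Dict[str, float]]


def rank_by_worst_case(scores: Scores) -> List[Tuple[Tuple[str, str], float]]:
    """Rank pipelines by their weakest external cohort, best first."""
    worst = {combo: min(per_cohort.values()) for combo, per_cohort in scores.items()}
    return sorted(worst.items(), key=lambda item: item[1], reverse=True)


def rank_by_mean(scores: Scores) -> List[Tuple[Tuple[str, str], float]]:
    """Rank pipelines by average score across cohorts, for comparison."""
    avg = {combo: sum(v.values()) / len(v) for combo, v in scores.items()}
    return sorted(avg.items(), key=lambda item: item[1], reverse=True)
```

Under worst-case ranking, a pipeline that excels on one cohort but drops sharply on another loses to one that is merely consistent, which is the behavior a robustness-first evaluation is meant to reward. The two rankings can disagree, and that disagreement is itself a useful warning sign.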

Modelwire context

Explainer

The study's real contribution isn't that foundation models sometimes beat radiomics or vice versa, but that segmentation strategy and classifier choice often matter more than the feature extractor itself. This inverts the usual narrative, in which the choice of backbone architecture dominates.
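The claim that segmentation and classifier choice can outweigh the extractor is, at heart, a comparison of how much performance moves along each axis of the benchmark grid. A rough way to express that, assuming a full factorial grid of external-cohort scores keyed by (extractor, classifier, segmentation), is to compare the spread of marginal means per factor. This is a simplification rather than the paper's analysis: the function name is hypothetical, it ignores interactions between factors, and it assumes every combination was run.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Tuple

# Hypothetical layout: results[(extractor, classifier, segmentation)] = score
# on an external cohort. Assumes a balanced, full factorial grid.
Results = Dict[Tuple[str, str, str], float]
FACTORS = ("extractor", "classifier", "segmentation")


def main_effect_spread(results: Results) -> Dict[str, float]:
    """For each factor, the gap between its best and worst marginal mean.

    A marginal mean averages the score over every configuration that shares
    one level of the factor (e.g. every run with the same segmentation
    setting). A larger spread means that factor moves performance more.
    """
    spreads = {}
    for i, factor in enumerate(FACTORS):
        by_level = defaultdict(list)
        for config, score in results.items():
            by_level[config[i]].append(score)
        marginal = [mean(scores) for scores in by_level.values()]
        spreads[factor] = max(marginal) - min(marginal)
    return spreads
```

If the segmentation spread exceeds the extractor spread, swapping segmentation strategies moves average performance more than swapping backbones, which is the kind of pattern the explainer describes. A proper variance decomposition would also account for interactions.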

This work shares DNA with the RF drone benchmark paper from earlier today, which showed how data leakage through standard evaluation splits can mask overfitting in time-series signal identification. Both papers argue that methodological choices in how you slice and validate data can inflate reported performance beyond what actually generalizes. The lung CT study goes further by systematically isolating each component (feature extractor, classifier, segmentation) to show practitioners which knobs control real-world robustness. Where the drone paper caught a flaw, this one builds a framework for avoiding it in medical imaging.

If the same five extractors are benchmarked on an independent lung cancer cohort (NLST or similar) within six months and the classifier ranking remains stable, that would support the generalization claims. If the rankings flip, the current findings are likely cohort-specific, and the paper's guidance for practitioners becomes less actionable.
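One way to make "the classifier ranking remains stable" testable is to compare the classifier ordering from the original cohorts with the ordering from a new cohort using Kendall's rank correlation. This is a minimal sketch assuming both orderings contain the same classifiers with no ties; the function name is hypothetical, and `scipy.stats.kendalltau` offers a tested implementation that also handles ties.

```python
from itertools import combinations
from typing import List


def kendall_tau(rank_a: List[str], rank_b: List[str]) -> float:
    """Kendall's tau-a between two orderings of the same items (no ties).

    Returns 1.0 when the orderings agree on every pair and -1.0 when
    every pair is reversed.
    """
    if len(rank_a) < 2 or len(set(rank_a)) != len(rank_a):
        raise ValueError("need at least two distinct items")
    if sorted(rank_a) != sorted(rank_b):
        raise ValueError("both rankings must contain the same items")
    position_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    # combinations() yields pairs (x, y) with x ranked above y in rank_a
    for x, y in combinations(rank_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1  # rank_b agrees on this pair
        else:
            discordant += 1  # rank_b reverses this pair
    return (concordant - discordant) / (concordant + discordant)
```

For instance, `kendall_tau(["TabPFN", "XGBoost"], ["XGBoost", "TabPFN"])` returns -1.0, a complete flip. Where to draw the line between "stable" and "flipped" is a judgment call this sketch does not settle.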

Coverage we drew on

How Much Do RF Drone Benchmarks Overstate? A Controlled Study and Theory of Data Leakage in UAV Signal Identification · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Curia · DINOv3 · TabPFN · XGBoost · LUNG1 · LUNG2


Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.

Related

Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies · arXiv cs.CL · 1d ago

How Much Do RF Drone Benchmarks Overstate? A Controlled Study and Theory of Data Leakage in UAV Signal Identification · arXiv cs.LG · 1d ago

The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters · arXiv cs.CL · 1d ago
