Research·arXiv cs.LG·May 3

GeoSAE: Geometric Prior-Guided Layer-Wise Sparse Autoencoder Annotation of Brain MRI Foundation Models

Interpretability of medical foundation models has hit a wall: standard sparse autoencoders (SAEs) collapse features in deep layers, and clinical datasets such as brain MRI scans confound age with disease signals. GeoSAE targets both problems. It uses the model's learned geometric structure as a prior to stabilize layer-wise feature extraction, then deconfounds the resulting feature annotations with partial correlations across 14k scans from ADNI and AIBL. This matters because it could unblock systematic mechanistic understanding of what medical AI actually learns, moving interpretability from a research curiosity toward a prerequisite for clinical deployment.
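For readers new to the technique, a standard sparse autoencoder encodes an activation vector into a wider, mostly-zero feature vector and is trained to reconstruct the input under an L1 sparsity penalty. The NumPy sketch below shows that baseline forward pass and loss, plus a simple count of "dead" features that never fire, one common symptom of the feature collapse described above. It does not implement GeoSAE's geometric prior, whose details the summary doesn't give; the dimensions, the penalty weight `l1_coeff`, and all helper names are illustrative assumptions.

```python
import numpy as np

def init_sae(d_model: int, d_dict: int, seed: int = 0) -> dict:
    """Randomly initialise a baseline (non-GeoSAE) sparse autoencoder."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0, 1 / np.sqrt(d_model), (d_model, d_dict)),
        "b_enc": np.zeros(d_dict),
        "W_dec": rng.normal(0, 1 / np.sqrt(d_dict), (d_dict, d_model)),
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params: dict, x: np.ndarray):
    """Encode activations x (batch, d_model) into non-negative features, then reconstruct."""
    # Subtracting the decoder bias before encoding is a common convention, not a requirement.
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    f = np.maximum(0.0, pre)  # ReLU keeps features non-negative and many exactly zero
    x_hat = f @ params["W_dec"] + params["b_dec"]
    return f, x_hat

def sae_loss(x: np.ndarray, f: np.ndarray, x_hat: np.ndarray, l1_coeff: float = 1e-3) -> float:
    """Mean squared reconstruction error plus an L1 penalty pushing features toward zero."""
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    l1 = np.mean(np.sum(np.abs(f), axis=1))
    return float(mse + l1_coeff * l1)

def dead_feature_fraction(f: np.ndarray) -> float:
    """Fraction of dictionary features that are zero for every example in the batch."""
    return float(np.mean(np.all(f == 0.0, axis=0)))

# Usage with a synthetic stand-in for one layer's activations.
x = np.random.default_rng(1).normal(size=(256, 64))
params = init_sae(d_model=64, d_dict=512)
f, x_hat = sae_forward(params, x)
print(sae_loss(x, f, x_hat), dead_feature_fraction(f))
```

Real SAE training would minimise this objective with gradient descent, typically in an autodiff framework; the sketch only shows the shape of the objective and one way to measure collapse.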

Modelwire context

Explainer

GeoSAE's core contribution isn't just better feature extraction; it's the deconfounding step, which uses partial correlations to separate age effects from pathology signals across 14k scans. Most interpretability work stops at 'we can see features now.' This paper acknowledges that in medical imaging, correlation isn't diagnosis.
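The deconfounding idea can be illustrated with a partial correlation: regress age out of both a feature's activation and the diagnosis label, then correlate the residuals. The NumPy sketch below uses synthetic data in which a hypothetical feature tracks only age while disease prevalence also rises with age, so the raw correlation looks like a disease signal but the partial correlation falls toward zero. The data, variable names, and the linear age adjustment are illustrative assumptions, not GeoSAE's actual pipeline or ADNI/AIBL data.

```python
import numpy as np

def residualize(y: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Return residuals of y after ordinary least squares on the covariates (with intercept)."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr(a: np.ndarray, b: np.ndarray, covariates: np.ndarray) -> float:
    """Pearson correlation between a and b after removing the linear effect of covariates."""
    return float(np.corrcoef(residualize(a, covariates), residualize(b, covariates))[0, 1])

rng = np.random.default_rng(0)
n = 14_000  # same order as the reported scan count; the data below are synthetic
age = rng.uniform(55, 90, n)
# Hypothetical confound: probability of disease rises with age.
p_disease = 1 / (1 + np.exp(-(age - 75) / 4))
diagnosis = (rng.uniform(size=n) < p_disease).astype(float)
# Hypothetical SAE feature that encodes age only, plus noise.
feature = 0.1 * age + rng.normal(0, 1, n)

raw_r = float(np.corrcoef(feature, diagnosis)[0, 1])
partial_r = partial_corr(feature, diagnosis, age)
print(f"raw r = {raw_r:.2f}, partial r (age removed) = {partial_r:.2f}")
```

Because this feature encodes nothing but age, its apparent association with diagnosis disappears once age is partialled out; a feature carrying genuine pathology information would keep a nonzero partial correlation.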

This connects directly to the early-May interpretability work on encoding probes, which tackled the problem of reconstructing what models actually learn, as opposed to what we assume they learn. GeoSAE extends that concern into the medical domain, where the stakes are higher and the confounds messier. It also addresses a gap beneath the Harvard diagnostic study and Google DeepMind's co-clinician work: both claim clinical superiority, but neither yet explains what its model actually uses to make decisions. GeoSAE offers a systematic way to answer that question.

If GeoSAE's deconfounded features correlate with known Alzheimer's biomarkers (amyloid, tau) better than the raw model activations do, that would validate the approach. If independent groups apply it to other medical foundation models (radiology, pathology) within the next six months and report similar stability gains, adoption should accelerate. If they don't see those gains, the geometric prior may be specific to brain MRI architectures.

Coverage we drew on

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: GeoSAE · Alzheimer's Disease Neuroimaging Initiative · ADNI · AIBL · sparse autoencoders

Read the full story at arXiv cs.LG (arxiv.org).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.