Transformer Geometry Observatory TGO-II: Representational Similarity Observatory

TGO-II introduces a systematic framework for tracking how Vision Transformer representations reorganize geometrically during training, moving beyond attention-pattern analysis to expose the structural evolution of learned features. It targets a gap in transformer interpretability: downstream performance is benchmarked and attention heads are dissected, but the geometric trajectory of internal representations remains opaque. Using Centered Kernel Alignment (CKA) and SVCCA on ViT-Small/16, the work reveals whether representations converge toward stable manifolds, collapse unexpectedly, or undergo phase transitions. For practitioners building interpretable systems and researchers pursuing mechanistic understanding, this shifts the interpretability agenda from behavioral black-box analysis toward geometric first principles, which could help explain why transformers generalize and where they fail.

Modelwire context

Explainer

TGO-II treats representation geometry as a first-class object of study, not a byproduct of attention analysis. The key move is tracking how feature manifolds themselves reorganize during training, not just which attention heads fire or where knowledge localizes.
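To make "tracking how feature manifolds reorganize" concrete, here is a minimal sketch of linear CKA applied to one layer's activations at several checkpoints. It uses the standard linear CKA formulation, not TGO-II's own code, which the summary does not describe. The checkpoint names and the random activations are hypothetical stand-ins; in practice each matrix would hold one layer's activations on a fixed probe set.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activation matrices of shape (n_examples, n_features)."""
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for one layer's activations at three checkpoints.
rng = np.random.default_rng(0)
n_examples, width = 2048, 384  # 384 is the hidden size of ViT-Small
base = rng.normal(size=(n_examples, width))
rotation, _ = np.linalg.qr(rng.normal(size=(width, width)))  # random orthogonal map

checkpoints = {
    "step_a": base,
    "step_a_rotated": base @ rotation,  # same geometry, different basis
    "step_b_unrelated": rng.normal(size=(n_examples, width)),
}

names = list(checkpoints)
cka_matrix = np.array(
    [[linear_cka(checkpoints[a], checkpoints[b]) for b in names] for a in names]
)
print(names)
print(np.round(cka_matrix, 3))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so the rotated copy scores close to 1 against the original while the unrelated activations score much lower. Note that the baseline for unrelated random features is not zero: it grows as the feature width approaches the number of examples, so a probe set well larger than the layer width is a sensible default.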

This complements the mechanistic interpretability push we've covered extensively. The KnowledgeDebugger work from July 1st showed how to locate and edit specific facts in transformer weights, but it assumed you already knew what to look for. TGO-II addresses the prior question: how do representations actually structure themselves as the model learns? Similarly, the Model Organism Lottery paper revealed that training methodology shapes mechanistic behavior in ways we don't fully understand. TGO-II's geometric lens offers a way to measure whether those methodological differences produce fundamentally different representational trajectories or just surface-level variations. Both efforts are trying to move interpretability from 'what does the model do' to 'why does it do that'.

If TGO-II's CKA/SVCCA metrics successfully predict which ViT variants will fail on out-of-distribution data before training completes, that validates geometric analysis as a forward-looking diagnostic tool. If the framework only describes post-hoc what we already see in test accuracy, it remains descriptive rather than predictive.
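For readers unfamiliar with the second metric, the following is a rough sketch of SVCCA in its usual form: reduce each activation matrix to the singular directions that carry most of its variance, then average the canonical correlations between the two reduced subspaces. The 0.99 variance threshold, the function name, and the synthetic data are assumptions made for illustration; the summary does not say which settings TGO-II uses.

```python
import numpy as np


def svcca(x: np.ndarray, y: np.ndarray, var_kept: float = 0.99) -> float:
    """Mean canonical correlation between the top singular subspaces of x and y.

    x and y have shape (n_examples, n_features); feature counts may differ.
    """

    def top_subspace(a: np.ndarray) -> np.ndarray:
        a = a - a.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        energy = np.cumsum(s ** 2) / np.sum(s ** 2)
        # Smallest k whose leading directions explain at least var_kept of the variance.
        k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
        return u[:, :k]  # orthonormal basis of the reduced representation

    ux, uy = top_subspace(x), top_subspace(y)
    # With orthonormal bases, the canonical correlations are the
    # singular values of ux^T uy (cosines of the principal angles).
    rho = np.linalg.svd(ux.T @ uy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


# Hypothetical activations: a base layer, a rotated copy, and unrelated noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 20))
rotation, _ = np.linalg.qr(rng.normal(size=(20, 20)))

score_same = svcca(base, base @ rotation)
score_diff = svcca(base, rng.normal(size=(1000, 20)))
print(f"rotated copy: {score_same:.3f}, unrelated: {score_diff:.3f}")
```

A predictive use of the kind described above would compare such scores between intermediate checkpoints and later out-of-distribution accuracy; whether that relationship holds is exactly the open question, and this sketch does not test it.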

Coverage we drew on

KnowledgeDebugger -- an Exploration Tool for Knowledge Localization and Editing in Transformers · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Vision Transformer · ViT-Small/16 · Transformer Geometry Observatory-II · Centered Kernel Alignment · SVCCA

Read full story at arXiv cs.LG (arxiv.org)
