Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR

Researchers have identified a critical gap in how low-rank adaptation (LoRA) variants behave under reinforcement learning with verifiable rewards (RLVR). Structured initialization methods such as PiSSA and MiLoRA, which seed the adapters from the principal or minor singular components of the pretrained weights, can destabilize RL training even though they excel in supervised fine-tuning. The authors' theoretical analysis indicates that orthonormal initialization minimizes the performance gap between LoRA and full fine-tuning in RL settings. This suggests that parameter-efficient tuning needs different initialization strategies depending on the training paradigm. The result matters for practitioners scaling LLM alignment work, where RL is increasingly central but LoRA's behavior under it remains poorly characterized.
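To make the three schemes in the summary concrete, here is a minimal NumPy sketch. `pissa_init` and `milora_init` follow the published PiSSA and MiLoRA recipes in simplified form: take an SVD of the pretrained weight, move the top-r (PiSSA) or bottom-r (MiLoRA) singular components into the adapter factors, and freeze the remainder as a residual. `orthonormal_init` is an assumption, because the summary does not spell out the paper's exact scheme. In this sketch it draws A with orthonormal rows and sets B to zero. All function names and the 64×32 toy matrix are hypothetical, and the usual LoRA scaling factor on B @ A is omitted.

```python
import numpy as np


def pissa_init(W, r):
    """PiSSA-style init: adapters carry the top-r (principal) singular components of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)  # singular values sorted descending
    sqrt_s = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_s            # (d, r): left singular vectors scaled by sqrt(sigma)
    A = sqrt_s[:, None] * Vt[:r]     # (r, k): right singular vectors scaled by sqrt(sigma)
    W_res = W - B @ A                # frozen residual, so W_res + B @ A == W at init
    return W_res, A, B


def milora_init(W, r):
    """MiLoRA-style init: adapters carry the bottom-r (minor) singular components of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[-r:])
    B = U[:, -r:] * sqrt_s
    A = sqrt_s[:, None] * Vt[-r:]
    W_res = W - B @ A
    return W_res, A, B


def orthonormal_init(W, r, rng):
    """HYPOTHETICAL orthonormal init (not necessarily the paper's scheme):
    A gets orthonormal rows, B starts at zero, and the base weight is left intact."""
    d, k = W.shape
    Q, _ = np.linalg.qr(rng.standard_normal((k, r)))  # (k, r) with orthonormal columns
    A = Q.T                                            # (r, k) with orthonormal rows
    B = np.zeros((d, r))
    return W.copy(), A, B


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # toy stand-in for a pretrained weight matrix
for name, (W_base, A, B) in {
    "pissa": pissa_init(W, 4),
    "milora": milora_init(W, 4),
    "orthonormal": orthonormal_init(W, 4, rng),
}.items():
    # Every scheme reproduces W at step 0; they differ in how much of W
    # the trainable adapter factors start out holding.
    assert np.allclose(W_base + B @ A, W)
    print(f"{name:12s} ||B @ A||_F = {np.linalg.norm(B @ A):.3f}")
```

Under this sketch, all three schemes leave the layer's output unchanged at step 0. What differs is the subspace the trainable factors start in and how much of the pretrained weight's spectral energy they hold. Whether that difference drives RL instability is the paper's claim; this toy example does not demonstrate it.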

Modelwire context

Explainer

The paper's core finding is not that orthonormal initialization works well in RL. It is that structured initialization methods (PiSSA, MiLoRA), which excel in supervised fine-tuning, can actively destabilize RL training. This inversion of expectations is what makes the work worth attention.

This connects to the June 30 work on geometric mechanisms in neural network learning (the Radial Suppression paper). Both examine how initialization and the geometric structure of learned parameters interact with training dynamics. That work showed how radial expansion delays generalization on algorithmic tasks. This paper shows how the assumptions behind structured initialization break down under RL's different optimization pressure. The finding also echoes a related result from the same day: a Genetic Programming initialization study found that warm-starting with domain knowledge provided no lasting advantage, suggesting that simpler schemes suffice. In the RL setting the pattern is sharper. The simpler (orthonormal) initialization does not merely match the sophisticated priors but outperforms them.

Suppose practitioners report training instability when moving LoRA-based alignment pipelines from supervised fine-tuning to RL phases without re-initializing the adapters. That would support the paper's claim that the right initialization depends on the training context. Conversely, suppose major open-source RL alignment frameworks (such as Hugging Face's TRL) have not adopted orthonormal LoRA initialization within the next two quarters. That would suggest the finding has not yet crossed the gap from theory to production practice.

Coverage we drew on

Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: LoRA · PiSSA · MiLoRA · RLVR

Full story: arXiv cs.LG (arxiv.org)

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; it does not republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.