Research Tools & Code · arXiv cs.LG · 2d ago

ZO-Act: Efficient Zeroth-Order Fine-Tuning via One-Shot Activation-Informed Low-Rank Subspaces

ZO-Act addresses a critical bottleneck in LLM adaptation: fine-tuning when backpropagation is unavailable or memory is too scarce to run it. By anchoring perturbations to low-rank subspaces derived from activations rather than to random projections, the method reduces both the variance of its gradient estimates and its computational overhead, while enabling standard optimizers such as Adam. This matters for practitioners deploying models on edge hardware, in restricted API environments, or under extreme memory constraints. The technique reflects the growing sophistication of zeroth-order methods, a category that becomes more relevant as model sizes outpace available GPU memory and closed-model APIs withhold gradient access.
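To make the idea concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes the subspace for one linear layer is spanned by the top right singular vectors of a batch of calibration inputs to that layer, computed once, and that perturbations are Gaussian inside that subspace. The function names, the rank, and the synthetic data are hypothetical.

```python
import numpy as np


def activation_subspace(acts: np.ndarray, rank: int) -> np.ndarray:
    """Top-`rank` right singular vectors of one layer's calibration inputs.

    acts: (n_tokens, d_in) inputs to a linear layer, collected once.
    Returns an orthonormal basis V of shape (d_in, rank).
    """
    # Thin SVD: rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(acts, full_matrices=False)
    return vt[:rank].T


def subspace_perturbation(d_out: int, basis: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Gaussian perturbation Z = U @ basis.T for a (d_out, d_in) weight matrix."""
    u = rng.standard_normal((d_out, basis.shape[1]))
    return u @ basis.T


rng = np.random.default_rng(0)
d_in, d_out, n_tokens, rank = 64, 16, 256, 4

# Synthetic activations concentrated in a 4-dimensional subspace, plus small noise.
acts = rng.standard_normal((n_tokens, rank)) @ rng.standard_normal((rank, d_in))
acts += 0.01 * rng.standard_normal((n_tokens, d_in))

V = activation_subspace(acts, rank)           # built once, reused for every step
Z = subspace_perturbation(d_out, V, rng)      # same shape as the layer's weight
print(Z.shape, "from", d_out * rank, "Gaussian draws instead of", d_out * d_in)
```

The payoff is in the count of random directions: for a weight of shape (d_out, d_in), a rank-r subspace needs only d_out × r Gaussian draws per perturbation instead of d_out × d_in, and the basis is paid for once rather than at every step.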

Modelwire context

Explainer

The key distinction ZO-Act draws is not only efficiency but signal quality. Because its perturbation subspaces come from actual activation patterns rather than random draws, the method reduces gradient-estimate variance in a principled way, and that reduction is what lets standard optimizers like Adam work without backpropagation instead of requiring custom zeroth-order update rules.
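The variance argument can be checked on a toy problem. This sketch is illustrative and rests on two assumptions not taken from the paper: it uses a generic SPSA-style two-point estimator, and the layer's inputs lie exactly in a rank-4 subspace, so the true gradient lies in that subspace too. It compares the mean squared error of the estimate under full-space Gaussian perturbations with perturbations restricted to the activation subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n, rank = 64, 8, 128, 4

# Toy linear layer whose inputs lie exactly in a rank-4 subspace (idealised assumption).
X = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, d_in))
Y = rng.standard_normal((n, d_out))
W = rng.standard_normal((d_out, d_in))


def loss(w):
    # Squared-error loss of the layer output X @ w.T against targets Y.
    return 0.5 * np.sum((X @ w.T - Y) ** 2)


# Exact gradient of `loss` at W, used only to score the estimators.
G = (X @ W.T - Y).T @ X

# One-shot basis: top-`rank` right singular vectors of the activations, shape (d_in, rank).
V = np.linalg.svd(X, full_matrices=False)[2][:rank].T


def two_point(z, eps=1e-3):
    # Central-difference directional derivative along z, times z (SPSA-style estimate).
    return (loss(W + eps * z) - loss(W - eps * z)) / (2 * eps) * z


def relative_mse(sample_z, trials=2000):
    # Mean squared error of the estimate, normalised by the squared gradient norm.
    errs = [np.sum((two_point(sample_z()) - G) ** 2) for _ in range(trials)]
    return np.mean(errs) / np.sum(G ** 2)


full = relative_mse(lambda: rng.standard_normal((d_out, d_in)))
sub = relative_mse(lambda: rng.standard_normal((d_out, rank)) @ V.T)
print(f"relative MSE, full-space perturbations: {full:.1f}")
print(f"relative MSE, activation subspace:      {sub:.1f}")
```

For a Gaussian two-point estimator with an exact directional derivative, the expected squared error is (D + 1) times the squared gradient norm, where D is the number of perturbed dimensions, so theory predicts roughly 513 versus 33 here. Real activations are only approximately low-rank, so in practice the subspace estimate also picks up bias: it can only recover the gradient's projection onto the subspace. In a full training loop, the estimate would be handed to Adam in place of a backpropagated gradient.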

This connects directly to the quantization work covered in 'Beyond Activation Alignment: The Alignment-Diversity Tradeoff in Task-Aware LLM Quantization,' which also centers on activation-informed decisions during model adaptation. Both papers are working the same seam: using internal model signals to make compression or fine-tuning more precise rather than relying on generic approximations. The broader pattern across recent Modelwire coverage is a cluster of papers attacking the cost of running and adapting large models under hardware constraints, whether through zeroth-order methods, quantization sensitivity, or confidence-adaptive inference as in the CAT paper. ZO-Act fits that cluster as the fine-tuning entry point, particularly for closed-API or edge scenarios where gradient access is simply unavailable.

Watch whether ZO-Act's activation-derived subspace approach keeps its variance-reduction advantage on models above 70B parameters, where activation dimensionality and memory overhead could erode the benefit of the one-shot construction that makes the method practical at smaller scales.
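A rough calculation shows where the scaling concern comes from. This sketch assumes the subspace is obtained by eigendecomposing a d_in × d_in activation covariance in fp32 (its eigenvectors are the same right singular vectors an SVD of the activations would give). It uses input widths of 8192 and 28672 as illustrative values of the kind found in 70B-class configurations. None of these figures come from the paper, and the rank of 64 is hypothetical.

```python
GIB = 2 ** 30


def covariance_bytes(d_in: int, bytes_per_elem: int = 4) -> int:
    """Bytes for a d_in x d_in activation covariance (fp32 by default)."""
    return d_in * d_in * bytes_per_elem


def basis_bytes(d_in: int, rank: int, bytes_per_elem: int = 4) -> int:
    """Bytes for a stored d_in x rank orthonormal basis."""
    return d_in * rank * bytes_per_elem


# Illustrative layer input widths (assumed, not from the paper) and a hypothetical rank.
for d_in in (8192, 28672):
    cov_gib = covariance_bytes(d_in) / GIB
    basis_gib = basis_bytes(d_in, rank=64) / GIB
    print(f"d_in={d_in}: covariance {cov_gib:.2f} GiB, rank-64 basis {basis_gib:.4f} GiB")
```

Because the covariance grows with the square of the input width, it costs 0.25 GiB at width 8192 but about 3.06 GiB at width 28672, while the stored basis stays in the megabyte range. Building and discarding one layer's covariance at a time bounds the peak, but the one-shot pass still has to repeat that work for every layer.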

Coverage we drew on

Beyond Activation Alignment: The Alignment-Diversity Tradeoff in Task-Aware LLM Quantization · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: ZO-Act · Adam

Read the full story at arXiv cs.LG (arxiv.org).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.