When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

Researchers found that well-converged FP32 language models can fail catastrophically when quantized to INT4. The collapse follows a three-phase pattern: initial joint improvement, a stable plateau, and then explosive divergence in which the quantization error balloons from 11% to 517% despite minimal change in FP32 perplexity.
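
The "quantization error" percentages above presumably compare the INT4 model's quality against its FP32 baseline. Below is a minimal sketch of how such a measurement could be set up, assuming per-tensor round-to-nearest symmetric INT4 weight quantization and a relative-perplexity definition of error; both are illustrative assumptions, not the paper's stated method.

```python
import torch


def quantize_int4_rtn(w: torch.Tensor) -> torch.Tensor:
    """Per-tensor symmetric round-to-nearest INT4 quantization, dequantized back to FP32.

    This is a generic RTN baseline, not necessarily the scheme used in the paper.
    """
    scale = w.abs().max().clamp_min(1e-8) / 7.0   # symmetric INT4 range: integers in [-8, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale


def relative_quant_error(ppl_fp32: float, ppl_int4: float) -> float:
    """Relative degradation; e.g. 0.11 would correspond to the '11%' figure (assumed definition)."""
    return (ppl_int4 - ppl_fp32) / ppl_fp32


if __name__ == "__main__":
    # Toy demonstration on a random weight matrix (stand-in for a real checkpoint).
    w = torch.randn(512, 512)
    w_q = quantize_int4_rtn(w)
    print("mean abs weight error:", (w - w_q).abs().mean().item())
    # Hypothetical perplexities, illustrating the error metric only.
    print("relative error:", relative_quant_error(20.0, 22.2))
```

In practice the same error metric would be tracked across successive FP32 training checkpoints (e.g., of Pythia-160m), which is where the three-phase pattern described above would show up.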
Mentions: Pythia-160m · INT4 · FP32 · post-training quantization
Read full story at arXiv cs.LG → (arxiv.org)
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.