Adapting Foundation ASR Models to Dysarthric Speech: A Case Study

Foundation models trained on general speech data struggle with dysarthric speech, but targeted fine-tuning can narrow the gap. Researchers adapted OpenAI's Whisper to a single dysarthric speaker using 92 hours of collected audio plus user corrections, reaching a 9.7% word error rate (WER) with the full dataset versus 15.8% with minimal adaptation. The work suggests that personalized ASR remains tractable even with modest speaker-specific datasets, which could open accessibility pathways for underserved populations. LoRA and alternative foundation models underperformed in this setting, indicating that full fine-tuning remains the practical baseline for high-stakes accessibility applications.
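For readers less familiar with the metric: WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration, assuming plain whitespace tokenization and no text normalization; the paper's actual scoring pipeline is not described in the summary above. It also makes the headline numbers concrete: moving from 15.8% to 9.7% is roughly a 39% relative reduction in errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as word-level Levenshtein distance.

    Assumes whitespace tokenization with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# One deleted word ("on") and one substitution ("light" -> "lights") over 5 words.
print(word_error_rate("turn on the kitchen light", "turn the kitchen lights"))  # 0.4
print(f"{relative_reduction(15.8, 9.7):.1%}")  # 38.6%
```

Production toolkits typically normalize casing and punctuation before scoring, so WER figures are only comparable when computed under the same normalization.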
Modelwire context
Explainer
The critical finding is not just that fine-tuning works, but that LoRA underperformed full fine-tuning on dysarthric speech, even though parameter-efficient methods succeed in many other domains. This suggests accessibility applications may require full model adaptation even where efficiency methods excel elsewhere.
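To make the contrast concrete: full fine-tuning updates every entry of a weight matrix W, while LoRA freezes W and learns a low-rank correction, W + (alpha / r) * B @ A. The sketch below is the generic LoRA formulation, not necessarily the configuration the researchers used; the names `lora_effective_weight` and `trainable_params` are hypothetical helpers, and the 1280×1280 shape and rank 8 are illustrative assumptions (1280 is the hidden width of Whisper's large variants, but which variant and rank the paper used is not stated here).

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    inner = len(y)
    cols = len(y[0])
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(x))]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix,
                          alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A.

    w: frozen pretrained weight, shape (d_out x d_in)
    a: trainable down-projection, shape (rank x d_in)
    b: trainable up-projection, shape (d_out x rank); commonly zero-initialized,
       so training starts exactly at the pretrained weight.
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def trainable_params(d_out: int, d_in: int, rank: Optional[int] = None) -> int:
    """Trainable parameters for one weight matrix.

    rank=None -> full fine-tuning (every entry of W is updated);
    otherwise -> LoRA, which trains only A and B.
    """
    if rank is None:
        return d_out * d_in
    return rank * (d_in + d_out)


# Hypothetical 1280 x 1280 projection with an illustrative LoRA rank of 8.
full = trainable_params(1280, 1280)
lora = trainable_params(1280, 1280, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 1638400 20480 1.25%
```

The update LoRA can express is confined to rank r, which is what makes it cheap. Whether that restriction explains the gap on dysarthric speech is not established by the summary above; it is one plausible reading.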
This echoes the LoRA instability documented in the June 30 orthonormal initialization study, which showed that parameter-efficient tuning behaves differently depending on the training paradigm. That work focused on reinforcement learning (RL); this paper extends the pattern to supervised fine-tuning on out-of-distribution speech data. The implication is consistent: when stakes are high and the data distribution diverges from pretraining, full fine-tuning remains the safer baseline. It also connects to the conformal prediction acceleration work from the same day, which emphasizes that safety-critical deployments (energy grids, accessibility systems) demand calibrated confidence and robustness, not just accuracy or efficiency.
If the same researchers or follow-up work show that LoRA recovers performance on dysarthric speech when combined with the orthonormal initialization schemes from the June 30 cs.LG paper, the LoRA gap here would look fixable rather than fundamental. If not, and if similar full fine-tuning requirements emerge in other accessibility tasks (stuttering, apraxia), that would signal a broader pattern in which parameter efficiency trades away robustness for underserved populations.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Whisper · OpenAI · Qwen3-ASR · LoRA · TEQST
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.