Stability AI releases a new audio model that can create six-minute songs

Stability AI's latest audio generation model marks a shift toward practical on-device music synthesis, enabling creators to produce extended compositions without cloud dependency. The move signals intensifying competition in generative audio, where latency and accessibility now rival raw capability as competitive vectors. For music producers and app developers, local inference at scale reduces both cost and privacy friction, potentially accelerating adoption of AI-assisted composition tools across consumer and professional workflows.
Modelwire context
Skeptical read
The announcement conspicuously omits benchmark comparisons against Suno, Udio, or Google's MusicFX, which are the actual competitive reference points for audio quality and prompt adherence. Six-minute output length is a headline number, but duration alone says nothing about structural coherence across that span, which is where current models tend to fall apart.
Modelwire has no prior coverage of generative audio to anchor this against, so the honest framing is that this story belongs to a competitive cluster we haven't tracked closely yet. That cluster includes Suno's repeated model updates and Udio's licensing disputes, both of which have shaped what "practical" audio generation actually means for rights-holding creators. Stability AI is entering this conversation late, after a period of internal financial instability and leadership turnover that raised real questions about its ability to sustain model development. The on-device angle is the most credible differentiator here, but Stability has made capability claims before that didn't survive contact with independent testing.
Watch whether third-party audio researchers publish blind listening tests against Suno v4 within the next 60 days. If Stability Audio 3.0 holds up on structural coherence in those independent evaluations, the on-device pitch becomes a genuine wedge. If not, this reads as a positioning move ahead of a funding round.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Stability AI · Stability Audio 3.0 · TechCrunch
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on techcrunch.com.