Modelwire

Gemini 3.1 Flash TTS

Google has launched Gemini 3.1 Flash TTS, a new text-to-speech model available through the Gemini API. It accepts natural language prompts that control voice characteristics and output style, extending Gemini's multimodal capabilities beyond text generation.

Modelwire context

Skeptical read

The headline capability here is natural language prompt control over voice characteristics. Google DeepMind's own announcement (covered the same day) frames it more narrowly, around 'granular audio tags' for expressive speech synthesis. That is a meaningfully different technical claim from free-form prompting, and the two descriptions don't obviously reconcile.
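To make the distinction concrete, here is a minimal sketch of what each framing would look like as input text. It calls no Gemini API. The square-bracket tag syntax, the tag names, and the helper functions are all hypothetical illustrations, since the summary above does not specify the real tag format.

```python
import re

# Hypothetical tag syntax: square-bracketed cues such as "[whisper]".
# The actual tag format is not given in the summary above.
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


def freeform_prompt(style: str, text: str) -> str:
    """Free-form framing: one prose instruction governs the whole utterance."""
    return f"Say the following {style}: {text}"


def extract_tags(tagged_text: str) -> list[str]:
    """Tag framing: list the inline cues placed at specific points in the text."""
    return TAG_PATTERN.findall(tagged_text)


def strip_tags(tagged_text: str) -> str:
    """Return only the words to be spoken, with inline cues removed."""
    without_tags = TAG_PATTERN.sub("", tagged_text)
    return re.sub(r"\s+", " ", without_tags).strip()


if __name__ == "__main__":
    tagged = "[whisper] The results are in. [excited] We won!"
    print(freeform_prompt("in a calm, low voice", "The results are in."))
    print(extract_tags(tagged))
    print(strip_tags(tagged))
```

In the free-form case the style direction is open-ended prose applied to the whole utterance. In the tagged case it is a sequence of discrete cues tied to positions in the text, which implies a more constrained interface.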

This release lands in the middle of a busy week of Gemini expansion across modalities. As covered in our story on Gemini pulling from Google Photos for personalized image generation (April 16, The Verge and Ars Technica), Google is clearly pushing Gemini into every media type simultaneously. The pattern is consistent: each announcement extends Gemini's reach into a new output format while keeping developers inside the Gemini API. Whether this reflects genuine multimodal depth or a surface-level expansion strategy is the question none of these individual announcements answer on their own.

Watch whether third-party developers report that the natural language prompt controls actually produce reliably differentiated audio outputs, or whether the 'audio tags' framing in the DeepMind post turns out to be the real interface and the prompt-based description is marketing gloss.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Google · Gemini 3.1 Flash TTS · Gemini API

Modelwire summarizes; we don’t republish. The full article lives on simonwillison.net.