Google (Gemini TTS)

Google Gemini 3.1 Flash and 2.5 Flash/Pro text-to-speech preview models.

Prefix: google
Default model: gemini-2.5-flash-preview-tts
Env var: GOOGLE_API_KEY
Official docs: ai.google.dev/gemini-api/docs/text-generation

Models

| Model | Streaming | Audio Tags | Voice Cloning | Notes |
| --- | --- | --- | --- | --- |
| gemini-3.1-flash-tts-preview | Yes | No | No | Newest; best quality |
| gemini-2.5-flash-preview-tts | Yes | No | No | Default; lower latency |
| gemini-2.5-pro-preview-tts | Yes | No | No | Previous-gen, higher quality |

Usage

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "google/gemini-3.1-flash-tts-preview",
  text: "Hello from SpeechSDK!",
  voice: "Kore",
})

Built-in voices include Kore, Puck, Charon, Fenrir, Aoede, and others — see the Gemini TTS docs for the full list.

Output Format

Gemini returns raw PCM audio. SpeechSDK sets mediaType to audio/L16;rate=24000 and passes through the uint8Array unchanged. If you need a container like WAV, wrap the bytes yourself or use a different provider.
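
If you do want to wrap the bytes yourself, the following is a minimal sketch of prepending a 44-byte WAV header to raw PCM. The helper name pcmToWav is hypothetical and not part of SpeechSDK. It assumes the audio is 16-bit, mono, little-endian PCM at 24000 Hz; only the sample rate is stated above, so verify the other parameters against the audio you actually receive.

```typescript
// Hypothetical helper (not part of SpeechSDK): wraps raw PCM in a WAV container.
// Assumes 16-bit mono little-endian samples at 24 kHz unless told otherwise.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8) // bytes per sample frame
  const byteRate = sampleRate * blockAlign // bytes per second of audio

  const header = new ArrayBuffer(44)
  const view = new DataView(header)
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, "RIFF")
  view.setUint32(4, 36 + pcm.length, true) // file size minus the first 8 bytes
  writeAscii(8, "WAVE")
  writeAscii(12, "fmt ")
  view.setUint32(16, 16, true) // size of the fmt chunk for PCM
  view.setUint16(20, 1, true) // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, "data")
  view.setUint32(40, pcm.length, true) // size of the PCM payload

  // Concatenate header and payload into one buffer.
  const wav = new Uint8Array(44 + pcm.length)
  wav.set(new Uint8Array(header), 0)
  wav.set(pcm, 44)
  return wav
}
```

Pass the PCM bytes returned by generateSpeech to this function and write the result to a .wav file. The PCM payload itself is copied unchanged.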

Streaming

Gemini's streaming endpoint uses server-buffered SSE. SpeechSDK exposes it through streamSpeech, but chunks may arrive in larger, less frequent batches than with providers that stream audio incrementally.
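
Because batch sizes are not predictable, a consumer should avoid assuming any particular chunk length. The sketch below shows one way to keep 16-bit PCM chunks aligned to whole samples before handing them to a player. PcmFrameAligner is a hypothetical helper, not part of SpeechSDK. It assumes 2 bytes per sample frame (16-bit mono) and that each chunk is available as a Uint8Array. How chunks are read from streamSpeech is left out, since its return shape is not described on this page.

```typescript
// Hypothetical helper (not part of SpeechSDK): buffers arbitrary-size PCM chunks
// and only releases whole sample frames, carrying any leftover bytes forward.
class PcmFrameAligner {
  private carry: Uint8Array = new Uint8Array(0)
  private bytesPerFrame: number

  constructor(bytesPerFrame = 2) {
    this.bytesPerFrame = bytesPerFrame
  }

  // Returns the largest frame-aligned prefix of (leftover + chunk).
  push(chunk: Uint8Array): Uint8Array {
    const merged = new Uint8Array(this.carry.length + chunk.length)
    merged.set(this.carry, 0)
    merged.set(chunk, this.carry.length)

    const usable = merged.length - (merged.length % this.bytesPerFrame)
    this.carry = merged.slice(usable) // bytes of an incomplete frame, if any
    return merged.slice(0, usable)
  }

  // Number of bytes still waiting for the rest of their frame.
  pendingBytes(): number {
    return this.carry.length
  }
}
```

Call push once per received chunk and play or store whatever it returns. This works the same whether chunks are small and frequent or large and infrequent.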

Provider Options

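import { generateSpeech } from "@speech-sdk/core"

await generateSpeech({
  model: "google/gemini-3.1-flash-tts-preview",
  text: "Hello!",
  voice: "Kore",
  // Provider-specific settings go under providerOptions.
  providerOptions: {
    temperature: 0.9,
  },
})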

Custom Configuration

import { generateSpeech } from "@speech-sdk/core"
import { createGoogle } from "@speech-sdk/core/providers"

const google = createGoogle({
  apiKey: process.env.GOOGLE_API_KEY,
})

const result = await generateSpeech({
  model: google("gemini-3.1-flash-tts-preview"),
  text: "Hello!",
  voice: "Kore",
})
