Google (Gemini TTS)


Prefix	`google`
Default model	`gemini-2.5-flash-preview-tts`
Env var	`GOOGLE_API_KEY`
Official docs	ai.google.dev/gemini-api/docs/speech-generation

Models

Model	Streaming	Audio Tags	Voice Cloning	Notes
`gemini-3.1-flash-tts-preview`	Yes	No	No	Newest; best quality
`gemini-2.5-flash-preview-tts`	Yes	No	No	Default; lower latency
`gemini-2.5-pro-preview-tts`	Yes	No	No	Previous-gen, higher quality

Usage

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "google/gemini-3.1-flash-tts-preview",
  text: "Hello from SpeechSDK!",
  voice: "Kore",
})

Built-in voices include Kore, Puck, Charon, Fenrir, Aoede, and others. See the Gemini TTS docs for the full list.

Gemini returns raw PCM audio with no container. SpeechSDK sets `mediaType` to `audio/L16;rate=24000` (16-bit linear PCM at 24 kHz) and passes the `uint8Array` through unchanged. If you need a container such as WAV, wrap the bytes yourself or use a different provider.
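Wrapping the bytes yourself can be as small as prepending a 44-byte RIFF header. The sketch below is one way to do it, not part of SpeechSDK: `pcmToWav` is a hypothetical helper name, and it assumes the PCM is mono with little-endian 16-bit samples (the media type only states the bit depth and the 24000 Hz rate, so check the channel count and byte order against the Gemini TTS docs before relying on it).

```typescript
// Hypothetical helper (not a SpeechSDK API): wrap raw PCM in a minimal WAV container.
// Assumptions: mono, 16-bit, little-endian samples at 24 kHz unless overridden.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8 // bytes per sample frame
  const byteRate = sampleRate * blockAlign // bytes per second of audio
  const wav = new Uint8Array(44 + pcm.length)
  const view = new DataView(wav.buffer)

  // Write a 4-character chunk tag as ASCII bytes.
  const writeAscii = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) wav[offset + i] = tag.charCodeAt(i)
  }

  writeAscii(0, "RIFF")
  view.setUint32(4, 36 + pcm.length, true) // total file size minus 8 bytes
  writeAscii(8, "WAVE")
  writeAscii(12, "fmt ")
  view.setUint32(16, 16, true) // size of the fmt chunk body
  view.setUint16(20, 1, true) // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, "data")
  view.setUint32(40, pcm.length, true) // size of the sample data
  wav.set(pcm, 44) // copy samples after the header
  return wav
}
```

Pass the audio bytes from the `generateSpeech` result to `pcmToWav` and write the returned array to a `.wav` file. At 24 kHz mono 16-bit, one second of audio is 48,000 bytes, which also gives a quick way to estimate clip duration from the byte length.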

Streaming

Gemini's streaming surface is server-buffered SSE. SpeechSDK exposes it through `streamSpeech`, but chunks may arrive in larger batches than with providers that stream truly chunked audio.
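Because batch sizes are not predictable, consumer code should not assume anything about how many chunks arrive or how large each one is. The sketch below shows a size-agnostic way to gather a stream into one buffer. It assumes the audio is available as an `AsyncIterable<Uint8Array>`; this page does not document the exact shape `streamSpeech` returns, so that is an assumption, and `collectChunks` is a hypothetical helper rather than a SpeechSDK API.

```typescript
// Hypothetical helper (not a SpeechSDK API): join audio chunks of any size
// into a single Uint8Array. The AsyncIterable input shape is an assumption.
async function collectChunks(
  chunks: AsyncIterable<Uint8Array>,
): Promise<Uint8Array> {
  const parts: Uint8Array[] = []
  let total = 0

  // Accept whatever batch sizes the server happens to send.
  for await (const chunk of chunks) {
    parts.push(chunk)
    total += chunk.length
  }

  // Copy every part into one contiguous buffer, preserving order.
  const out = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    out.set(part, offset)
    offset += part.length
  }
  return out
}
```

Collecting everything gives up the latency benefit of streaming, so this suits cases such as saving to disk. For playback, process each chunk inside the loop instead.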

Provider Options

import { generateSpeech } from "@speech-sdk/core"

await generateSpeech({
  model: "google/gemini-3.1-flash-tts-preview",
  text: "Hello!",
  voice: "Kore",
  providerOptions: {
    temperature: 0.9,
  },
})

Custom Configuration

import { generateSpeech } from "@speech-sdk/core"
import { createGoogle } from "@speech-sdk/core/providers"

const google = createGoogle({
  apiKey: process.env.GOOGLE_API_KEY,
})

const result = await generateSpeech({
  model: google("gemini-3.1-flash-tts-preview"),
  text: "Hello!",
  voice: "Kore",
})