Google (Gemini TTS)
Text-to-speech with Google's Gemini 3.1 Flash and Gemini 2.5 Flash/Pro preview TTS models.
| Setting | Value |
|---|---|
| Prefix | google |
| Default model | gemini-2.5-flash-preview-tts |
| Env var | GOOGLE_API_KEY |
| Official docs | ai.google.dev/gemini-api/docs/speech-generation |
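The Prefix row means a plain model string takes the form google/<model-id>, as in model: "google/gemini-3.1-flash-tts-preview" in the Usage example. To illustrate that convention only, here is a hypothetical splitModelId helper. It is not part of SpeechSDK, and the SDK's own model resolution may behave differently (for example, in how it treats a string with no model ID).

```typescript
// Hypothetical helper, NOT SpeechSDK's parser: splits "prefix/model"
// at the first slash to show how a string like
// "google/gemini-3.1-flash-tts-preview" names a provider and a model.
function splitModelId(id: string): { prefix: string; model: string } {
  const slash = id.indexOf("/")
  // Reject strings with no prefix ("/x"), no slash, or no model ("google/")
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "prefix/model", got "${id}"`)
  }
  return { prefix: id.slice(0, slash), model: id.slice(slash + 1) }
}
```

Splitting at the first slash (rather than the last) keeps any later slashes inside the model ID, which is the safer choice if a provider's model names ever contain one.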
Models
| Model | Streaming | Audio Tags | Voice Cloning | Notes |
|---|---|---|---|---|
| gemini-3.1-flash-tts-preview | Yes | No | No | Newest; best quality |
| gemini-2.5-flash-preview-tts | Yes | No | No | Default; lower latency |
| gemini-2.5-pro-preview-tts | Yes | No | No | Previous-gen; higher quality |
Usage
```typescript
import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "google/gemini-3.1-flash-tts-preview",
  text: "Hello from SpeechSDK!",
  voice: "Kore",
})
```

Built-in voices include Kore, Puck, Charon, Fenrir, Aoede, and others; see the Gemini TTS docs for the full list.
Output Format
Gemini returns raw 16-bit PCM audio at 24 kHz with no container. SpeechSDK sets mediaType to audio/L16;rate=24000 and passes the bytes through unchanged in uint8Array. If you need a container such as WAV, wrap the bytes yourself or use a different provider.
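Wrapping the bytes in WAV only takes a 44-byte RIFF header in front of the samples. The sketch below uses a hypothetical pcmToWav function (not part of SpeechSDK). It assumes mono, 16-bit, little-endian samples at 24 kHz, which is how Google's own examples write Gemini TTS output to WAV; the mono and byte-order assumptions are not stated by SpeechSDK, so verify them against the Gemini TTS docs.

```typescript
// Hypothetical helper, NOT part of SpeechSDK: prepends a canonical
// 44-byte RIFF/WAVE header to raw PCM. Assumes little-endian signed
// samples, the layout WAV itself stores.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000, // matches rate=24000 in the mediaType
  channels = 1,       // assumption: Gemini TTS output is mono
  bitsPerSample = 16, // L16 = 16-bit samples
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8) // bytes per sample frame
  const byteRate = sampleRate * blockAlign          // bytes per second
  const wav = new Uint8Array(44 + pcm.length)
  const view = new DataView(wav.buffer)

  // Writes a 4-character chunk ID such as "RIFF" byte by byte
  function writeAscii(offset: number, text: string) {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i))
  }

  writeAscii(0, "RIFF")
  view.setUint32(4, 36 + pcm.length, true) // RIFF size: whole file minus 8 bytes
  writeAscii(8, "WAVE")
  writeAscii(12, "fmt ")
  view.setUint32(16, 16, true)             // fmt chunk size for plain PCM
  view.setUint16(20, 1, true)              // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, "data")
  view.setUint32(40, pcm.length, true)     // data chunk size in bytes
  wav.set(pcm, 44)                         // samples copied unchanged
  return wav
}
```

Pass it the uint8Array bytes from a generateSpeech result, then write the returned bytes to a .wav file or hand them to any player that accepts WAV.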
Streaming
Gemini's streaming surface is server-buffered SSE. SpeechSDK exposes it through streamSpeech, but because the server buffers audio before sending it, chunks may arrive in larger, less frequent batches than with providers that stream true chunked audio.
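Because batch sizes are uneven, a player that wants fixed-size buffers has to regroup the incoming bytes. The sketch below does that with a hypothetical PcmFrameBuffer class (not part of SpeechSDK). It assumes you can read the stream's audio as successive Uint8Array chunks of the 16-bit PCM described under Output Format; check the streamSpeech reference for the actual chunk shape.

```typescript
// Hypothetical helper, NOT part of SpeechSDK: regroups arbitrarily sized
// chunks into fixed-size frames, so a player always receives whole 16-bit
// samples regardless of how the server batches its output.
class PcmFrameBuffer {
  private readonly frameBytes: number
  private pending: Uint8Array = new Uint8Array(0) // leftover partial frame

  constructor(frameBytes: number) {
    // Frames must hold whole 16-bit samples (2 bytes each)
    if (!Number.isInteger(frameBytes) || frameBytes <= 0 || frameBytes % 2 !== 0) {
      throw new Error("frameBytes must be a positive even integer")
    }
    this.frameBytes = frameBytes
  }

  // Appends a chunk and returns every complete frame now available
  push(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + chunk.length)
    merged.set(this.pending, 0)
    merged.set(chunk, this.pending.length)

    const frames: Uint8Array[] = []
    let offset = 0
    while (merged.length - offset >= this.frameBytes) {
      frames.push(merged.slice(offset, offset + this.frameBytes))
      offset += this.frameBytes
    }
    this.pending = merged.slice(offset) // carry the remainder to the next push
    return frames
  }

  // Returns leftover bytes at end of stream (possibly a short final frame)
  flush(): Uint8Array {
    const rest = this.pending
    this.pending = new Uint8Array(0)
    return rest
  }
}
```

Call push() for each streamed chunk, play whatever frames it returns, and call flush() once the stream ends. For example, new PcmFrameBuffer(4800) yields 100 ms frames at 24 kHz mono 16-bit (24000 samples/s × 2 bytes × 0.1 s).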
Provider Options
Pass provider-specific settings, such as temperature, through providerOptions:

```typescript
await generateSpeech({
  model: "google/gemini-3.1-flash-tts-preview",
  text: "Hello!",
  voice: "Kore",
  providerOptions: {
    temperature: 0.9,
  },
})
```

Custom Configuration
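To configure the provider explicitly, for example to pass the API key yourself, create an instance with createGoogle and use it as the model:

```typescript
import { generateSpeech } from "@speech-sdk/core"
import { createGoogle } from "@speech-sdk/core/providers"

const google = createGoogle({
  apiKey: process.env.GOOGLE_API_KEY,
})

const result = await generateSpeech({
  model: google("gemini-3.1-flash-tts-preview"),
  text: "Hello!",
  voice: "Kore",
})
```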