Providers

All supported text-to-speech providers, their models, and configuration.

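SpeechSDK supports 12 providers out of the box. To select a provider and model, pass a provider/model string such as 'openai/gpt-4o-mini-tts'. To use a provider's default model, pass just the provider name (e.g., 'openai'). fal has no default model, so it always needs a full provider/model string.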

Browse the full list of providers and models on the Models page.

Provider Table

| Provider | Prefix | Default Model | Env Var |
|---|---|---|---|
| OpenAI | openai | gpt-4o-mini-tts | OPENAI_API_KEY |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 | ELEVENLABS_API_KEY |
| Deepgram | deepgram | aura-2 | DEEPGRAM_API_KEY |
| Cartesia | cartesia | sonic-3 | CARTESIA_API_KEY |
| Hume | hume | octave-2 | HUME_API_KEY |
| Google | google | gemini-2.5-flash-preview-tts | GOOGLE_API_KEY |
| Fish Audio | fish-audio | s2-pro | FISH_AUDIO_API_KEY |
| Unreal Speech | unreal-speech | default | UNREAL_SPEECH_API_KEY |
| Murf | murf | GEN2 | MURF_API_KEY |
| Resemble | resemble | default | RESEMBLE_API_KEY |
| fal | fal-ai | (user-specified) | FAL_API_KEY |
| Mistral | mistral | voxtral-mini-tts-2603 | MISTRAL_API_KEY |
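The table encodes a simple convention: the text before the first '/' is the provider prefix, anything after it is the model ID, and a bare prefix falls back to the default model. The sketch below illustrates that convention with a hypothetical parseModelString helper and a DEFAULT_MODELS map copied from the table. It is not SpeechSDK's internal parser. Splitting on the first '/' only is an assumption, chosen so that a model ID that itself contains a slash is kept intact.

```typescript
// Hypothetical helper illustrating the "provider/model" convention.
// Default models are copied from the provider table; fal has no default.
const DEFAULT_MODELS: Record<string, string | null> = {
  openai: "gpt-4o-mini-tts",
  elevenlabs: "eleven_multilingual_v2",
  deepgram: "aura-2",
  cartesia: "sonic-3",
  hume: "octave-2",
  google: "gemini-2.5-flash-preview-tts",
  "fish-audio": "s2-pro",
  "unreal-speech": "default",
  murf: "GEN2",
  resemble: "default",
  "fal-ai": null, // model must be user-specified
  mistral: "voxtral-mini-tts-2603",
};

interface ParsedModel {
  provider: string;
  model: string;
}

function parseModelString(spec: string): ParsedModel {
  // Split on the first "/" only; everything after it is treated as the model ID.
  const slash = spec.indexOf("/");
  const provider = slash === -1 ? spec : spec.slice(0, slash);

  // Own-property check so names like "toString" are not mistaken for providers.
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, provider)) {
    throw new Error(`Unknown provider prefix: ${provider}`);
  }

  if (slash !== -1) {
    const model = spec.slice(slash + 1);
    if (model === "") throw new Error(`Missing model after "${provider}/"`);
    return { provider, model };
  }

  // Bare provider name: fall back to the table's default model, if any.
  const fallback = DEFAULT_MODELS[provider];
  if (fallback == null) {
    throw new Error(`Provider "${provider}" has no default model; use "${provider}/<model>"`);
  }
  return { provider, model: fallback };
}
```

With this sketch, parseModelString("openai") returns { provider: "openai", model: "gpt-4o-mini-tts" }, while parseModelString("fal-ai") throws because fal has no default model.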

Usage Examples

OpenAI

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})

OpenAI models: gpt-4o-mini-tts, tts-1, tts-1-hd

ElevenLabs

const result = await generateSpeech({
  model: "elevenlabs/eleven_multilingual_v2",
  text: "Hello from SpeechSDK!",
  voice: "EXAVITQu4vr4xnSDxMaL",
})

ElevenLabs models: eleven_v3, eleven_multilingual_v2, eleven_flash_v2_5, eleven_flash_v2

Deepgram

const result = await generateSpeech({
  model: "deepgram/aura-2",
  text: "Hello from SpeechSDK!",
  voice: "thalia-en",
})

Cartesia

const result = await generateSpeech({
  model: "cartesia/sonic-3",
  text: "Hello from SpeechSDK!",
  voice: "a0e99841-438c-4a64-b679-ae501e7d6091",
})

Cartesia models: sonic-3, sonic-2
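Because the OpenAI, ElevenLabs, and Cartesia model lists above are fixed strings, you can encode them as a TypeScript union so a misspelled model string fails at compile time rather than at request time. KNOWN_MODELS, KnownModelString, and isKnownModelString below are hypothetical local definitions, not exports of @speech-sdk/core, and they only cover the lists shown on this page.

```typescript
// Model lists copied from this page (OpenAI, ElevenLabs, Cartesia).
// Hypothetical local helper; not part of @speech-sdk/core.
const KNOWN_MODELS = {
  openai: ["gpt-4o-mini-tts", "tts-1", "tts-1-hd"],
  elevenlabs: ["eleven_v3", "eleven_multilingual_v2", "eleven_flash_v2_5", "eleven_flash_v2"],
  cartesia: ["sonic-3", "sonic-2"],
} as const;

type KnownProvider = keyof typeof KNOWN_MODELS;

// Expands to "openai/gpt-4o-mini-tts" | "openai/tts-1" | ... | "cartesia/sonic-2".
type KnownModelString = {
  [P in KnownProvider]: `${P}/${(typeof KNOWN_MODELS)[P][number]}`;
}[KnownProvider];

// Runtime check for strings that arrive untyped (config files, env vars, user input).
function isKnownModelString(value: string): value is KnownModelString {
  const slash = value.indexOf("/");
  if (slash === -1) return false;
  const provider = value.slice(0, slash);
  const model = value.slice(slash + 1);
  if (!Object.prototype.hasOwnProperty.call(KNOWN_MODELS, provider)) return false;
  const models: readonly string[] = KNOWN_MODELS[provider as KnownProvider];
  return models.includes(model);
}

// Compile-time check: a misspelled model here would fail to type-check.
const chosen: KnownModelString = "elevenlabs/eleven_flash_v2_5";
```

A false result from isKnownModelString only means the string is not on these three lists; other providers such as Deepgram are simply not covered by the sketch.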

Google (Gemini TTS)

const result = await generateSpeech({
  model: "google/gemini-2.5-flash-preview-tts",
  text: "Hello from SpeechSDK!",
  voice: "Kore",
})

Hume

const result = await generateSpeech({
  model: "hume/octave-2",
  text: "Hello from SpeechSDK!",
  voice: "Dacher",
})

Mistral

const result = await generateSpeech({
  model: "mistral/voxtral-mini-tts-2603",
  text: "Hello from SpeechSDK!",
  voice: { audio: "base64-encoded-audio..." },
})

Mistral uses voice cloning by default: pass a voice object whose audio field holds base64-encoded reference audio.
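To build that voice object from a recording, read the audio bytes and base64-encode them. The helpers below are a minimal Node.js sketch; referenceVoiceFromBytes and referenceVoiceFromFile are hypothetical names, and which audio formats and clip lengths Mistral accepts as reference audio is an assumption left to the provider's documentation.

```typescript
import { readFileSync } from "node:fs";
import { Buffer } from "node:buffer";

// Shape taken from the Mistral example above: { audio: <base64 string> }.
interface ReferenceVoice {
  audio: string;
}

// Hypothetical helper: base64-encode raw audio bytes already in memory.
function referenceVoiceFromBytes(bytes: Uint8Array): ReferenceVoice {
  return { audio: Buffer.from(bytes).toString("base64") };
}

// Hypothetical helper: read a local audio file and encode it the same way.
function referenceVoiceFromFile(path: string): ReferenceVoice {
  return referenceVoiceFromBytes(readFileSync(path));
}
```

You would then pass voice: referenceVoiceFromFile("./reference.wav") to generateSpeech, where the file path is a placeholder for your own recording.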

Provider Options

Each provider accepts provider-specific parameters via providerOptions. These are sent directly to the provider's API using the API's own field names.

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  providerOptions: {
    speed: 1.2,
    response_format: "opus",
  },
})
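Since providerOptions are passed through using the provider's own field names, OpenAI options must use snake_case keys such as response_format, not responseFormat. One way to catch mistakes early is a local type for the fields you use. The sketch below covers two fields of OpenAI's speech endpoint, with the range and formats taken from OpenAI's API reference; OpenAISpeechOptions and checkOpenAISpeechOptions are hypothetical and not exported by SpeechSDK.

```typescript
// Hypothetical local type for a subset of OpenAI's speech request fields.
// Field names and allowed values follow OpenAI's API reference, not SpeechSDK.
type OpenAIResponseFormat = "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";

interface OpenAISpeechOptions {
  speed?: number; // OpenAI documents a range of 0.25 to 4.0
  response_format?: OpenAIResponseFormat;
}

// Throws before any request is made if speed is outside the documented range.
// Written as a negated range test so NaN is rejected too.
function checkOpenAISpeechOptions(opts: OpenAISpeechOptions): OpenAISpeechOptions {
  if (opts.speed !== undefined && !(opts.speed >= 0.25 && opts.speed <= 4.0)) {
    throw new RangeError(`speed must be between 0.25 and 4.0, got ${opts.speed}`);
  }
  return opts;
}

// Passing an object literal directly also makes TypeScript reject
// misspelled keys such as `responseFormat` (excess property check).
const providerOptions = checkOpenAISpeechOptions({
  speed: 1.2,
  response_format: "opus",
});
```

The resulting providerOptions object can be passed to generateSpeech exactly as in the example above.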

API Key Resolution

When you pass the model as a string (e.g., 'openai/tts-1'), the API key is read automatically from the provider's environment variable listed in the provider table above. You can override this with custom configuration.
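The lookup amounts to mapping each provider prefix to its environment variable from the table. The sketch below shows that mapping with a hypothetical resolveApiKey function; it takes the environment as a parameter (pass process.env in Node.js) so it can be exercised without touching real variables, and it does not model SpeechSDK's override mechanism.

```typescript
// Environment variable per provider prefix, copied from the provider table.
const API_KEY_ENV_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  hume: "HUME_API_KEY",
  google: "GOOGLE_API_KEY",
  "fish-audio": "FISH_AUDIO_API_KEY",
  "unreal-speech": "UNREAL_SPEECH_API_KEY",
  murf: "MURF_API_KEY",
  resemble: "RESEMBLE_API_KEY",
  "fal-ai": "FAL_API_KEY",
  mistral: "MISTRAL_API_KEY",
};

// Hypothetical helper: resolve the key for a "provider/model" or bare "provider" string.
function resolveApiKey(
  modelSpec: string,
  env: Record<string, string | undefined>,
): string {
  const slash = modelSpec.indexOf("/");
  const provider = slash === -1 ? modelSpec : modelSpec.slice(0, slash);

  // Own-property lookup so prototype names are not treated as providers.
  const varName = Object.prototype.hasOwnProperty.call(API_KEY_ENV_VARS, provider)
    ? API_KEY_ENV_VARS[provider]
    : undefined;
  if (varName === undefined) {
    throw new Error(`Unknown provider prefix: ${provider}`);
  }

  const key = env[varName];
  if (!key) {
    throw new Error(`Missing ${varName} for provider "${provider}"`);
  }
  return key;
}
```

In Node.js you would call resolveApiKey("openai/tts-1", process.env); failing fast with the variable name makes a missing key easier to diagnose than a rejected request.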
