All Providers

All supported text-to-speech providers, their models, and configuration.

SpeechSDK supports 14 providers out of the box. Use provider/model strings to select a provider and model, or pass just the provider name to use its default model.

Browse the full list of models on the Models page, or jump into a provider page below.

Provider Table

| Provider | Prefix | Default Model | Env Var |
| --- | --- | --- | --- |
| OpenAI | openai | gpt-4o-mini-tts | OPENAI_API_KEY |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 | ELEVENLABS_API_KEY |
| Deepgram | deepgram | aura-2 | DEEPGRAM_API_KEY |
| Cartesia | cartesia | sonic-3 | CARTESIA_API_KEY |
| Hume | hume | octave-2 | HUME_API_KEY |
| Google | google | gemini-2.5-flash-preview-tts | GOOGLE_API_KEY |
| Fish Audio | fish-audio | s2-pro | FISH_AUDIO_API_KEY |
| Inworld | inworld | inworld-tts-1.5-max | INWORLD_API_KEY |
| Murf | murf | GEN2 | MURF_API_KEY |
| Resemble | resemble | default | RESEMBLE_API_KEY |
| Smallest AI | smallest-ai | lightning-v3.1 | SMALLEST_API_KEY |
| fal | fal-ai | (user-specified) | FAL_API_KEY |
| Mistral | mistral | voxtral-mini-tts-2603 | MISTRAL_API_KEY |
| xAI | xai | grok-tts | XAI_API_KEY |
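
To make the prefix and default-model columns concrete, here is a minimal sketch of how a model string might be split into a provider and a model. It is not SpeechSDK's implementation: `resolveModelString` and `DEFAULT_MODELS` are hypothetical names, the map holds only a few rows from the table, and splitting on the first slash only is an assumption.

```typescript
// Illustrative subset of the provider table, keyed by prefix.
// `null` marks a provider with no default model (fal is "user-specified").
const DEFAULT_MODELS: Record<string, string | null> = {
  openai: "gpt-4o-mini-tts",
  elevenlabs: "eleven_multilingual_v2",
  deepgram: "aura-2",
  "fal-ai": null,
};

interface ResolvedModel {
  provider: string;
  model: string;
}

// Hypothetical helper, not part of SpeechSDK.
function resolveModelString(input: string): ResolvedModel {
  // Split on the first slash only, so a model ID that itself contains
  // slashes stays intact (an assumption, not documented behavior).
  const slash = input.indexOf("/");
  const provider = slash === -1 ? input : input.slice(0, slash);
  const explicitModel = slash === -1 ? "" : input.slice(slash + 1);

  if (!Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, provider)) {
    throw new Error(`Unknown provider prefix: "${provider}"`);
  }
  if (explicitModel !== "") {
    return { provider, model: explicitModel };
  }

  // Bare provider name: fall back to the default model, if there is one.
  const fallback = DEFAULT_MODELS[provider];
  if (fallback === null || fallback === undefined) {
    throw new Error(`Provider "${provider}" has no default model`);
  }
  return { provider, model: fallback };
}
```

Under these assumptions, `resolveModelString("elevenlabs")` yields the provider's default model, while a bare `"fal-ai"` is rejected because the table lists no default for it.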

Capability Matrix

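| Provider | Streaming | Audio Tags | Voice Cloning | Timestamps | Open Source |
| --- | --- | --- | --- | --- | --- |
| OpenAI | Yes | Yes (as instructions) | No | STT fallback only | No |
| ElevenLabs | Yes | Yes (eleven_v3) | No | Native | No |
| Deepgram | Yes | No | No | STT fallback only | No |
| Cartesia | Yes | Yes (sonic-3) | Yes (sonic-3) | Native | No |
| Hume | Yes | No | Yes (octave-2) | Native (octave-2) | No |
| Google | Yes | No | No | STT fallback only | No |
| Fish Audio | Yes | Yes | Yes | STT fallback only | Yes |
| Inworld | Yes | No | No | Native | No |
| Murf | No | No | No | Native (GEN2) | No |
| Resemble | Yes | No | Yes | Native | Yes |
| Smallest AI | No | No | No | STT fallback only | No |
| fal | No | No | Yes (select models) | STT fallback only | Varies |
| Mistral | No | No | Yes | STT fallback only | Yes |
| xAI | Yes | Yes (grok-tts) | No | STT fallback only | No |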

Support varies by model, so check each provider page for per-model features. "STT fallback only" means that timestamps: true still works, but through a transcription round-trip (OpenAI Whisper by default) rather than timing data returned by the provider itself; see the timestamps guide for details.
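
If you want to know ahead of time whether a request will take the transcription round-trip, one option is to keep your own copy of the Timestamps column. The sketch below is an illustration only: `TIMESTAMP_SUPPORT` and `usesSttFallback` are hypothetical names, not SDK exports, and the map lists only providers whose matrix entry has no per-model qualifier.

```typescript
// Copied from the Timestamps column of the capability matrix.
// Hume and Murf are omitted because their entries are model-specific.
type TimestampSupport = "native" | "stt-fallback";

const TIMESTAMP_SUPPORT: Record<string, TimestampSupport> = {
  openai: "stt-fallback",
  elevenlabs: "native",
  deepgram: "stt-fallback",
  cartesia: "native",
  google: "stt-fallback",
  inworld: "native",
  resemble: "native",
  xai: "stt-fallback",
};

// Hypothetical helper: true when `timestamps: true` would rely on the
// transcription round-trip, false when the provider returns timing data
// natively, undefined when the provider is not in the map.
function usesSttFallback(provider: string): boolean | undefined {
  if (!Object.prototype.hasOwnProperty.call(TIMESTAMP_SUPPORT, provider)) {
    return undefined;
  }
  return TIMESTAMP_SUPPORT[provider] === "stt-fallback";
}
```

Returning `undefined` for unlisted providers keeps the helper from guessing in cases where support depends on the model.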

Usage

import { generateSpeech } from "@speech-sdk/core"

// provider/model string
await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})

// just the provider name uses the default model
await generateSpeech({
  model: "elevenlabs",
  text: "Hello from SpeechSDK!",
  voice: "EXAVITQu4vr4xnSDxMaL",
})

Provider Options

Each provider accepts provider-specific parameters via providerOptions. These are sent directly to the provider's API using the API's own field names.

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  providerOptions: {
    speed: 1.2,
    response_format: "opus",
  },
})

API Key Resolution

When you specify the model as a string (e.g., 'openai/gpt-4o-mini-tts'), the API key is read automatically from the provider's environment variable (the Env Var column in the provider table above). You can override this with custom configuration.
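
The lookup described above can be pictured roughly as follows. This is a sketch of the idea, not the SDK's code: `resolveApiKey` and `ENV_VARS` are hypothetical names, the map covers only a few rows of the provider table, and the environment is passed in as a plain object so the example does not depend on any runtime.

```typescript
// Illustrative subset of the Env Var column, keyed by provider prefix.
const ENV_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  "fish-audio": "FISH_AUDIO_API_KEY",
  "smallest-ai": "SMALLEST_API_KEY",
};

// Hypothetical helper, not part of SpeechSDK: returns the key for a
// provider, or throws an error that names the missing variable.
function resolveApiKey(
  provider: string,
  env: Record<string, string | undefined>,
): string {
  const name = Object.prototype.hasOwnProperty.call(ENV_VARS, provider)
    ? ENV_VARS[provider]
    : undefined;
  if (name === undefined) {
    throw new Error(`Unknown provider prefix: "${provider}"`);
  }

  const key = env[name];
  if (key === undefined || key === "") {
    throw new Error(`Missing ${name} for provider "${provider}"`);
  }
  return key;
}
```

In Node.js you would pass `process.env` as the second argument. Failing early with the variable's name in the message tends to be easier to debug than an authentication error from the provider.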
