All Providers
All supported text-to-speech providers, their models, and configuration.
SpeechSDK supports 14 providers out of the box. Use provider/model strings to select a provider and model, or pass just the provider name to use its default model.
Browse the full list of models on the Models page, or jump into a provider page below.
Provider Table
| Provider | Prefix | Default Model | Env Var |
|---|---|---|---|
| OpenAI | openai | gpt-4o-mini-tts | OPENAI_API_KEY |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 | ELEVENLABS_API_KEY |
| Deepgram | deepgram | aura-2 | DEEPGRAM_API_KEY |
| Cartesia | cartesia | sonic-3 | CARTESIA_API_KEY |
| Hume | hume | octave-2 | HUME_API_KEY |
| Google | google | gemini-2.5-flash-preview-tts | GOOGLE_API_KEY |
| Fish Audio | fish-audio | s2-pro | FISH_AUDIO_API_KEY |
| Inworld | inworld | inworld-tts-1.5-max | INWORLD_API_KEY |
| Murf | murf | GEN2 | MURF_API_KEY |
| Resemble | resemble | default | RESEMBLE_API_KEY |
| Smallest AI | smallest-ai | lightning-v3.1 | SMALLEST_API_KEY |
| fal | fal-ai | (user-specified) | FAL_API_KEY |
| Mistral | mistral | voxtral-mini-tts-2603 | MISTRAL_API_KEY |
| xAI | xai | grok-tts | XAI_API_KEY |
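The table pairs each prefix with a default model. Below is a minimal sketch of that lookup. It assumes the string is split on the first `/` and that a bare prefix falls back to the provider's default. `DEFAULT_MODELS` and `resolveModelString` are hypothetical names used for illustration and are not SpeechSDK exports.

```typescript
// Hypothetical sketch: resolving a "provider/model" string against the provider table.
// Not SpeechSDK's actual implementation.

// Default models copied from the provider table; null means "no default" (fal).
const DEFAULT_MODELS = new Map<string, string | null>([
  ["openai", "gpt-4o-mini-tts"],
  ["elevenlabs", "eleven_multilingual_v2"],
  ["deepgram", "aura-2"],
  ["cartesia", "sonic-3"],
  ["hume", "octave-2"],
  ["google", "gemini-2.5-flash-preview-tts"],
  ["fish-audio", "s2-pro"],
  ["inworld", "inworld-tts-1.5-max"],
  ["murf", "GEN2"],
  ["resemble", "default"],
  ["smallest-ai", "lightning-v3.1"],
  ["fal-ai", null],
  ["mistral", "voxtral-mini-tts-2603"],
  ["xai", "grok-tts"],
])

interface ResolvedModel {
  provider: string
  model: string
}

function resolveModelString(id: string): ResolvedModel {
  // Split on the first "/" only, so model IDs that themselves contain "/" stay intact.
  const slash = id.indexOf("/")
  const provider = slash === -1 ? id : id.slice(0, slash)
  const explicitModel = slash === -1 ? "" : id.slice(slash + 1)

  // Map.get returns undefined only when the prefix is not in the table.
  const fallback = DEFAULT_MODELS.get(provider)
  if (fallback === undefined) {
    throw new Error(`Unknown provider prefix: ${provider}`)
  }
  if (explicitModel !== "") {
    return { provider, model: explicitModel }
  }
  if (fallback === null) {
    throw new Error(`Provider "${provider}" has no default model; pass "${provider}/<model>"`)
  }
  return { provider, model: fallback }
}
```

For example, `resolveModelString("elevenlabs")` yields the `eleven_multilingual_v2` default. `"fal-ai"` on its own is rejected because fal has no default model.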
Capability Matrix
| Provider | Streaming | Audio Tags | Voice Cloning | Timestamps | Open Source |
|---|---|---|---|---|---|
| OpenAI | Yes | Yes (as instructions) | No | STT fallback only | No |
| ElevenLabs | Yes | Yes (eleven_v3) | No | Native | No |
| Deepgram | Yes | No | No | STT fallback only | No |
| Cartesia | Yes | Yes (sonic-3) | Yes (sonic-3) | Native | No |
| Hume | Yes | No | Yes (octave-2) | Native (octave-2) | No |
| Google | Yes | No | No | STT fallback only | No |
| Fish Audio | Yes | Yes | Yes | STT fallback only | Yes |
| Inworld | Yes | No | No | Native | No |
| Murf | No | No | No | Native (GEN2) | No |
| Resemble | Yes | No | Yes | Native | Yes |
| Smallest AI | No | No | No | STT fallback only | No |
| fal | No | No | Yes (select models) | STT fallback only | Varies |
| Mistral | No | No | Yes | STT fallback only | Yes |
| xAI | Yes | Yes (grok-tts) | No | STT fallback only | No |
Support varies by model, so check each provider page for per-model details. "STT fallback only" means that `timestamps: true` still works, but through a transcription round-trip (OpenAI Whisper by default) rather than natively. See the timestamps guide for details.
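One way to use the matrix programmatically is to encode it as data and filter providers by the features you need. This is a minimal sketch under that idea. `CAPABILITIES`, `providersWhere`, and the field names are hypothetical and not part of SpeechSDK. Per-model caveats such as "octave-2 only" are deliberately dropped.

```typescript
// Hypothetical sketch: the capability matrix as data, filterable by feature.
// Values are copied from the matrix; per-model caveats are not modeled.

type Timestamps = "native" | "stt-fallback"

interface Capabilities {
  streaming: boolean
  voiceCloning: boolean
  timestamps: Timestamps
}

// Small helper so each row stays on one line.
const row = (streaming: boolean, voiceCloning: boolean, timestamps: Timestamps): Capabilities => ({
  streaming,
  voiceCloning,
  timestamps,
})

const CAPABILITIES: Record<string, Capabilities> = {
  openai: row(true, false, "stt-fallback"),
  elevenlabs: row(true, false, "native"),
  deepgram: row(true, false, "stt-fallback"),
  cartesia: row(true, true, "native"),
  hume: row(true, true, "native"),
  google: row(true, false, "stt-fallback"),
  "fish-audio": row(true, true, "stt-fallback"),
  inworld: row(true, false, "native"),
  murf: row(false, false, "native"),
  resemble: row(true, true, "native"),
  "smallest-ai": row(false, false, "stt-fallback"),
  "fal-ai": row(false, true, "stt-fallback"),
  mistral: row(false, true, "stt-fallback"),
  xai: row(true, false, "stt-fallback"),
}

// Returns provider prefixes whose capabilities satisfy the predicate, in table order.
function providersWhere(pred: (caps: Capabilities) => boolean): string[] {
  return Object.entries(CAPABILITIES)
    .filter(([, caps]) => pred(caps))
    .map(([prefix]) => prefix)
}

const streamingWithNativeTimestamps = providersWhere(
  (c) => c.streaming && c.timestamps === "native",
)
console.log(streamingWithNativeTimestamps.join(", "))
```

Encoding the matrix as data keeps feature checks in one place. A real integration should still confirm per-model support on the provider pages.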
Usage
```typescript
import { generateSpeech } from "@speech-sdk/core"

// provider/model string
await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})

// just the provider name uses the default model
await generateSpeech({
  model: "elevenlabs",
  text: "Hello from SpeechSDK!",
  voice: "EXAVITQu4vr4xnSDxMaL",
})
```
Provider Options
Each provider accepts provider-specific parameters via providerOptions. These are sent directly to the provider's API using the API's own field names.
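Put another way, keys in `providerOptions` reach the provider's request unchanged, with no renaming. Below is a minimal sketch of that merge for OpenAI. It assumes the SDK maps `text` to OpenAI's `input` field and spreads `providerOptions` into the request body. `buildOpenAISpeechBody` and `SpeechRequest` are hypothetical. `model`, `input`, `voice`, `speed`, and `response_format` are real field names on OpenAI's speech endpoint.

```typescript
// Hypothetical sketch of providerOptions pass-through for OpenAI's speech endpoint.
// Assumption: the SDK maps `text` to `input` and forwards providerOptions untouched.

type ProviderOptions = Record<string, unknown>

interface SpeechRequest {
  model: string // model ID with the "openai/" prefix already stripped
  text: string
  voice: string
  providerOptions?: ProviderOptions
}

function buildOpenAISpeechBody(req: SpeechRequest): Record<string, unknown> {
  return {
    // Provider-specific keys keep OpenAI's own names (speed, response_format, ...).
    ...(req.providerOptions ?? {}),
    // Core fields are written last, so providerOptions cannot override them.
    model: req.model,
    input: req.text,
    voice: req.voice,
  }
}

const body = buildOpenAISpeechBody({
  model: "gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  providerOptions: { speed: 1.2, response_format: "opus" },
})
console.log(JSON.stringify(body))
```

Writing the core fields after the spread is a design choice made for this sketch. The real SDK may order the merge differently.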
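```typescript
import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  // Forwarded to OpenAI as-is, using OpenAI's own field names
  providerOptions: {
    speed: 1.2,
    response_format: "opus",
  },
})
```
API Key Resolution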
When you pass the model as a string (e.g., `"openai/gpt-4o-mini-tts"`), the API key is read automatically from the provider's environment variable, listed in the Env Var column of the provider table above. You can override this with custom configuration.
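The lookup can be sketched as follows, assuming a key supplied through custom configuration takes precedence over the environment. `ENV_VARS`, `resolveApiKey`, and the `explicitKey` parameter are hypothetical names, not SpeechSDK APIs. `env` stands in for `process.env` in Node, so the sketch runs without Node typings.

```typescript
// Hypothetical sketch of API key resolution: an explicit key wins, otherwise the
// provider's environment variable (from the provider table) is read.

const ENV_VARS = new Map<string, string>([
  ["openai", "OPENAI_API_KEY"],
  ["elevenlabs", "ELEVENLABS_API_KEY"],
  ["deepgram", "DEEPGRAM_API_KEY"],
  ["cartesia", "CARTESIA_API_KEY"],
  ["hume", "HUME_API_KEY"],
  ["google", "GOOGLE_API_KEY"],
  ["fish-audio", "FISH_AUDIO_API_KEY"],
  ["inworld", "INWORLD_API_KEY"],
  ["murf", "MURF_API_KEY"],
  ["resemble", "RESEMBLE_API_KEY"],
  ["smallest-ai", "SMALLEST_API_KEY"],
  ["fal-ai", "FAL_API_KEY"],
  ["mistral", "MISTRAL_API_KEY"],
  ["xai", "XAI_API_KEY"],
])

// Stand-in for process.env: variable name -> value (possibly unset).
type Env = Record<string, string | undefined>

function resolveApiKey(provider: string, env: Env, explicitKey?: string): string {
  // An explicitly configured key overrides the environment.
  if (explicitKey) return explicitKey

  const varName = ENV_VARS.get(provider)
  if (varName === undefined) {
    throw new Error(`Unknown provider: ${provider}`)
  }
  const key = env[varName]
  if (!key) {
    throw new Error(`Missing API key: set ${varName} or pass a key explicitly`)
  }
  return key
}
```

In Node you might call `resolveApiKey("openai", process.env)`. Failing early with the variable name in the error makes a missing key easy to diagnose.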