MODELS & PROVIDERS
Supporting 25+ speech models
All the text-to-speech models supported by SpeechSDK in one place.
| Provider/Model | Languages | Release Date | Open Source | Streaming | Audio Tags | Voice Clone* |
|---|---|---|---|---|---|---|
| google/gemini-3.1-flash-tts-preview | af, am, ar | Apr 15, 2026 | — | — | | |
| mistral/voxtral-mini-tts-2603 | en, fr, de | Mar 23, 2026 | — | | | |
| fish-audio/s2-pro | ja, en, zh | Mar 9, 2026 | | | | |
| xai/grok-tts | en, ar, bn | Nov 1, 2025 | — | — | | |
| cartesia/sonic-3 | en, fr, de | Oct 27, 2025 | — | | | |
| hume/octave-2 | en, fr, de | Oct 1, 2025 | — | — | | |
| resemble/default | en, ar, da | Sep 4, 2025 | — | | | |
| inworld/inworld-tts-1.5-max | en, es, fr | Aug 15, 2025 | — | — | — | |
| inworld/inworld-tts-1.5-mini | en, es, fr | Aug 15, 2025 | — | — | — | |
| elevenlabs/eleven_v3 | af, ar, hy | Jun 8, 2025 | — | — | | |
| google/gemini-2.5-flash-preview-tts | en, fr, de | May 1, 2025 | — | — | — | |
| google/gemini-2.5-pro-preview-tts | en, fr, de | May 1, 2025 | — | — | — | |
| deepgram/aura-2 | en, es, de | Apr 15, 2025 | — | — | — | |
| openai/gpt-4o-mini-tts | af, ar, bg | Mar 20, 2025 | — | — | | |
| fal-ai/orpheus-tts | en, es, fr | Mar 18, 2025 | — | — | — | |
| cartesia/sonic-2 | en | Mar 13, 2025 | — | — | — | |
| hume/octave-1 | en | Mar 1, 2025 | — | — | — | |
| fal-ai/kokoro | en, fr, ko | Jan 27, 2025 | — | — | — | |
| murf/GEN2 | en, de, es | Jan 1, 2025 | — | — | — | |
| murf/FALCON | en | Jan 1, 2025 | — | — | — | |
| smallest-ai/lightning-v3.1 | en, hi, es | Jan 1, 2025 | — | — | — | — |
| elevenlabs/eleven_flash_v2_5 | ar, bg, cs | Dec 1, 2024 | — | — | — | |
| elevenlabs/eleven_flash_v2 | en | Dec 1, 2024 | — | — | — | |
| fal-ai/f5-tts | en, zh, fr | Oct 8, 2024 | — | — | | |
| openai/tts-1 | af, ar, bg | Nov 6, 2023 | — | — | — | |
| openai/tts-1-hd | af, ar, bg | Nov 6, 2023 | — | — | — | |
| elevenlabs/eleven_multilingual_v2 | ar, bg, cs | Aug 22, 2023 | — | — | — | |
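Each entry in the table is a single `provider/model` identifier, so the provider and the model name can be recovered by splitting at the first slash. The Python sketch below assumes every ID follows that format. `parse_model_id` is a hypothetical helper for illustration and is not part of SpeechSDK's documented API.

```python
from collections import Counter


def parse_model_id(model_id: str) -> tuple[str, str]:
    """Hypothetical helper: split a "provider/model" ID at the first slash."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


# A few identifiers copied from the table.
model_ids = [
    "openai/tts-1",
    "openai/gpt-4o-mini-tts",
    "elevenlabs/eleven_v3",
    "fal-ai/kokoro",
]

print(parse_model_id("fal-ai/orpheus-tts"))  # ('fal-ai', 'orpheus-tts')

# Count how many of the listed models each provider contributes.
print(Counter(provider for provider, _ in map(parse_model_id, model_ids)))
```

Splitting with `str.partition` at the first slash keeps any later slashes inside the model name, which is the safer choice if a provider ever uses nested model IDs.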
Audio Tags: bracket syntax such as [laughs], [sighs], or [whispers] that adds expressive audio cues to generated speech. Models without audio tag support strip the tags and return warnings.
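The strip-and-warn behavior for unsupported models can be sketched in a few lines of Python. This is a minimal illustration, not SpeechSDK's implementation: it assumes a tag is a bracketed run of letters and spaces, and `strip_audio_tags` and its warning text are hypothetical.

```python
import re

# Assumption: an audio tag is a bracketed run of letters/spaces, e.g. [laughs].
AUDIO_TAG = re.compile(r"\[([A-Za-z][A-Za-z ]*)\]")


def strip_audio_tags(text: str, supports_tags: bool) -> tuple[str, list[str]]:
    """Hypothetical helper: keep tags if supported, else strip them and warn."""
    if supports_tags:
        return text, []
    tags = AUDIO_TAG.findall(text)  # tag names without the brackets
    if not tags:
        return text, []
    stripped = AUDIO_TAG.sub("", text)
    # Collapse the runs of spaces left behind where a tag was removed.
    stripped = re.sub(r" {2,}", " ", stripped).strip()
    warnings = [f"audio tag [{t}] is not supported by this model and was removed" for t in tags]
    return stripped, warnings


text, warnings = strip_audio_tags("[whispers] Come closer. [laughs] Gotcha!", supports_tags=False)
print(text)  # Come closer. Gotcha!
print(warnings)
```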
*Voice Clone refers to passing inline audio references instead of selecting a pre-defined voice.
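The two ways of choosing a voice, selecting a predefined voice or passing reference audio inline, can be modeled as two request shapes. The sketch below is hypothetical: `PresetVoice`, `ClonedVoice`, `voice_payload`, the voice ID `narrator-1`, and the payload field names are illustrative assumptions, since each provider names these fields differently and SpeechSDK's own types are not described here.

```python
import base64
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PresetVoice:
    """Hypothetical type: one of the provider's predefined voices, by ID."""
    voice_id: str


@dataclass(frozen=True)
class ClonedVoice:
    """Hypothetical type: reference audio passed inline (Voice Clone)."""
    audio: bytes
    mime_type: str = "audio/wav"


Voice = Union[PresetVoice, ClonedVoice]


def voice_payload(voice: Voice, supports_clone: bool) -> dict:
    """Build an illustrative request fragment; field names are assumptions."""
    if isinstance(voice, PresetVoice):
        return {"voice": voice.voice_id}
    if not supports_clone:
        raise ValueError("this model only accepts predefined voices")
    # Inline audio is base64-encoded so it can travel in a JSON body.
    return {
        "reference_audio": {
            "data": base64.b64encode(voice.audio).decode("ascii"),
            "mime_type": voice.mime_type,
        }
    }


print(voice_payload(PresetVoice("narrator-1"), supports_clone=False))
# {'voice': 'narrator-1'}
```

Rejecting a cloned voice up front for models without Voice Clone support surfaces the mismatch before any request is sent.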