MODELS & PROVIDERS
Supporting 30+ speech models
All the text-to-speech models supported by SpeechSDK in one place.
| Provider | Model | Languages | Release Date | Open Source | Streaming | Audio Tags | Voice Clone* |
|---|---|---|---|---|---|---|---|
| Inworld | inworld/inworld-tts-2 | en, ar, zh | May 5, 2026 | — | — | — | |
| Cartesia | cartesia/sonic-3.5 | en, fr, de | May 4, 2026 | — | | | |
| Minimax | minimax/speech-2.8-hd | en, zh, ja | May 1, 2026 | — | — | — | — |
| Minimax | minimax/speech-2.8-turbo | en, zh, ja | May 1, 2026 | — | — | — | — |
| Google | google/gemini-3.1-flash-tts-preview | af, am, ar | Apr 15, 2026 | — | — | | |
| Mistral | mistral/voxtral-mini-tts-2603 | en, fr, de | Mar 23, 2026 | — | | | |
| Fish Audio | fish-audio/s2-pro | ja, en, zh | Mar 9, 2026 | | | | |
| xAI | xai/grok-tts | en, ar, bn | Nov 1, 2025 | — | — | | |
| Cartesia | cartesia/sonic-3 | en, fr, de | Oct 27, 2025 | — | | | |
| Hume | hume/octave-2 | en, fr, de | Oct 1, 2025 | — | — | | |
| Resemble | resemble/default | en, ar, da | Sep 4, 2025 | — | | | |
| Inworld | inworld/inworld-tts-1.5-max | en, ar, zh | Aug 15, 2025 | — | — | — | |
| Inworld | inworld/inworld-tts-1.5-mini | en, ar, zh | Aug 15, 2025 | — | — | — | |
| ElevenLabs | elevenlabs/eleven_v3 | af, ar, hy | Jun 8, 2025 | — | — | | |
| Google | google/gemini-2.5-flash-preview-tts | en, fr, de | May 1, 2025 | — | — | — | |
| Google | google/gemini-2.5-pro-preview-tts | en, fr, de | May 1, 2025 | — | — | — | |
| Smallest AI | smallest-ai/lightning_v3.1_pro | en, hi | May 1, 2025 | — | — | — | — |
| Deepgram | deepgram/aura-2 | en, es, de | Apr 15, 2025 | — | — | — | |
| OpenAI | openai/gpt-4o-mini-tts | af, ar, bg | Mar 20, 2025 | — | — | | |
| fal | fal-ai/orpheus-tts | en, es, fr | Mar 18, 2025 | — | — | — | |
| Cartesia | cartesia/sonic-2 | en | Mar 13, 2025 | — | — | — | |
| Hume | hume/octave-1 | en | Mar 1, 2025 | — | — | — | |
| fal | fal-ai/kokoro | en, fr, ko | Jan 27, 2025 | — | — | — | |
| Murf | murf/GEN2 | en, de, es | Jan 1, 2025 | — | — | — | |
| Murf | murf/FALCON | en | Jan 1, 2025 | — | — | — | |
| Smallest AI | smallest-ai/lightning_v3.1 | en, hi, es | Jan 1, 2025 | — | — | — | — |
| ElevenLabs | elevenlabs/eleven_flash_v2_5 | ar, bg, cs | Dec 1, 2024 | — | — | — | |
| ElevenLabs | elevenlabs/eleven_flash_v2 | en | Dec 1, 2024 | — | — | — | |
| fal | fal-ai/f5-tts | en, zh, fr | Oct 8, 2024 | — | — | | |
| OpenAI | openai/tts-1 | af, ar, bg | Nov 6, 2023 | — | — | — | |
| OpenAI | openai/tts-1-hd | af, ar, bg | Nov 6, 2023 | — | — | — | |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | ar, bg, cs | Aug 22, 2023 | — | — | — | |

Audio Tags: bracket syntax such as [laughs], [sighs], or [whispers] that adds expressive audio cues to generated speech. Models without audio tag support strip the tags from the input and return warnings.
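
The fallback described for models without audio tag support can be pictured with a small, self-contained sketch. This is not SpeechSDK's implementation: the function name `strip_audio_tags`, the regular expression, and the warning wording are all assumptions made for illustration, and the real SDK may recognize a different tag grammar.

```python
import re
from typing import List, Tuple

# Assumed tag grammar: a bracketed cue made of letters and spaces, e.g. [laughs].
AUDIO_TAG = re.compile(r"\[([A-Za-z][A-Za-z ]*)\]")


def strip_audio_tags(text: str) -> Tuple[str, List[str]]:
    """Remove bracketed audio tags and report one warning per removed tag.

    Hypothetical helper: mirrors the documented behavior (strip tags,
    return warnings) without claiming to match the SDK's own code.
    """
    warnings: List[str] = []

    def _drop(match: "re.Match[str]") -> str:
        warnings.append(f"audio tag [{match.group(1)}] is not supported and was removed")
        return ""

    cleaned = AUDIO_TAG.sub(_drop, text)
    # Collapse the double spaces left behind where a tag used to be.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, warnings


if __name__ == "__main__":
    cleaned, warnings = strip_audio_tags("Well [laughs] that was close. [sighs]")
    print(cleaned)
    for warning in warnings:
        print(warning)
```

Returning the warnings alongside the cleaned text, rather than raising, keeps the request usable on any model while still telling the caller that the expressive cues were dropped.
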
*Voice Clone refers to passing inline audio references instead of selecting a pre-defined voice.
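
The distinction between a pre-defined voice and an inline audio reference can be modeled as two separate input types. The sketch below is only an illustration under stated assumptions: `PresetVoice`, `InlineVoiceReference`, `check_voice`, and the `supports_voice_clone` flag are hypothetical names, not part of SpeechSDK's documented API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PresetVoice:
    """A voice selected by name from a provider's catalogue (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class InlineVoiceReference:
    """Reference audio passed with the request itself (hypothetical type)."""
    audio: bytes
    media_type: str = "audio/wav"


Voice = Union[PresetVoice, InlineVoiceReference]


def check_voice(voice: Voice, supports_voice_clone: bool) -> None:
    """Reject inline audio references for models without Voice Clone support."""
    if isinstance(voice, InlineVoiceReference) and not supports_voice_clone:
        raise ValueError("this model does not accept inline audio references")


if __name__ == "__main__":
    # A preset voice is valid regardless of the model's Voice Clone column.
    check_voice(PresetVoice("narrator"), supports_voice_clone=False)
    # An inline reference only passes when the model supports Voice Clone.
    check_voice(InlineVoiceReference(b"RIFF"), supports_voice_clone=True)
```
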