Day 0 support for Google Gemini 3.1 Flash TTS

The Unified Text-to-Speech SDK

The SpeechSDK is a free, open-source toolkit for building better AI audio applications with multiple voice providers.

$ npm install @speech-sdk/core
import { generateSpeech } from '@speech-sdk/core';

const result = await generateSpeech({
  model: 'openai/gpt-4o-mini-tts',
  text: 'Hello from SpeechSDK!',
  voice: 'alloy',
});

result.audio.uint8Array;  // Uint8Array
result.audio.base64;      // string (lazy)
result.audio.mediaType;   // "audio/mpeg"
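The base64 field above is described as lazy, which presumably means the string is only encoded the first time it is read. The sketch below illustrates that caching pattern with a hypothetical LazyAudio class and a hand-rolled toBase64 helper. Neither is part of SpeechSDK, and the sketch is not meant to describe the SDK's internals.

```typescript
// Hypothetical sketch: an audio result whose base64 view is computed on first
// access and then cached. This is NOT SpeechSDK's implementation.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard (RFC 4648) base64 encoding with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET[(triple >> 18) & 63];
    out += BASE64_ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? BASE64_ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET[triple & 63] : "=";
  }
  return out;
}

class LazyAudio {
  readonly uint8Array: Uint8Array;
  readonly mediaType: string;
  encodeCount = 0; // how many times the encoder has actually run
  private cachedBase64: string | undefined = undefined;

  constructor(uint8Array: Uint8Array, mediaType: string) {
    this.uint8Array = uint8Array;
    this.mediaType = mediaType;
  }

  // Encoded on first read, then served from the cache.
  get base64(): string {
    if (this.cachedBase64 === undefined) {
      this.cachedBase64 = toBase64(this.uint8Array);
      this.encodeCount++;
    }
    return this.cachedBase64;
  }
}
```

Deferring the encoding this way means callers who only need the raw bytes never pay for a base64 copy of the audio.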
14 Providers
25+ Models
Built for Production
Open Source (Apache 2.0 License)

One API, Every Provider

One interface across OpenAI, ElevenLabs, Deepgram, Cartesia, Google, Mistral, Hume, and more: unified model strings, a consistent response format, and bring-your-own API keys.

Streaming
Auto-Retries
Error Handling
Provider Options

Multi-Speaker Conversations

Generate a multi-speaker conversation with a single API. Mix voices across providers in one call, with automatic volume leveling and turn stitching.

Audio Tags
Voice Cloning
Pronunciations
Volume Leveling
Speed Control
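A minimal sketch of what volume leveling and turn stitching can look like, assuming each turn has already been decoded to mono Float32Array samples in [-1, 1] at a shared sample rate. It uses simple RMS normalization and a fixed silence gap between turns; SpeechSDK may use a different loudness measure or gap length, and the names rms, levelTurn, and stitchTurns are hypothetical.

```typescript
// Hypothetical sketch of volume leveling and turn stitching on mono PCM.
// Assumes every turn is decoded to Float32Array samples in [-1, 1] at the
// same sample rate. Not SpeechSDK's actual algorithm.

function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / samples.length);
}

// Scale a turn so its RMS matches targetRms, clamping samples to avoid clipping.
function levelTurn(samples: Float32Array, targetRms: number): Float32Array {
  const current = rms(samples);
  if (current === 0) return samples.slice(); // pure silence: nothing to scale
  const gain = targetRms / current;
  return samples.map((s) => Math.max(-1, Math.min(1, s * gain)));
}

// Level every turn, then join them with gapSeconds of silence between turns.
function stitchTurns(
  turns: Float32Array[],
  sampleRate: number,
  targetRms = 0.1,
  gapSeconds = 0.3,
): Float32Array {
  const leveled = turns.map((t) => levelTurn(t, targetRms));
  const gap = Math.round(gapSeconds * sampleRate);
  const total =
    leveled.reduce((n, t) => n + t.length, 0) + gap * Math.max(0, leveled.length - 1);
  const out = new Float32Array(total); // zero-initialized, so the gaps are silent
  let offset = 0;
  leveled.forEach((t, i) => {
    out.set(t, offset);
    offset += t.length + (i < leveled.length - 1 ? gap : 0);
  });
  return out;
}
```

RMS is only a rough proxy for perceived loudness; perceptual measures such as LUFS apply frequency weighting, so treat the 0.1 target as an illustrative default.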

Auto-Chunking & Timestamps

Automatically splits long inputs at sentence boundaries for providers that enforce a maximum input length.

Multilingual Splitter
Timestamps
SRT/VTT Captions
Output Formats
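The sentence-boundary splitting described above can be sketched as a greedy packer: segment the text into sentences, then append sentences to the current chunk until the next one would push it past maxChars. This is an illustration under stated assumptions, not the SDK's multilingual splitter: splitIntoChunks is a hypothetical name, sentence detection is a naive regex that only recognizes ., !, and ?, and a single sentence longer than the limit is hard-split at maxChars.

```typescript
// Hypothetical sketch: pack whole sentences into chunks of at most maxChars.
// Sentence detection is naive (only ".", "!", "?"); a sentence that alone
// exceeds maxChars is hard-split every maxChars characters.
function splitIntoChunks(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  // A run of non-terminators followed by its terminators (or end of text).
  const sentences = (text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Hard-split oversized sentences so no chunk can exceed the limit.
    const pieces: string[] = [];
    for (let i = 0; i < sentence.length; i += maxChars) {
      pieces.push(sentence.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep growing this chunk
      } else {
        if (current) chunks.push(current); // flush and start a new chunk
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Packing greedily keeps the number of provider requests low while never cutting a normal-length sentence in half.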

Why SpeechSDK?

Locking into a single TTS provider's SDK means rewriting code when a better or less expensive model ships.

The SpeechSDK integrates all major providers into an easy-to-use, unified interface so you can swap models without breaking your application code.

$ npm i @speech-sdk/core

Supports

OpenAI
ElevenLabs
Google
xAI
+ 10 more providers
generate-conversation.ts
import { generateConversation } from "@speech-sdk/core";

const result = await generateConversation({
  turns: [
    {
      model: "elevenlabs/eleven_v3",
      voice: "EXAVITQu4vr4xnSDxMaL",
      text: "Hello from the SDK.",
    },
    {
      model: "google/gemini-3.1-flash-tts-preview",
      voice: "Kore",
      text: "One call. Multiple voices. Auto-leveled.",
    },
  ],
});

result.audio.uint8Array; // Uint8Array
result.audio.mediaType;  // "audio/mpeg"

AI Engineering

For Production Voice Applications

Smart retries

Requests that fail with a 5xx or 429 status are retried automatically with jittered exponential backoff. For 429 responses, the Retry-After header is honored (capped at 60 s), and the delay is exposed via ApiError.retryAfterMs.
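A sketch of the retry policy described above, assuming a 500 ms base delay and the "full jitter" variant of exponential backoff (a random delay between 0 and base * 2^attempt). Only the 60 s Retry-After cap comes from the text; the function names, the base delay, and the choice of jitter variant are assumptions, and this is not SpeechSDK's code.

```typescript
// Hypothetical sketch of a retry policy like the one described above.
// Base delay, jitter variant, and all names are assumptions.
const RETRY_AFTER_CAP_MS = 60_000;

function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Parse Retry-After as delta-seconds or an HTTP-date; undefined if absent or invalid.
function parseRetryAfterMs(header: string | null, nowMs: number): number | undefined {
  if (header === null) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

// Delay before retry number `attempt` (0-based), or undefined if not retryable.
function retryDelayMs(
  status: number,
  attempt: number,
  retryAfter: string | null,
  baseMs = 500,
  random: () => number = Math.random,
  nowMs: number = Date.now(),
): number | undefined {
  if (!isRetryable(status)) return undefined;
  if (status === 429) {
    // Prefer the server's hint, but never wait longer than the cap.
    const fromHeader = parseRetryAfterMs(retryAfter, nowMs);
    if (fromHeader !== undefined) return Math.min(fromHeader, RETRY_AFTER_CAP_MS);
  }
  // Full jitter: uniform in [0, baseMs * 2^attempt).
  return random() * baseMs * 2 ** attempt;
}
```

Injecting random and nowMs keeps the policy deterministic under test; jitter itself spreads retries out so many clients do not hit a rate-limited API in lockstep.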

Long inputs, handled

When text exceeds maxInputChars, it is split at sentence boundaries, the chunks are stitched into one audio file, and word-level timestamps are realigned so they run continuously across the stitched audio.
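Because each chunk is synthesized separately, its word timestamps start at zero. One way to produce a single timeline is to shift every chunk's timestamps by the total duration of the chunks before it. The WordTimestamp and ChunkResult shapes and mergeChunkTimestamps below are hypothetical, and the sketch assumes chunks are concatenated with no gap or crossfade.

```typescript
// Hypothetical word-timestamp shape (seconds); SpeechSDK's real type may differ.
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

interface ChunkResult {
  durationSeconds: number; // length of this chunk's audio
  words: WordTimestamp[]; // timestamps relative to the start of the chunk
}

// Shift each chunk's timestamps by the total duration of the chunks before it,
// producing one timeline that matches the concatenated audio.
function mergeChunkTimestamps(chunks: ChunkResult[]): WordTimestamp[] {
  const merged: WordTimestamp[] = [];
  let offset = 0;
  for (const chunk of chunks) {
    for (const w of chunk.words) {
      merged.push({ word: w.word, start: w.start + offset, end: w.end + offset });
    }
    offset += chunk.durationSeconds;
  }
  return merged;
}
```

If the stitching step inserted silence between chunks, that gap would have to be added to the offset as well.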

Format conversions

Render wav, mp3, or pcm from any provider: native pass-through where the provider supports the format, lossless local conversion otherwise.
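As an example of a lossless local conversion, raw PCM can be turned into WAV by prepending a 44-byte RIFF header without touching the samples. The sketch assumes 16-bit little-endian PCM; pcmToWav is a hypothetical helper, not a SpeechSDK export, and converting to a compressed format such as mp3 would additionally need an encoder.

```typescript
// Hypothetical sketch: wrap raw 16-bit little-endian PCM in a WAV (RIFF) container.
// The sample data is copied byte-for-byte, so the conversion is lossless.
function pcmToWav(pcm: Uint8Array, sampleRate: number, channels = 1): Uint8Array {
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8);
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) out[offset + i] = s.charCodeAt(i);
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size
  out.set(pcm, 44); // samples follow the header unchanged
  return out;
}
```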

Custom fetch & Base URL

Every provider accepts a custom fetch and baseURL, so you can point requests at OpenAI-compatible proxies, Azure, LiteLLM, or local models.
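A custom fetch only needs fetch's call shape, so it can rewrite URLs or add headers before delegating to the real implementation. The sketch uses a minimal, self-declared FetchLike type that accepts string URLs only, so it runs without DOM typings. createRebasingFetch and the localhost proxy address are hypothetical, and how the function is handed to a provider is not shown because this page does not spell out where the option is passed.

```typescript
// Minimal fetch-like signature (string URLs only) so the sketch needs no DOM typings.
interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}
type FetchLike = (url: string, init?: RequestInitLike) => Promise<unknown>;

// Hypothetical helper: send requests aimed at fromBase to toBase instead
// (for example an OpenAI-compatible proxy), adding extra headers on the way.
function createRebasingFetch(
  inner: FetchLike,
  fromBase: string,
  toBase: string,
  extraHeaders: Record<string, string> = {},
): FetchLike {
  return (url, init = {}) => {
    const target = url.indexOf(fromBase) === 0 ? toBase + url.slice(fromBase.length) : url;
    // Caller headers are kept; extraHeaders win on conflicts.
    return inner(target, { ...init, headers: { ...init.headers, ...extraHeaders } });
  };
}
```

In a runtime with a global fetch, the real fetch can be passed as inner; URLs that do not start with fromBase pass through unchanged.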

Words and captions

Word-level timestamps come from the provider's native alignment or, as a fallback, a one-shot STT pass. timestampsToCaptions converts them to SRT or WebVTT in a single call.
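To show what caption generation involves, here is a sketch that groups word timestamps into cues and formats them as SRT (numbered cues, HH:MM:SS,mmm timestamps, a blank line between cues). It does not reproduce the signature of timestampsToCaptions: the CaptionWord shape, wordsToSrt, and the seven-words-per-cue default are assumptions. WebVTT differs mainly in its WEBVTT header line and its use of a period before the milliseconds.

```typescript
// Hypothetical word-timestamp shape (seconds); not SpeechSDK's actual type.
interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Group consecutive words into cues of at most wordsPerCue words.
function wordsToSrt(words: CaptionWord[], wordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const group = words.slice(i, i + wordsPerCue);
    const first = group[0];
    const last = group[group.length - 1];
    const text = group.map((w) => w.word).join(" ");
    // Cue = index line, "start --> end" line, text line.
    cues.push(`${cues.length + 1}\n${srtTime(first.start)} --> ${srtTime(last.end)}\n${text}\n`);
  }
  return cues.join("\n"); // blank line between cues
}
```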

Speechbase ready

Queuing, quality processing, voice management, and analytics, connected with a single config change. Coming soon.

PROVIDERS

Every model, one interface

Provider      Model String                          Default*
OpenAI        openai/gpt-4o-mini-tts                Yes
ElevenLabs    elevenlabs/eleven_v3                  Yes
ElevenLabs    elevenlabs/eleven_flash_v2_5
ElevenLabs    elevenlabs/eleven_flash_v2
Deepgram      deepgram/aura-2                       Yes
Cartesia      cartesia/sonic-3                      Yes
Hume          hume/octave-2                         Yes
Google        google/gemini-3.1-flash-tts-preview   Yes
Fish Audio    fish-audio/s2-pro                     Yes
Inworld       inworld/inworld-tts-1.5-max           Yes
Murf          murf/GEN2                             Yes
Smallest AI   smallest-ai/lightning-v3.1            Yes
Resemble      resemble/default                      Yes
fal           fal-ai/*
Mistral       mistral/voxtral-mini-tts-2603         Yes
xAI           xai/grok-tts                          Yes

* Pass only the provider name to use its default model; for example, model: 'openai' resolves to openai/gpt-4o-mini-tts.
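A sketch of how a bare provider name could resolve to its default model, using the defaults marked in the provider table above. DEFAULT_MODELS and resolveModel are hypothetical, not SpeechSDK's implementation. Splitting on the first slash only is an assumption that lets model IDs under fal-ai/* contain further slashes, and fal, which has no default in the table, is left out of the map.

```typescript
// Hypothetical sketch of "provider" -> "provider/model" resolution, using the
// defaults listed in the provider table. Not SpeechSDK's actual implementation.
const DEFAULT_MODELS: Record<string, string> = {
  openai: "gpt-4o-mini-tts",
  elevenlabs: "eleven_v3",
  deepgram: "aura-2",
  cartesia: "sonic-3",
  hume: "octave-2",
  google: "gemini-3.1-flash-tts-preview",
  "fish-audio": "s2-pro",
  inworld: "inworld-tts-1.5-max",
  murf: "GEN2",
  "smallest-ai": "lightning-v3.1",
  resemble: "default",
  mistral: "voxtral-mini-tts-2603",
  xai: "grok-tts",
};

interface ResolvedModel {
  provider: string;
  modelId: string;
}

function resolveModel(model: string): ResolvedModel {
  const slash = model.indexOf("/");
  if (slash === -1) {
    // Own-property check so names like "toString" are not mistaken for providers.
    if (!Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, model)) {
      throw new Error(`No default model for provider "${model}"`);
    }
    return { provider: model, modelId: DEFAULT_MODELS[model] };
  }
  // Split on the first slash only, so model IDs may themselves contain "/".
  return { provider: model.slice(0, slash), modelId: model.slice(slash + 1) };
}
```

With this shape, switching providers really is a one-string change: resolveModel("openai") and resolveModel("elevenlabs/eleven_flash_v2_5") return the same structure.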

Frequently asked questions

Each provider has its own SDK, request format, auth pattern, and response shape. SpeechSDK is one API, every provider — same function call, same result type, same error handling. Switch providers by simply changing a model string.

SpeechSDK

One SDK, every provider. Add text-to-speech to your app in minutes with a unified, open-source interface.