Getting Started
Get started with SpeechSDK — the universal text-to-speech SDK for JavaScript and TypeScript.
SpeechSDK is a lightweight, provider-agnostic TypeScript toolkit for text-to-speech. One API, 12 providers, zero lock-in.
Install
npm install @speech-sdk/core
For Agents
Give your AI coding assistant full knowledge of this library. Works with Claude Code, Cursor, Codex, and more.
npx skills add Jellypod-Inc/speech-sdk --skill speech-sdk
Quick Start
import { generateSpeech } from "@speech-sdk/core"
const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})
result.audio.uint8Array // Uint8Array
result.audio.base64 // string (lazy-computed)
result.audio.mediaType // "audio/mpeg"
That's it: no provider SDK to install, no client to initialize. Just pass a provider/model string and go.
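One possible way to use the result fields shown above is to turn them into a data URL for quick playback in a browser. The sketch below is not part of SpeechSDK: the `AudioLike` interface and the `toDataUrl` helper are hypothetical names, and the shape is assumed from the `base64` and `mediaType` fields in the Quick Start.

```typescript
// Hypothetical shape, assumed from the Quick Start fields; not an SDK type.
interface AudioLike {
  base64: string // base64-encoded audio bytes
  mediaType: string // MIME type, e.g. "audio/mpeg"
}

// Hypothetical helper: builds a data URL that an <audio> element can play.
function toDataUrl(audio: AudioLike): string {
  return `data:${audio.mediaType};base64,${audio.base64}`
}
```

In a browser, something like `new Audio(toDataUrl(result.audio)).play()` should then play the clip. For long audio, a Blob built from `result.audio.uint8Array` is likely a better fit than a data URL.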
How It Works
SpeechSDK resolves the provider from the model string, reads the API key from the corresponding environment variable, calls the provider's API, and returns a normalized result.
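As a rough illustration of the first two steps described above, the sketch below splits a model string into a provider and an optional model name, then derives an environment variable name. It is not SpeechSDK's actual implementation: `parseModelString` and `assumedApiKeyVar` are hypothetical helpers, and the `PROVIDER_API_KEY` naming pattern is an assumption, so check the Configuration page for the real variable names.

```typescript
// Hypothetical result type for this sketch; not an SDK type.
interface ResolvedModel {
  provider: string
  model?: string // undefined means "use the provider's default model"
}

// Split "provider/model" at the first slash; a bare provider name has no model part.
function parseModelString(input: string): ResolvedModel {
  const slash = input.indexOf("/")
  if (slash === -1) return { provider: input }
  return { provider: input.slice(0, slash), model: input.slice(slash + 1) }
}

// ASSUMPTION: keys follow an upper-cased "<PROVIDER>_API_KEY" pattern.
function assumedApiKeyVar(provider: string): string {
  return `${provider.toUpperCase()}_API_KEY`
}
```

With these helpers, `parseModelString("openai")` yields a provider with no model, which mirrors the "pass just the provider name" form.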
// These all work the same way
generateSpeech({ model: "openai/gpt-4o-mini-tts", text: "...", voice: "alloy" })
generateSpeech({
  model: "elevenlabs/eleven_v3",
  text: "...",
  voice: "voice-id",
})
generateSpeech({ model: "deepgram/aura-2", text: "...", voice: "thalia-en" })
// Pass just the provider name to use its default model
generateSpeech({ model: "openai", text: "...", voice: "alloy" })
Next Steps
- Providers — see all 12 supported providers and their models
- Streaming — stream audio chunk-by-chunk for low-latency playback
- Standardized Audio Tags — write expressive cues once, every provider handles them
- Voice Cloning — clone voices with reference audio
- Configuration — custom API keys, base URLs, and fetch
- Error Handling — handle API errors gracefully