Getting Started

Get started with SpeechSDK — the universal text-to-speech SDK for JavaScript and TypeScript.

SpeechSDK is a lightweight, provider-agnostic TypeScript toolkit for text-to-speech. One API, 12 providers, zero lock-in.

Install

npm install @speech-sdk/core

For Agents

Give your AI coding assistant full knowledge of this library. Works with Claude Code, Cursor, Codex, and more.

npx skills add Jellypod-Inc/speech-sdk --skill speech-sdk

Quick Start

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})

result.audio.uint8Array // Uint8Array
result.audio.base64 // string (lazy-computed)
result.audio.mediaType // "audio/mpeg"

That's it: there is no provider SDK to install and no client to initialize. Pass a provider/model string and go.
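As a follow-up to the Quick Start, here is a minimal sketch of saving the returned audio to disk in Node.js. It relies only on the `uint8Array` and `mediaType` fields shown above. The `SpeechAudio` interface, the `extensionFor` and `saveAudio` helpers, and the media-type table are hypothetical names introduced for illustration and are not part of SpeechSDK.

```typescript
import { writeFileSync } from "node:fs"

// Only the fields of result.audio that this sketch uses (assumed shape).
interface SpeechAudio {
  uint8Array: Uint8Array
  mediaType: string
}

// Hypothetical helper: map a media type to a file extension.
// The table is illustrative, not a list of what SpeechSDK can return.
function extensionFor(mediaType: string): string {
  const known: Record<string, string> = {
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/ogg": "ogg",
  }
  return known[mediaType] ?? "bin"
}

// Hypothetical helper: write the audio bytes to "<baseName>.<ext>"
// and return the path that was written.
function saveAudio(audio: SpeechAudio, baseName: string): string {
  const path = `${baseName}.${extensionFor(audio.mediaType)}`
  writeFileSync(path, audio.uint8Array)
  return path
}
```

With the Quick Start result in scope, `saveAudio(result.audio, "hello")` would write `hello.mp3` when the media type is "audio/mpeg".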

How It Works

SpeechSDK resolves the provider from the model string (the part before the slash), reads the API key from that provider's environment variable, calls the provider's API, and returns a normalized result in the same shape for every provider.

// These all work the same way
generateSpeech({ model: "openai/gpt-4o-mini-tts", text: "...", voice: "alloy" })
generateSpeech({
  model: "elevenlabs/eleven_v3",
  text: "...",
  voice: "voice-id",
})
generateSpeech({ model: "deepgram/aura-2", text: "...", voice: "thalia-en" })

// Pass just the provider name to use its default model
generateSpeech({ model: "openai", text: "...", voice: "alloy" })
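The resolution step described above can be pictured with a small sketch. This is not SpeechSDK's actual implementation: `parseModelString`, `apiKeyEnvVar`, the `ResolvedModel` type, and the `<PROVIDER>_API_KEY` naming convention are all assumptions made for illustration. Check the provider documentation for the real environment variable names.

```typescript
// Hypothetical sketch of provider/model resolution (not SpeechSDK internals).
interface ResolvedModel {
  provider: string
  model: string | undefined // undefined means "use the provider's default model"
}

// Split on the first "/" only, so a model id that itself contains
// slashes stays intact in the `model` part.
function parseModelString(input: string): ResolvedModel {
  const slash = input.indexOf("/")
  const provider = slash === -1 ? input : input.slice(0, slash)
  if (provider === "") {
    throw new Error(`Invalid model string: "${input}"`)
  }
  const model = slash === -1 ? undefined : input.slice(slash + 1)
  return { provider, model }
}

// Assumed naming convention: upper-cased provider name plus "_API_KEY".
function apiKeyEnvVar(provider: string): string {
  return `${provider.toUpperCase().replace(/[^A-Z0-9]/g, "_")}_API_KEY`
}
```

Under these assumptions, "openai/gpt-4o-mini-tts" resolves to provider "openai" with model "gpt-4o-mini-tts", while a bare "openai" leaves the model undefined so a default can be chosen.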
