Configuration

By default, SpeechSDK reads API keys from environment variables. Use provider factory functions when you need custom API keys, base URLs, or fetch implementations.

Factory Functions

All provider create* factories are exported from @speech-sdk/core/providers:

import { generateSpeech } from "@speech-sdk/core"
import { createOpenAI } from "@speech-sdk/core/providers"

const myOpenAI = createOpenAI({
  apiKey: "sk-...",
  baseURL: "https://my-proxy.com/v1",
})

const result = await generateSpeech({
  model: myOpenAI("gpt-4o-mini-tts"),
  text: "Hello!",
  voice: "alloy",
})

The factory returns a function that accepts an optional model ID. Call it without arguments to use the provider's default model:

import { generateSpeech } from "@speech-sdk/core"
import { createElevenLabs } from "@speech-sdk/core/providers"

const elevenlabs = createElevenLabs({ apiKey: "..." })

// Uses the provider's default model
await generateSpeech({ model: elevenlabs(), text: "...", voice: "EXAVITQu4vr4xnSDxMaL" })

// Or pick a specific model
await generateSpeech({ model: elevenlabs("eleven_v3"), text: "...", voice: "EXAVITQu4vr4xnSDxMaL" })
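
To make the "factory returns a function" pattern concrete, here is a minimal sketch of how such a factory could be structured. Everything in it is hypothetical (`createSketchProvider`, `SketchConfig`, `SketchModel`, and the model IDs are made-up names); it illustrates the closure pattern only and is not SpeechSDK's implementation.

```typescript
// Hypothetical shapes for illustration; SpeechSDK's real types may differ.
interface SketchConfig {
  apiKey?: string
  baseURL?: string
}

interface SketchModel {
  provider: string
  modelId: string
  config: SketchConfig
}

// The factory captures the config once and returns a model selector.
function createSketchProvider(config: SketchConfig = {}) {
  const defaultModelId = "sketch-default" // made-up default model ID
  return (modelId: string = defaultModelId): SketchModel => ({
    provider: "sketch",
    modelId,
    config,
  })
}

const provider = createSketchProvider({ apiKey: "..." })
const fallback = provider() // no argument: falls back to the made-up default
const explicit = provider("sketch-large") // an explicit model ID wins
```

Because the config is captured in the closure, one configured provider can hand out several models that all share the same key and endpoint.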

Available Factories

All factories are imported from @speech-sdk/core/providers:

Function
`createOpenAI()`
`createElevenLabs()`
`createDeepgram()`
`createCartesia()`
`createHume()`
`createGoogle()`
`createFishAudio()`
`createInworld()`
`createMurf()`
`createResemble()`
`createFal()`
`createMistral()`
`createXai()`
`createSmallestAI()`
`createSpeechGateway()`

Configuration Options

All factory functions accept the same base options:

interface ProviderConfig {
  apiKey?: string // Override the env var
  baseURL?: string // Custom API endpoint (proxies, self-hosted)
  fetch?: typeof globalThis.fetch // Custom fetch implementation
}

Custom Fetch

Pass a custom fetch for logging, instrumentation, or environments without a global fetch:

import { createOpenAI } from "@speech-sdk/core/providers"

const openai = createOpenAI({
  fetch: async (url, init) => {
    console.log(`Requesting: ${url}`)
    return globalThis.fetch(url, init)
  },
})
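
For instrumentation beyond a single log line, the wrapper can be factored into a reusable function. The sketch below assumes a fetch-compatible global environment (`Response`, `URL`); `withRequestLog` and `RequestLogEntry` are hypothetical names, not part of SpeechSDK.

```typescript
// Hypothetical helper: wraps any fetch-compatible function and records
// the URL, status code, and duration of each request.
type FetchLike = typeof globalThis.fetch

interface RequestLogEntry {
  url: string
  status: number
  durationMs: number
}

function withRequestLog(baseFetch: FetchLike, log: RequestLogEntry[]): FetchLike {
  return async (input, init) => {
    const started = Date.now()
    const response = await baseFetch(input, init)
    // fetch accepts a string, a URL, or a Request; normalise to a string.
    const url =
      typeof input === "string" ? input : input instanceof URL ? input.href : input.url
    log.push({ url, status: response.status, durationMs: Date.now() - started })
    return response
  }
}
```

The wrapped function would then be passed as the `fetch` option, for example `createOpenAI({ fetch: withRequestLog(globalThis.fetch, entries) })`, where `entries` is an array you own.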

Request Options

Every generateSpeech call accepts these options:

generateSpeech({
  model: string | ResolvedModel,       // required
  text: string,                        // required
  voice: Voice,                        // required
  providerOptions?: object,            // provider-specific API params
  timestamps?: boolean,                // word-level alignment, default: false
  speed?: number,                      // 0.75–1.5; time-stretch the final audio (see /docs/speed)
  maxRetries?: number,                 // default: 2
  abortSignal?: AbortSignal,           // cancel the request
  headers?: Record<string, string>,    // additional HTTP headers
});

Abort Signal

Cancel an in-flight request:

const controller = new AbortController()

const promise = generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  abortSignal: controller.signal,
})

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000)
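
The timer in the example above keeps running even if the request finishes early. One way to tidy that up is a small helper that clears the timer once the call settles. This is a sketch: `withTimeout` is a hypothetical helper, not a SpeechSDK export, and how `generateSpeech` reports an aborted request is not covered here.

```typescript
// Hypothetical helper: runs an abortable operation with a deadline and
// clears the timer as soon as the operation settles.
async function withTimeout<T>(
  ms: number,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  try {
    return await run(controller.signal)
  } finally {
    clearTimeout(timer) // no stray timer after success or failure
  }
}
```

With this helper, the call might look like `withTimeout(5000, (signal) => generateSpeech({ model: "openai/gpt-4o-mini-tts", text: "Hello!", voice: "alloy", abortSignal: signal }))`.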

Custom Headers

Pass additional HTTP headers to the provider's API:

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  headers: {
    "X-Custom-Header": "value",
  },
})

Retries

SpeechSDK retries 5xx responses and network errors with exponential backoff. It does not retry 4xx errors. The default is 2 retries; set maxRetries to change it:

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  maxRetries: 5,
})
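
The retry rule described above can be sketched as follows. This is an illustration of the stated policy, not SpeechSDK's actual code: `HttpError`, `isRetryable`, `backoffDelayMs`, `withRetries`, and the 500 ms base delay are all assumptions, and the sketch reads "2 retries" as up to three attempts in total.

```typescript
// Illustrative only: names and delays are assumptions, not SpeechSDK internals.
class HttpError extends Error {
  readonly status: number
  constructor(status: number) {
    super(`HTTP ${status}`)
    this.status = status
  }
}

// Retry 5xx responses; treat errors without an HTTP status as network errors.
function isRetryable(error: unknown): boolean {
  if (error instanceof HttpError) return error.status >= 500 && error.status <= 599
  return true
}

// Exponential backoff: the delay doubles with each attempt (0-based).
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt
}

async function withRetries<T>(
  run: () => Promise<T>,
  maxRetries = 2,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run()
    } catch (error) {
      // Give up once retries are exhausted or the error is a 4xx.
      if (attempt >= maxRetries || !isRetryable(error)) throw error
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseMs)))
    }
  }
}
```

Under these assumptions, a request that keeps returning 503 is attempted three times with waits of 500 ms and 1000 ms in between, while a 400 fails immediately.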
