Configuration

Custom API keys, base URLs, and fetch implementations.

By default, SpeechSDK reads API keys from environment variables. Use provider factory functions when you need custom API keys, base URLs, or fetch implementations.

Factory Functions

All provider create* factories are exported from @speech-sdk/core/providers:

import { generateSpeech } from "@speech-sdk/core"
import { createOpenAI } from "@speech-sdk/core/providers"

const myOpenAI = createOpenAI({
  apiKey: "sk-...",
  baseURL: "https://my-proxy.com/v1",
})

const result = await generateSpeech({
  model: myOpenAI("gpt-4o-mini-tts"),
  text: "Hello!",
  voice: "alloy",
})

The factory returns a function that accepts an optional model ID. Call it without arguments to use the provider's default model:

import { generateSpeech } from "@speech-sdk/core"
import { createElevenLabs } from "@speech-sdk/core/providers"

const elevenlabs = createElevenLabs({ apiKey: "..." })

// Uses the provider's default model
generateSpeech({ model: elevenlabs(), text: "...", voice: "EXAVITQu4vr4xnSDxMaL" })

// Or pick a specific model
generateSpeech({ model: elevenlabs("eleven_v3"), text: "...", voice: "EXAVITQu4vr4xnSDxMaL" })
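
The factory pattern described above (capture the configuration once, then return a function that takes an optional model ID) can be sketched as follows. This is an illustration of the pattern only, not SpeechSDK's implementation: `createProvider`, `SketchConfig`, `SketchModel`, and the model names are all hypothetical.

```typescript
// Hypothetical shapes for illustration; SpeechSDK's real types may differ.
interface SketchConfig {
  apiKey?: string
  baseURL?: string
}

interface SketchModel {
  provider: string
  modelId: string
  config: SketchConfig
}

// Returns a function that closes over the config and falls back to a
// default model ID when it is called without arguments.
function createProvider(provider: string, defaultModelId: string, config: SketchConfig = {}) {
  return (modelId?: string): SketchModel => ({
    provider,
    modelId: modelId ?? defaultModelId,
    config,
  })
}

const sketchProvider = createProvider("example", "example-default", { apiKey: "..." })

// No argument: the default model ID is used.
const defaultModel = sketchProvider()

// Explicit argument: the given model ID is used.
const specificModel = sketchProvider("example-large")
```

Because the configuration lives in the closure, the same configured provider can be reused across many calls with different model IDs.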

Available Factories

The following factories are available:

Function
createOpenAI()
createElevenLabs()
createDeepgram()
createCartesia()
createHume()
createGoogle()
createFishAudio()
createInworld()
createMurf()
createResemble()
createFal()
createMistral()
createXai()
createSmallestAI()
createSpeechGateway()

Configuration Options

All factory functions accept the same base options:

interface ProviderConfig {
  apiKey?: string // Override the env var
  baseURL?: string // Custom API endpoint (proxies, self-hosted)
  fetch?: typeof globalThis.fetch // Custom fetch implementation
}
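
As a rough sketch of how an explicit `apiKey` might take precedence over the environment, consider the helper below. The function `resolveApiKey` and the variable name `EXAMPLE_API_KEY` are hypothetical; this section does not list the environment variable each provider reads, and the SDK's own resolution logic may differ.

```typescript
// A plain map of environment variables, passed in so the sketch does not
// depend on process.env.
type Env = Record<string, string | undefined>

// Hypothetical helper: prefer the explicit key, then fall back to the
// environment, and fail loudly when neither is set.
function resolveApiKey(explicitKey: string | undefined, envVarName: string, env: Env): string {
  const key = explicitKey ?? env[envVarName]
  if (!key) {
    throw new Error(`Missing API key: pass apiKey or set ${envVarName}`)
  }
  return key
}

const exampleEnv: Env = { EXAMPLE_API_KEY: "from-env" }

// The explicit key wins over the environment.
const fromConfig = resolveApiKey("from-config", "EXAMPLE_API_KEY", exampleEnv)

// With no explicit key, the environment value is used.
const fromEnv = resolveApiKey(undefined, "EXAMPLE_API_KEY", exampleEnv)
```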

Custom Fetch

Pass a custom fetch for logging, instrumentation, or environments without a global fetch:

import { createOpenAI } from "@speech-sdk/core/providers"

const openai = createOpenAI({
  fetch: async (url, init) => {
    console.log(`Requesting: ${url}`)
    return globalThis.fetch(url, init)
  },
})
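
For instrumentation beyond logging the URL, a wrapper can also record the response status and the request duration. The sketch below is one possible shape, not part of SpeechSDK: `createLoggingFetch`, `FetchLike`, and `RequestLogEntry` are hypothetical names, and the base fetch is passed in as a parameter so the wrapper can be exercised without network access.

```typescript
// A minimal fetch signature; assumes the standard Request, RequestInit,
// and Response globals are available (Node 18+ or a browser).
type FetchLike = (input: string | URL | Request, init?: RequestInit) => Promise<Response>

interface RequestLogEntry {
  url: string
  status: number
  durationMs: number
}

// Wraps a base fetch and appends one log entry per completed request.
// Requests that throw (for example on network failure) are not logged.
function createLoggingFetch(baseFetch: FetchLike, log: RequestLogEntry[]): FetchLike {
  return async (input, init) => {
    const startedAt = Date.now()
    const response = await baseFetch(input, init)
    // Normalize the three accepted input types to a URL string.
    const url =
      typeof input === "string" ? input : input instanceof URL ? input.href : input.url
    log.push({ url, status: response.status, durationMs: Date.now() - startedAt })
    return response
  }
}
```

The returned function has the same call shape as `fetch`, so it could be passed as the `fetch` option of a factory, for example `createOpenAI({ fetch: createLoggingFetch(globalThis.fetch, log) })`.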

Request Options

Every generateSpeech call accepts these options:

generateSpeech({
  model: string | ResolvedModel,       // required
  text: string,                        // required
  voice: Voice,                        // required
  providerOptions?: object,            // provider-specific API params
  timestamps?: boolean,                // word-level alignment, default: false
  speed?: number,                      // 0.75–1.5; time-stretch the final audio (see /docs/speed)
  maxRetries?: number,                 // default: 2
  abortSignal?: AbortSignal,           // cancel the request
  headers?: Record<string, string>,    // additional HTTP headers
});

Abort Signal

Cancel an in-flight request:

import { generateSpeech } from "@speech-sdk/core"

const controller = new AbortController()

const promise = generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  abortSignal: controller.signal,
})

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000)
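
When the timeout is only a safety net, it helps to clear the timer once the request settles so it does not stay pending. The sketch below shows one way to do that. `createTimeoutSignal` and `runWithTimeout` are hypothetical helpers, and `fakeRequest` is a stand-in for any call that accepts an `AbortSignal`; the error SpeechSDK raises on cancellation is not described in this section, so the sketch checks `signal.aborted` instead of inspecting the error.

```typescript
// Creates a signal that aborts after `ms` milliseconds, plus a function
// that clears the timer once the request has settled.
function createTimeoutSignal(ms: number): { signal: AbortSignal; clear: () => void } {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  return { signal: controller.signal, clear: () => clearTimeout(timer) }
}

// Stand-in for an abortable request (hypothetical): resolves after
// `durationMs` unless the signal aborts first.
function fakeRequest(signal: AbortSignal, durationMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"))
      return
    }
    const timer = setTimeout(() => resolve("done"), durationMs)
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer)
        reject(new Error("aborted"))
      },
      { once: true },
    )
  })
}

// Runs the request with a timeout and always clears the timer afterwards.
async function runWithTimeout(requestMs: number, timeoutMs: number): Promise<string> {
  const { signal, clear } = createTimeoutSignal(timeoutMs)
  try {
    return await fakeRequest(signal, requestMs)
  } catch (error) {
    // Treat the failure as a cancellation only if our signal fired.
    if (signal.aborted) return "cancelled"
    throw error
  } finally {
    clear()
  }
}
```

In real use, `signal` would be passed as the `abortSignal` option of `generateSpeech` in place of `fakeRequest`.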

Custom Headers

Pass additional HTTP headers to the provider's API:

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  headers: {
    "X-Custom-Header": "value",
  },
})

Retries

SpeechSDK retries 5xx responses and network errors with exponential backoff. It does not retry 4xx errors. The default is 2 retries.

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  maxRetries: 5,
})
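
To make the retry policy concrete, here is a minimal sketch of the general technique: retry on 5xx and thrown (network) errors, return 4xx immediately, and double the delay after each attempt. It is not SpeechSDK's implementation. The 500 ms base delay, the doubling factor, the absence of jitter, and the helper names are all assumptions; the SDK's actual schedule is not stated in this section.

```typescript
// Only server errors (5xx) are retryable; 4xx responses are returned as-is.
function isRetryableStatus(status: number): boolean {
  return status >= 500 && status <= 599
}

// Assumed schedule: the delay doubles per attempt (500, 1000, 2000, ... ms).
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Calls `request` at most 1 + maxRetries times.
async function withRetries<T extends { status: number }>(
  request: () => Promise<T>,
  maxRetries = 2,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await request()
      // Return on success, on a non-retryable status, or when out of retries.
      if (!isRetryableStatus(response.status) || attempt >= maxRetries) return response
    } catch (error) {
      // A thrown error stands in for a network failure in this sketch.
      if (attempt >= maxRetries) throw error
    }
    await sleep(backoffDelayMs(attempt, baseMs))
  }
}
```

With `maxRetries: 5` as in the example above, this sketch would make up to six attempts in total before giving up.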
