OpenAI


| Property | Value |
| --- | --- |
| Prefix | `openai` |
| Default model | `gpt-4o-mini-tts` |
| Env var | `OPENAI_API_KEY` |
| Official docs | platform.openai.com/docs/guides/text-to-speech |

Models

| Model | Streaming | Audio Tags | Voice Cloning | Notes |
| --- | --- | --- | --- | --- |
| `gpt-4o-mini-tts` | Yes | Yes (via `instructions`) | No | Steerable; tags become instructions |
| `tts-1` | Yes | No | No | Low-latency, fixed voices |
| `tts-1-hd` | Yes | No | No | Higher quality, fixed voices |

Usage

```typescript
import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})
```

Built-in voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse.

Audio Tags

`gpt-4o-mini-tts` is steerable: SpeechSDK maps standardized audio tags in your text to the OpenAI `instructions` field.

```typescript
await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "[cheerful] Welcome back!",
  voice: "alloy",
})
```

`tts-1` and `tts-1-hd` do not accept `instructions`, so tags are stripped from the text and a warning is returned.
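To make the tag behavior concrete, here is a minimal sketch of how bracketed tags could be separated from the spoken text and turned into an instruction string. It is not SpeechSDK's actual implementation: `mapAudioTags`, `TagMappingResult`, the regular expression, and the wording of the generated instruction and warning are all hypothetical, chosen only to illustrate the two outcomes described above.

```typescript
// Hypothetical illustration only: none of these names come from SpeechSDK.
interface TagMappingResult {
  text: string            // text with the [tags] removed
  instructions?: string   // set only for steerable models
  warnings: string[]      // non-empty when tags had to be dropped
}

// Models that accept the OpenAI `instructions` field, per the Models table.
const STEERABLE_MODELS = new Set(["gpt-4o-mini-tts"])

function mapAudioTags(model: string, input: string): TagMappingResult {
  const tags: string[] = []

  // Collect every [tag], remove it from the text, then tidy whitespace.
  const text = input
    .replace(/\[([^\]]+)\]/g, (_match: string, tag: string) => {
      tags.push(tag.trim())
      return ""
    })
    .replace(/\s+/g, " ")
    .trim()

  if (tags.length === 0) {
    return { text, warnings: [] }
  }

  if (!STEERABLE_MODELS.has(model)) {
    // tts-1 / tts-1-hd: tags are stripped and reported, not applied.
    return {
      text,
      warnings: [`${model} does not accept instructions; dropped tags: ${tags.join(", ")}`],
    }
  }

  // Assumed phrasing; the real mapping from tags to instructions may differ.
  return { text, instructions: `Speak in a ${tags.join(", ")} tone.`, warnings: [] }
}
```

With this sketch, `mapAudioTags("gpt-4o-mini-tts", "[cheerful] Welcome back!")` yields the clean text plus an instruction, while the same input with `"tts-1"` yields the clean text and one warning.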

Provider Options

```typescript
await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  providerOptions: {
    speed: 1.2,
    response_format: "opus", // mp3 | opus | aac | flac | wav | pcm
    instructions: "Speak with a warm, friendly tone.",
  },
})
```
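If you build `providerOptions` from user input, it can help to check the values before sending them. The sketch below is one way to do that. `validateOpenAIOptions` is a hypothetical helper, not part of SpeechSDK. The format list comes from the comment in the example above, and the 0.25 to 4.0 speed range is taken from OpenAI's API reference, so verify it against the current docs.

```typescript
// Hypothetical helper: SpeechSDK may or may not validate these itself.
const RESPONSE_FORMATS = ["mp3", "opus", "aac", "flac", "wav", "pcm"] as const

// Speed bounds as documented by OpenAI at the time of writing (assumption).
const MIN_SPEED = 0.25
const MAX_SPEED = 4.0

// Returns a list of human-readable problems; an empty list means "looks fine".
function validateOpenAIOptions(options: Record<string, unknown>): string[] {
  const problems: string[] = []

  const speed = options["speed"]
  if (speed !== undefined) {
    if (typeof speed !== "number" || Number.isNaN(speed)) {
      problems.push("speed must be a number")
    } else if (speed < MIN_SPEED || speed > MAX_SPEED) {
      problems.push(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`)
    }
  }

  const format = options["response_format"]
  if (format !== undefined) {
    const known = (RESPONSE_FORMATS as readonly string[]).includes(String(format))
    if (typeof format !== "string" || !known) {
      problems.push(`response_format must be one of: ${RESPONSE_FORMATS.join(", ")}`)
    }
  }

  const instructions = options["instructions"]
  if (instructions !== undefined && typeof instructions !== "string") {
    problems.push("instructions must be a string")
  }

  return problems
}
```

Returning a list of problems rather than throwing lets the caller decide whether to reject the request or fall back to defaults.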

Custom Configuration

```typescript
import { generateSpeech } from "@speech-sdk/core"
import { createOpenAI } from "@speech-sdk/core/providers"

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://my-proxy.com/v1",
})

const result = await generateSpeech({
  model: openai("gpt-4o-mini-tts"),
  text: "Hello!",
  voice: "alloy",
})
```
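When pointing `baseURL` at a proxy, it is useful to know what the proxy should expect to receive. The sketch below builds a request in the shape of OpenAI's speech endpoint (`POST /audio/speech` with `model`, `input`, and `voice` in a JSON body). `buildSpeechRequest` is a hypothetical helper for illustration. How SpeechSDK assembles the request internally is an assumption here, not something this page documents.

```typescript
// Hypothetical illustration of the HTTP request a proxy would need to forward.
interface ProviderConfig {
  apiKey: string
  baseURL?: string // defaults to OpenAI's public API root
}

interface SpeechParams {
  model: string
  text: string
  voice: string
}

interface SpeechRequest {
  url: string
  init: { method: "POST"; headers: Record<string, string>; body: string }
}

function buildSpeechRequest(config: ProviderConfig, params: SpeechParams): SpeechRequest {
  // Drop trailing slashes so the path joins cleanly.
  const base = (config.baseURL ?? "https://api.openai.com/v1").replace(/\/+$/, "")

  return {
    url: `${base}/audio/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${config.apiKey}`,
        "Content-Type": "application/json",
      },
      // OpenAI's speech endpoint names the text field `input`.
      body: JSON.stringify({
        model: params.model,
        input: params.text,
        voice: params.voice,
      }),
    },
  }
}
```

The returned `url` and `init` can be passed to `fetch(url, init)` to test a proxy by hand before wiring it into `createOpenAI`.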