
OpenAI

OpenAI text-to-speech models (gpt-4o-mini-tts, tts-1, tts-1-hd).

Prefix: openai
Default model: gpt-4o-mini-tts
Env var: OPENAI_API_KEY
Official docs: platform.openai.com/docs/guides/text-to-speech

Models

Model | Streaming | Audio Tags | Voice Cloning | Notes
--- | --- | --- | --- | ---
gpt-4o-mini-tts | Yes | Yes (via instructions) | No | Steerable; tags become instructions
tts-1 | Yes | No | No | Low-latency, fixed voices
tts-1-hd | Yes | No | No | Higher quality, fixed voices
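
If you need to branch on these capabilities in application code, the table can be transcribed into a small lookup. This is only a sketch: `ModelCapabilities`, `OPENAI_TTS_CAPABILITIES`, and `supportsAudioTags` are hypothetical names for illustration and are not part of SpeechSDK.

```typescript
// Hypothetical helper: capability flags copied by hand from the Models table.
interface ModelCapabilities {
  streaming: boolean
  audioTags: boolean
  voiceCloning: boolean
}

const OPENAI_TTS_CAPABILITIES = new Map<string, ModelCapabilities>([
  ["gpt-4o-mini-tts", { streaming: true, audioTags: true, voiceCloning: false }],
  ["tts-1", { streaming: true, audioTags: false, voiceCloning: false }],
  ["tts-1-hd", { streaming: true, audioTags: false, voiceCloning: false }],
])

// Returns false for unknown model names instead of throwing.
function supportsAudioTags(model: string): boolean {
  return OPENAI_TTS_CAPABILITIES.get(model)?.audioTags ?? false
}
```

For example, `supportsAudioTags("tts-1")` is `false`, so a caller could avoid putting tags in the text for that model.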

Usage

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello from SpeechSDK!",
  voice: "alloy",
})

Built-in voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse.
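
When the voice name comes from user input, it may help to check it against this list before calling generateSpeech. The sketch below is one way to do that; `OPENAI_VOICES` and `isOpenAIVoice` are hypothetical names for illustration, not SpeechSDK exports.

```typescript
// Hypothetical helper: the voice list is copied from the line above.
const OPENAI_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "fable",
  "onyx", "nova", "sage", "shimmer", "verse",
] as const

type OpenAIVoice = (typeof OPENAI_VOICES)[number]

// Type guard: narrows an arbitrary string to a known voice name (case-sensitive).
function isOpenAIVoice(value: string): value is OpenAIVoice {
  return (OPENAI_VOICES as readonly string[]).indexOf(value) !== -1
}
```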

Audio Tags

gpt-4o-mini-tts is steerable: SpeechSDK maps standardized audio tags in your text to the OpenAI instructions field.

await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "[cheerful] Welcome back!",
  voice: "alloy",
})

tts-1 and tts-1-hd do not accept instructions, so tags are stripped from the text and a warning is returned.
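
To make the two behaviours concrete, here is a rough sketch of how bracketed tags could be separated from the spoken text. It is not SpeechSDK's implementation: `prepareInput`, the `PreparedInput` shape, the instruction wording, and the warning message are all assumptions made for illustration.

```typescript
// Hypothetical result shape; SpeechSDK's real return value may differ.
interface PreparedInput {
  text: string
  instructions?: string
  warnings: string[]
}

// Pulls [tag] markers out of the text. If the model accepts instructions,
// the tags are turned into an instruction string (assumed wording);
// otherwise they are dropped and reported in a warning.
function prepareInput(text: string, supportsInstructions: boolean): PreparedInput {
  const tags: string[] = []
  const stripped = text.replace(/\[([^\[\]]+)\]/g, (_match: string, tag: string) => {
    tags.push(tag.trim())
    return ""
  })
  // Collapse the whitespace left behind where tags were removed.
  const cleaned = stripped.replace(/\s+/g, " ").trim()
  if (tags.length === 0) {
    return { text: cleaned, warnings: [] }
  }
  const tagList = tags.join(", ")
  if (supportsInstructions) {
    return { text: cleaned, instructions: `Speak in a ${tagList} tone.`, warnings: [] }
  }
  const warning = `Audio tags removed (model does not accept instructions): ${tagList}`
  return { text: cleaned, warnings: [warning] }
}
```

Under these assumptions, calling `prepareInput("[cheerful] Welcome back!", false)` yields the text `Welcome back!` with one warning and no instructions.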

Provider Options

await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
  providerOptions: {
    speed: 1.2,
    response_format: "opus", // mp3 | opus | aac | flac | wav | pcm
    instructions: "Speak with a warm, friendly tone.",
  },
})
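
Since providerOptions are passed through to OpenAI, a quick local check can catch typos before a request is made. The sketch below is illustrative only: `validateOptions` is a hypothetical helper, the format list comes from the comment in the example above, and the 0.25 to 4.0 speed range is taken from the OpenAI API reference as an assumption you should confirm against the official docs.

```typescript
// Formats listed in the providerOptions example above.
const RESPONSE_FORMATS = ["mp3", "opus", "aac", "flac", "wav", "pcm"] as const

// Loosely typed input so that invalid values can be reported, not rejected by the compiler.
interface SpeechOptionsInput {
  speed?: number
  response_format?: string
  instructions?: string
}

// Hypothetical helper: returns a list of human-readable problems (empty if none).
function validateOptions(options: SpeechOptionsInput): string[] {
  const problems: string[] = []
  const speed = options.speed
  // Assumed range 0.25..4.0; the negated form also rejects NaN.
  if (speed !== undefined && !(speed >= 0.25 && speed <= 4.0)) {
    problems.push(`speed must be between 0.25 and 4.0, got ${speed}`)
  }
  const format = options.response_format
  if (format !== undefined && (RESPONSE_FORMATS as readonly string[]).indexOf(format) === -1) {
    problems.push(`response_format must be one of ${RESPONSE_FORMATS.join(", ")}, got ${format}`)
  }
  return problems
}
```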

Custom Configuration

import { generateSpeech } from "@speech-sdk/core"
import { createOpenAI } from "@speech-sdk/core/providers"

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://my-proxy.com/v1",
})

const result = await generateSpeech({
  model: openai("gpt-4o-mini-tts"),
  text: "Hello!",
  voice: "alloy",
})
