Speech Result

Working with the audio data returned by generateSpeech.

generateSpeech returns a SpeechResult containing the generated audio and optional provider metadata.

SpeechResult

interface SpeechResult {
  readonly audio: GeneratedAudioFile
  readonly providerMetadata?: Record<string, unknown>
  readonly warnings?: string[]
}

Audio File

The audio property exposes the same generated audio in multiple representations:

interface GeneratedAudioFile {
  readonly uint8Array: Uint8Array // Raw audio bytes
  readonly base64: string // Base64 encoded (lazy-computed)
  readonly mediaType: string // MIME type, e.g. "audio/mpeg"
}

Accessing Audio Data

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

// Raw bytes — best for writing to files or streaming
result.audio.uint8Array

// Base64 — useful for data URIs or JSON serialization
result.audio.base64

// Media type — use for Content-Type headers
result.audio.mediaType

The base64 property is lazy-computed from uint8Array on first access, so there's no overhead if you only need the raw bytes.
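The lazy-computation pattern can be modeled with a caching getter. This is a minimal sketch, not the SDK's actual implementation; it assumes a Node.js runtime (it uses Buffer for the base64 encoding):

```typescript
// Sketch of a lazily computed base64 property: the encoding runs only
// on first access to `.base64`, then the result is cached.
class LazyAudioFile {
  #base64?: string

  constructor(
    readonly uint8Array: Uint8Array,
    readonly mediaType: string,
  ) {}

  get base64(): string {
    if (this.#base64 === undefined) {
      // Computed from the raw bytes on first access, then reused.
      this.#base64 = Buffer.from(this.uint8Array).toString("base64")
    }
    return this.#base64
  }
}

const file = new LazyAudioFile(new Uint8Array([72, 105]), "audio/mpeg")
console.log(file.base64) // "SGk=" — base64 of the bytes for "Hi"
```

A getter like this also makes the base64/data-URI use case cheap: code paths that only touch uint8Array never pay for the encoding.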

Writing to a File (Node.js)

import { writeFileSync } from "node:fs"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

writeFileSync("output.mp3", result.audio.uint8Array)

Creating a Response (Edge/Server)

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

return new Response(result.audio.uint8Array, {
  headers: { "Content-Type": result.audio.mediaType },
})

Playing in the Browser

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

const blob = new Blob([result.audio.uint8Array], {
  type: result.audio.mediaType,
})
const url = URL.createObjectURL(blob)
const audio = new Audio(url)
await audio.play() // returns a Promise; may reject if autoplay is blocked

Provider Metadata

Some providers return additional metadata alongside the audio. Access it via providerMetadata:

const result = await generateSpeech({
  model: "hume/octave-2",
  text: "Hello!",
  voice: "Dacher",
})

if (result.providerMetadata) {
  console.log(result.providerMetadata)
}

The shape of metadata varies by provider.
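Because providerMetadata is an untyped record, it's safest to narrow values before using them. The helper below is a hypothetical sketch (the "hume" key and "durationMs" field are made up for illustration; check your provider's documentation for the actual shape):

```typescript
// Safely read a numeric field from provider metadata without assuming
// its shape. Returns undefined if the provider entry or field is missing
// or has an unexpected type.
function readNumberField(
  metadata: Record<string, unknown> | undefined,
  provider: string,
  field: string,
): number | undefined {
  const entry = metadata?.[provider]
  if (typeof entry !== "object" || entry === null) return undefined
  const value = (entry as Record<string, unknown>)[field]
  return typeof value === "number" ? value : undefined
}

// Hypothetical usage:
// const durationMs = readNumberField(result.providerMetadata, "hume", "durationMs")
```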

Warnings

When using features that aren't supported by all providers (like audio tags), SpeechSDK returns warnings instead of throwing:

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "[laugh] Hello world",
  voice: "alloy",
})

if (result.warnings) {
  console.log(result.warnings)
  // ["Audio tag [laugh] is not supported by openai/gpt-4o-mini-tts and was removed."]
}

warnings is undefined when there are no warnings.
