Speech Result
Working with the audio data returned by generateSpeech.
generateSpeech returns a SpeechResult containing the generated audio and optional provider metadata.
SpeechResult
```ts
interface SpeechResult {
  readonly audio: GeneratedAudioFile
  readonly providerMetadata?: Record<string, unknown>
  readonly warnings?: string[]
}
```

Audio File
The audio property provides the generated audio in multiple formats:
```ts
interface GeneratedAudioFile {
  readonly uint8Array: Uint8Array // Raw audio bytes
  readonly base64: string // Base64-encoded (lazy-computed)
  readonly mediaType: string // MIME type, e.g. "audio/mpeg"
}
```

Accessing Audio Data
```ts
const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

// Raw bytes — best for writing to files or streaming
result.audio.uint8Array

// Base64 — useful for data URIs or JSON serialization
result.audio.base64

// Media type — use for Content-Type headers
result.audio.mediaType
```

The base64 value is computed lazily from uint8Array on first access, so there is no encoding cost if you only need the raw bytes.
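Since base64 and mediaType are both strings, building a data URI from them is a one-liner. A minimal sketch (the toDataUri helper and the sample payload are illustrative, not part of the SDK):

```typescript
// Illustrative helper, not part of the SDK: combine a media type and a
// base64 payload into a data URI, e.g. for an <audio src> attribute.
function toDataUri(mediaType: string, base64: string): string {
  return `data:${mediaType};base64,${base64}`
}

// With values shaped like those generateSpeech returns:
console.log(toDataUri("audio/mpeg", "SUQzBA=="))
// data:audio/mpeg;base64,SUQzBA==
```

Keep in mind that data URIs inline the whole payload, so this is best reserved for short clips.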
Writing to a File (Node.js)
```ts
import { writeFileSync } from "fs"

const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

writeFileSync("output.mp3", result.audio.uint8Array)
```

Creating a Response (Edge/Server)
```ts
const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

return new Response(result.audio.uint8Array, {
  headers: { "Content-Type": result.audio.mediaType },
})
```

Playing in the Browser
```ts
const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "Hello!",
  voice: "alloy",
})

const blob = new Blob([result.audio.uint8Array], {
  type: result.audio.mediaType,
})
const url = URL.createObjectURL(blob)
const audio = new Audio(url)
// Release the object URL when playback finishes to avoid leaking memory
audio.addEventListener("ended", () => URL.revokeObjectURL(url))
audio.play()
```

Provider Metadata
Some providers return additional metadata alongside the audio. Access it via providerMetadata:
```ts
const result = await generateSpeech({
  model: "hume/octave-2",
  text: "Hello!",
  voice: "Dacher",
})

if (result.providerMetadata) {
  console.log(result.providerMetadata)
}
```

The shape of the metadata varies by provider.
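Because providerMetadata is typed as Record<string, unknown>, narrow it defensively before reading from it. The key name and shape below are hypothetical examples, not a documented schema:

```typescript
// Hypothetical metadata shape for illustration only — real providers
// may use entirely different keys and structures.
const providerMetadata: Record<string, unknown> = {
  hume: { generationId: "abc123" },
}

// Narrow the unknown value step by step before trusting its shape.
const hume = providerMetadata["hume"]
if (typeof hume === "object" && hume !== null && "generationId" in hume) {
  console.log((hume as { generationId: string }).generationId)
}
```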
Warnings
When using features that aren't supported by all providers (like audio tags), SpeechSDK returns warnings instead of throwing:
```ts
const result = await generateSpeech({
  model: "openai/gpt-4o-mini-tts",
  text: "[laugh] Hello world",
  voice: "alloy",
})

if (result.warnings) {
  console.log(result.warnings)
  // ["Audio tag [laugh] is not supported by openai/gpt-4o-mini-tts and was removed."]
}
```

warnings is undefined when there are no warnings.
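If you would rather have unsupported features fail loudly (for example in CI), you can escalate warnings to errors yourself. A sketch of that pattern — the assertNoWarnings helper is not an SDK feature:

```typescript
// Illustrative helper, not part of the SDK: throw if the result
// carried any warnings, so silent feature degradation fails a build.
function assertNoWarnings(warnings?: string[]): void {
  if (warnings && warnings.length > 0) {
    throw new Error(`Speech generation produced warnings:\n${warnings.join("\n")}`)
  }
}

assertNoWarnings(undefined) // no warnings: no-op
try {
  assertNoWarnings(["Audio tag [laugh] is not supported."])
} catch (e) {
  console.log((e as Error).message)
}
```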