Streaming

Stream text-to-speech audio chunk-by-chunk with streamSpeech for low-latency playback.

Use streamSpeech to receive audio as it's generated, instead of waiting for the full file. This is ideal for real-time playback, voice agents, and long-form text where time-to-first-byte matters.

Quick Start

import { streamSpeech } from "@speech-sdk/core"

const result = await streamSpeech({
  model: "elevenlabs/eleven_v3",
  text: "Streaming audio, one chunk at a time.",
  voice: "voice-id",
})

for await (const chunk of result.audio) {
  // chunk is a Uint8Array of audio bytes
  process.stdout.write(chunk)
}

result.audio is a standard Web ReadableStream<Uint8Array>, so it works anywhere streams do — Node.js, Edge runtimes, and the browser.
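In runtimes where for-await over a ReadableStream is unavailable (notably Safari), the same stream can be drained with a manual reader. A self-contained sketch, using a locally constructed stream as a stand-in for result.audio:

```typescript
// Stand-in for result.audio so the sketch runs on its own.
const audio = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(new Uint8Array([1, 2, 3]))
    controller.enqueue(new Uint8Array([4, 5]))
    controller.close()
  },
})

async function collect(stream: ReadableStream<Uint8Array>): Promise<Uint8Array> {
  const reader = stream.getReader()
  const chunks: Uint8Array[] = []
  let total = 0
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    chunks.push(value)
    total += value.length
  }
  // Concatenate the chunks into one contiguous buffer
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.length
  }
  return out
}

const bytes = await collect(audio)
```

The same collect helper doubles as a way to buffer the entire stream when you need the complete audio in one piece.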

StreamSpeechResult

interface StreamSpeechResult {
  readonly audio: ReadableStream<Uint8Array>
  readonly mediaType: string
  readonly providerMetadata?: Record<string, unknown>
  readonly warnings?: string[]
}

Returning a Streaming Response

Pass result.audio straight to a Response to stream audio to a client:

export async function GET() {
  const result = await streamSpeech({
    model: "deepgram/aura-2",
    text: "Hello from the edge.",
    voice: "thalia-en",
  })

  return new Response(result.audio, {
    headers: { "Content-Type": result.mediaType },
  })
}
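This works because a Response accepts any ReadableStream as its body and forwards chunks as they are produced, with no buffering. A quick self-contained check of that round trip, with a stand-in stream in place of result.audio:

```typescript
// A stream standing in for result.audio.
const audio = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(new TextEncoder().encode("audio-"))
    controller.enqueue(new TextEncoder().encode("bytes"))
    controller.close()
  },
})

// Response streams the body; nothing is materialized up front.
const response = new Response(audio, {
  headers: { "Content-Type": "audio/mpeg" },
})

const received = new TextDecoder().decode(await response.arrayBuffer())
```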

Playing in the Browser

Use the Media Source Extensions API for playback that starts while chunks are still arriving, or collect the chunks into a Blob and play once the stream completes:

const result = await streamSpeech({
  model: "cartesia/sonic-2",
  text: "Streaming in the browser.",
  voice: "voice-id",
})

const chunks: Uint8Array[] = []
for await (const chunk of result.audio) {
  chunks.push(chunk)
}

const blob = new Blob(chunks, { type: result.mediaType })
new Audio(URL.createObjectURL(blob)).play()

Aborting a Stream

Pass an AbortSignal to cancel the request mid-stream:

const controller = new AbortController()

const result = await streamSpeech({
  model: "elevenlabs/eleven_v3",
  text: "...",
  voice: "voice-id",
  abortSignal: controller.signal,
})

// Cancel after 2 seconds
setTimeout(() => controller.abort(), 2000)

Provider Support

Not every model supports streaming. Check the Models page for the full matrix, or use hasFeature at runtime:

import { FEATURES, hasFeature } from "@speech-sdk/core"

if (hasFeature(model, FEATURES.STREAMING)) {
  // safe to call streamSpeech
}

Calling streamSpeech on a model without streaming support throws a StreamingNotSupportedError — fall back to generateSpeech for those models.
