Voice Cloning

Clone voices using reference audio with supported providers.

Some providers support inline voice cloning: instead of a voice ID string, pass a voice object that carries the reference audio.

From Base64 Audio

import { generateSpeech } from "@speech-sdk/core"
import { createMistral } from "@speech-sdk/core/mistral"

const mistral = createMistral()

const result = await generateSpeech({
  model: mistral(),
  text: "Hello in a cloned voice!",
  voice: { audio: "base64-encoded-audio..." },
})

You can also pass a Uint8Array:

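import { readFileSync } from "fs"
import { generateSpeech } from "@speech-sdk/core"
import { createMistral } from "@speech-sdk/core/mistral"

const mistral = createMistral()

// readFileSync returns a Buffer, which is a Uint8Array subclass
const audioBytes = readFileSync("./reference.wav")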

const result = await generateSpeech({
  model: mistral(),
  text: "Hello!",
  voice: { audio: audioBytes },
})

From a URL

import { generateSpeech } from "@speech-sdk/core"
import { createFal } from "@speech-sdk/core/fal-ai"

const fal = createFal()

const result = await generateSpeech({
  model: fal("fal-ai/chatterbox"),
  text: "Hello in a cloned voice!",
  voice: { url: "https://example.com/reference.wav" },
})

Voice Type

The voice parameter accepts three forms:

type Voice =
  | string // Voice ID (e.g., 'alloy', 'EXAVITQu4vr4...')
  | { audio: string | Uint8Array } // Inline clone from audio data
  | { url: string } // Inline clone from URL
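Because the three forms share one union type, code that receives a voice value may need to tell them apart. The following is a minimal sketch of narrowing the union with standard TypeScript checks. The `voiceKind` helper is hypothetical and not part of the SDK, and the `Voice` type is restated from above only so the snippet is self-contained.

```typescript
// Voice union as documented above, restated so this sketch stands alone.
type Voice =
  | string
  | { audio: string | Uint8Array }
  | { url: string }

// Hypothetical helper (not an SDK export): reports which form a voice takes.
function voiceKind(voice: Voice): "id" | "audio" | "url" {
  // A plain string is a provider voice ID.
  if (typeof voice === "string") return "id"
  // The `in` operator narrows the remaining object forms by property name.
  if ("audio" in voice) return "audio"
  return "url"
}
```

A call such as `voiceKind({ url: "https://example.com/reference.wav" })` would take the last branch, since the value is neither a string nor an object with an `audio` property.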

Providers with Voice Cloning

| Provider | Voice Cloning | Method |
| --- | --- | --- |
| Cartesia (sonic-3) | Yes | Audio data |
| Mistral | Yes | Audio data |
| fal | Yes | URL |

Not all models from a provider support voice cloning. Check the provider's documentation for model-specific support.
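One way to catch a mismatched voice object before making a request is to encode the table above as a lookup. This is only a sketch: the provider keys (`cartesia`, `mistral`, `fal`) are illustrative labels, the `cloneInputMatches` helper is hypothetical and not part of the SDK, and it assumes each provider accepts only the method the table lists. It does not account for model-specific support.

```typescript
// The two inline cloning forms from the Voice type.
type CloneVoice = { audio: string | Uint8Array } | { url: string }
type CloneMethod = "audio" | "url"

// Transcribed from the table above; keys are illustrative, not SDK identifiers.
const cloneMethods: Record<string, CloneMethod> = {
  cartesia: "audio",
  mistral: "audio",
  fal: "url",
}

// Hypothetical pre-flight check: does the voice object use the method
// listed for this provider? Unknown providers return false.
function cloneInputMatches(provider: string, voice: CloneVoice): boolean {
  const method: CloneMethod | undefined = cloneMethods[provider]
  if (method === undefined) return false
  return method === "audio" ? "audio" in voice : "url" in voice
}
```

Under these assumptions, `cloneInputMatches("fal", { audio: "..." })` would be rejected, because the table lists URL as the method for fal.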
