POST /v1/voice/dub
curl --request POST \
  --url https://geoff.ai/api/v1/voice/dub \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "audio_url": "https://files.geoff.ai/audio/english_clip.mp3",
    "target_language": "es",
    "voice_id": "brand_voice_v1",
    "format": "mp3"
  }'
{
  "data": {
    "audio_url": "https://files.geoff.ai/output/dubbed_abc123.mp3",
    "audio_b64": "...",
    "format": "mp3",
    "sample_rate": 24000,
    "duration_s": 12.1,
    "source_language": "en",
    "target_language": "es",
    "transcript": "Hello and welcome to the show...",
    "translation": "Hola y bienvenidos al programa..."
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081"
}
Translate a spoken-audio clip into a target language and re-render it in a chosen voice. Composes speech-to-text, translation, and text-to-speech into a single call so the chat surface and SDK don’t have to thread file ids between three separate requests. Useful for internationalising voiceovers, dubbing video segments, and re-rendering code-switched input in a single canonical language.

Authorization

Authorization
string
required
Bearer token. Send the header as Authorization: Bearer API_key.

Request Body

audio_url
string
required
Source audio URL. Any common codec is accepted; speech is auto-detected and the rest of the clip is treated as silence.
target_language
string
required
ISO 639-1 target language code (e.g. en, es, fr, de, ja, zh).
voice_id
string
Voice to render the translated text in. When omitted, the engine preserves the source speaker’s timbre via cross-lingual cloning (the same voice now speaking a new language).
format
string
Output audio format. Options: wav, mp3. Default: wav.
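
The request body described above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name build_dub_request and its argument names are hypothetical, and the client-side validation only mirrors the required fields and format options listed above. It builds the request without sending it, so optional fields that are omitted (voice_id, format) are left to the server's documented defaults.

```python
import json
import urllib.request

# Endpoint URL taken from the curl example in this page.
DUB_URL = "https://geoff.ai/api/v1/voice/dub"
# Output formats listed under the "format" parameter.
ALLOWED_FORMATS = {"wav", "mp3"}


def build_dub_request(api_key, audio_url, target_language,
                      voice_id=None, audio_format=None):
    """Build (but do not send) a POST request for the dub endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not audio_url or not target_language:
        raise ValueError("audio_url and target_language are required")
    if audio_format is not None and audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {"audio_url": audio_url, "target_language": target_language}
    # Omitting voice_id requests cross-lingual cloning of the source speaker.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    # Omitting format leaves the documented default (wav) in effect.
    if audio_format is not None:
        payload["format"] = audio_format

    return urllib.request.Request(
        DUB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send the request (requires network access and a valid key):
#   with urllib.request.urlopen(build_dub_request(...)) as resp:
#       body = json.load(resp)
```

Keeping construction separate from sending makes the payload easy to inspect or log before any audio is processed.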
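
The response fields shown in the example can be unpacked as in the Python sketch below. It assumes, without confirmation from this page, that audio_b64 carries the rendered audio as standard base64 and that either audio_b64 or audio_url may be absent. The function name extract_dub_result is hypothetical.

```python
import base64
import json


def extract_dub_result(response_text):
    """Parse a dub response body into a flat dict.

    Hypothetical helper. Assumes audio_b64, when present, is standard
    base64 of the output audio; this is not confirmed by the reference.
    """
    body = json.loads(response_text)
    data = body["data"]

    audio_b64 = data.get("audio_b64")
    audio_bytes = base64.b64decode(audio_b64) if audio_b64 else None

    return {
        "audio_bytes": audio_bytes,          # decoded audio, or None
        "audio_url": data.get("audio_url"),  # hosted copy of the output
        "format": data.get("format"),
        "duration_s": data.get("duration_s"),
        "source_language": data.get("source_language"),
        "target_language": data.get("target_language"),
        "transcript": data.get("transcript"),
        "translation": data.get("translation"),
        "trace_id": body.get("trace_id"),    # useful when reporting issues
    }
```

A caller could write audio_bytes to a file named with the returned format, or fall back to downloading audio_url when no inline audio is present.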

Notes

  • Cross-lingual cloning (omitting voice_id) keeps the source speaker’s timbre and prosody while speaking the new language. Quality depends on how much of the source speaker’s voice is captured in the clip: longer source clips produce more faithful cross-lingual output.
  • Code-switched input (multiple languages in the same clip) is collapsed to target_language in the output. The transcript field shows the detected source-language segments.
  • Step-by-step alternative: when you need control over the intermediate transcript or translation, call transcribe with translate: true, then call T2A on the translated text.