Voice Tools
Voice Dub
Translate spoken audio into another language and re-render in a chosen voice. Composes STT + translation + TTS behind a single tool.
POST
Translate a spoken-audio clip into a target language and re-render
it in a chosen voice. Composes speech-to-text, translation, and
text-to-speech into a single call so the chat surface and SDK don’t
have to thread file ids between three separate requests.
Useful for internationalising voiceovers, dubbing video segments,
and re-rendering code-switched input in a single canonical language.
Authorization
Bearer token.
Bearer API_key.
Request Body
Source audio URL. Any common codec is accepted; speech is auto-detected
and the rest of the clip is treated as silence.
ISO 639-1 target language code (e.g. en, es, fr, de, ja, zh).
Voice to render the translated text in. When omitted, the engine
preserves the source speaker’s timbre via cross-lingual cloning
(the same voice now speaking a new language).
Output audio format. Options: wav, mp3. Default: wav.
Notes
- Cross-lingual cloning (omitting voice_id) keeps the source speaker’s timbre and prosody while speaking the new language. Quality depends on how much of the source speaker is captured in the clip: longer source clips produce more faithful cross-lingual output.
- Code-switched input (multiple languages in the same clip) is collapsed to target_language in the output. The transcript field shows the detected source-language segments.
- Step-by-step alternative: when you need control over the intermediate transcript or translation, call transcribe with translate: true, then T2A on the translated text.
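
The request described above could be assembled roughly as follows. This is a minimal sketch and not the official client: the helper name build_dub_request, the JSON encoding of the body, and the field names audio_url and format are assumptions made for illustration, since this page does not name them. Only target_language, voice_id, the wav/mp3 options, and the Bearer authorization scheme come from the reference itself. The endpoint URL is not shown here, so the sketch stops short of sending anything.

```python
import json
from typing import Optional

# Output formats listed in the reference; wav is the documented default.
ALLOWED_FORMATS = {"wav", "mp3"}


def build_dub_request(
    api_key: str,
    audio_url: str,
    target_language: str,
    voice_id: Optional[str] = None,
    output_format: str = "wav",
) -> dict:
    """Assemble headers and a JSON body for a Voice Dub call.

    Hypothetical helper: "audio_url" and "format" are assumed field names.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")

    # ISO 639-1 codes are two ASCII letters (en, es, fr, de, ja, zh, ...).
    code = target_language.lower()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO 639-1 code: {target_language!r}")

    body = {
        "audio_url": audio_url,      # assumed field name
        "target_language": code,
        "format": output_format,     # assumed field name
    }
    # Leaving voice_id out requests cross-lingual cloning of the source speaker.
    if voice_id is not None:
        body["voice_id"] = voice_id

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumes a JSON request body
    }
    return {"headers": headers, "body": json.dumps(body)}
```

Calling build_dub_request("my-key", "https://example.com/clip.mp3", "es") yields a body with no voice_id key, which per the notes above asks the engine to keep the source speaker’s timbre. Passing a voice_id switches to that voice instead.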