Voice Tools
Voice Convert
Zero-shot voice conversion — re-render speech in voice A as if spoken by voice B. No training required.
POST
Take speech in voice A plus a short reference of voice B; return
the same content spoken in voice B’s timbre. No per-voice training
required — works from a single 6–10 second reference clip.
Useful for dubbing, voice replacement in existing recordings, or
auditioning catalog voices on real content before committing.
Authorization
Bearer token.
Bearer API_key.
Request Body
Source audio URL. The content (words + prosody) is preserved;
only the timbre is replaced.
Voice id from the catalog (use List Voices to enumerate).
Mutually exclusive with reference_audio_url.
Direct URL to a 6–10 second reference clip of the target voice.
Mutually exclusive with target_voice_id.
Optional semitone shift applied to the converted output.
Range: -12 to 12. Default: 0.
Output audio format. Options: wav, mp3. Default: wav.
Tips
- Pre-process noisy source: pipe through voice denoise first if the source has room tone, hiss, or background music.
- Extract from songs: use voice isolate to extract vocals from a mixed track, then convert just the isolated vocal.
- Pitch shift sparingly: large shifts (>4 semitones) introduce artifacts. For male↔female voice changes, the conversion handles formant adjustment automatically — pitch shift is for fine-tuning.
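The request body constraints and the pitch-shift tip above can be checked client-side before sending anything. The following is a minimal Python sketch, not the official client. The endpoint URL and the field names `audio_url`, `pitch_shift`, and `output_format` are assumptions, since this page does not state them; only `target_voice_id` and `reference_audio_url` are named in the text. Requiring at least one of the two target fields is also an assumption (the page only says they are mutually exclusive).

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint; the real path is not given on this page.
ENDPOINT = "https://api.example.com/voice/convert"


def build_payload(audio_url, target_voice_id=None, reference_audio_url=None,
                  pitch_shift=0, output_format="wav"):
    """Validate the documented constraints and return the request body as a dict.

    ASSUMPTION: "audio_url", "pitch_shift", and "output_format" are
    hypothetical field names used only for illustration.
    """
    # Documented: the two ways of naming the target voice are mutually exclusive.
    if target_voice_id is not None and reference_audio_url is not None:
        raise ValueError("target_voice_id and reference_audio_url are mutually exclusive")
    # ASSUMPTION: a conversion needs some target voice, so one must be set.
    if target_voice_id is None and reference_audio_url is None:
        raise ValueError("set target_voice_id or reference_audio_url")
    # Documented: semitone shift range is -12 to 12.
    if not -12 <= pitch_shift <= 12:
        raise ValueError("pitch shift must be between -12 and 12 semitones")
    # Documented: output options are wav and mp3.
    if output_format not in ("wav", "mp3"):
        raise ValueError("output format must be 'wav' or 'mp3'")

    payload = {
        "audio_url": audio_url,
        "pitch_shift": pitch_shift,
        "output_format": output_format,
    }
    if target_voice_id is not None:
        payload["target_voice_id"] = target_voice_id
    else:
        payload["reference_audio_url"] = reference_audio_url
    return payload


def pitch_shift_is_risky(pitch_shift):
    """True when the shift exceeds the 4-semitone threshold from the tips."""
    return abs(pitch_shift) > 4


def build_request(api_key, payload):
    """Build (but do not send) a POST request carrying the Bearer token."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would pass the result of `build_request` to `urllib.request.urlopen` once the placeholder endpoint and field names are replaced with the real ones. Keeping validation separate from sending makes the constraints easy to test without network access.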