POST /v1/voice/convert
curl --request POST \
  --url https://geoff.ai/api/v1/voice/convert \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "audio_url": "https://files.geoff.ai/audio/source.wav",
    "target_voice_id": "brand_voice_v1",
    "format": "wav"
  }'
{
  "data": {
    "audio_url": "https://files.geoff.ai/output/converted_abc123.wav",
    "audio_b64": "UklGRiQAAABXQVZFZm10...",
    "format": "wav",
    "sample_rate": 24000,
    "duration_s": 8.4,
    "target_voice": "brand_voice_v1"
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081"
}
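
The sample response above carries the converted audio twice: as a hosted audio_url and inline as audio_b64. A minimal Python sketch for writing the inline copy to disk follows. It assumes audio_b64 is standard base64 of the complete file (the sample value above is truncated) and that the response has already been parsed into a dict; the helper name save_converted_audio and the output filename are hypothetical, not part of the API.

```python
import base64
from pathlib import Path


def save_converted_audio(response: dict, out_dir: str = ".") -> Path:
    """Decode data.audio_b64 from a parsed response and write it to disk.

    Assumption: audio_b64 holds the whole audio file as standard base64.
    """
    data = response["data"]
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    audio_bytes = base64.b64decode(data["audio_b64"], validate=True)
    # Use the reported format ("wav" or "mp3") as the file extension.
    out_path = Path(out_dir) / f"converted.{data['format']}"
    out_path.write_bytes(audio_bytes)
    return out_path
```

For large files it may be simpler to download from audio_url instead of decoding the inline copy.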
Takes speech in voice A plus a short reference of voice B and returns the same content spoken in voice B’s timbre. No per-voice training is required: conversion works from a single 6–10 second reference clip. Useful for dubbing, replacing voices in existing recordings, or auditioning catalog voices on real content before committing.

Authorization

Authorization (string, required)
Bearer token. Pass your API key as: Bearer <API key>.

Request Body

audio_url (string, required)
Source audio URL. The content (words and prosody) is preserved; only the timbre is replaced.

target_voice_id (string)
Voice ID from the catalog (use List Voices to enumerate). Mutually exclusive with reference_audio_url.

reference_audio_url (string)
Direct URL to a 6–10 second reference clip of the target voice. Mutually exclusive with target_voice_id.

pitch_shift (integer)
Optional semitone shift applied to the converted output. Range: -12 to 12. Default: 0.

format (string)
Output audio format. Options: wav, mp3. Default: wav.
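
The parameter rules above (mutual exclusivity, the pitch_shift range, the format options) can be checked client-side before sending a request. The following is a minimal sketch, not an official client: the function name build_convert_payload is hypothetical, and it assumes that exactly one of target_voice_id or reference_audio_url must be supplied, which the reference implies but does not state outright.

```python
from typing import Optional

ALLOWED_FORMATS = {"wav", "mp3"}


def build_convert_payload(
    audio_url: str,
    target_voice_id: Optional[str] = None,
    reference_audio_url: Optional[str] = None,
    pitch_shift: int = 0,
    audio_format: str = "wav",
) -> dict:
    """Build the JSON body for POST /v1/voice/convert."""
    # The two target fields are documented as mutually exclusive.
    # Assumption: exactly one of them is required.
    if (target_voice_id is None) == (reference_audio_url is None):
        raise ValueError(
            "pass exactly one of target_voice_id or reference_audio_url"
        )
    # Documented range for pitch_shift is -12 to 12 semitones.
    if not -12 <= pitch_shift <= 12:
        raise ValueError("pitch_shift must be between -12 and 12")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError("format must be 'wav' or 'mp3'")

    payload = {"audio_url": audio_url, "format": audio_format}
    if target_voice_id is not None:
        payload["target_voice_id"] = target_voice_id
    else:
        payload["reference_audio_url"] = reference_audio_url
    # Omit pitch_shift when it equals the documented default of 0.
    if pitch_shift != 0:
        payload["pitch_shift"] = pitch_shift
    return payload
```

The returned dict can be serialized as the JSON request body with any HTTP client, alongside the Authorization and Content-Type headers.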

Tips

  • Pre-process noisy sources: run the source through voice denoise first if it has room tone, hiss, or background music.
  • Extract from songs: use voice isolate to extract vocals from a mixed track, then convert just the isolated vocal.
  • Pitch shift sparingly: large shifts (more than 4 semitones) introduce artifacts. For male↔female voice changes, the conversion handles formant adjustment automatically, so pitch shift is only needed for fine-tuning.
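
The pitch-shift tip can be turned into a soft client-side check. This sketch only warns and never blocks the request; the helper name check_pitch_shift and the constant ARTIFACT_THRESHOLD are hypothetical, with the threshold of 4 semitones taken from the tip above.

```python
import warnings

ARTIFACT_THRESHOLD = 4  # semitones; shifts beyond this tend to add artifacts


def check_pitch_shift(pitch_shift: int) -> int:
    """Warn when a shift is large enough to risk audible artifacts."""
    if abs(pitch_shift) > ARTIFACT_THRESHOLD:
        warnings.warn(
            f"pitch_shift={pitch_shift} exceeds {ARTIFACT_THRESHOLD} "
            "semitones and may introduce artifacts",
            UserWarning,
        )
    # Return the value unchanged so the call can wrap an argument inline.
    return pitch_shift
```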