POST /v1/voice/isolate
curl --request POST \
  --url https://geoff.ai/api/v1/voice/isolate \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "audio_url": "https://files.geoff.ai/music/track.mp3",
    "return_instrumental": true,
    "format": "wav"
  }'
{
  "data": {
    "vocals_url": "https://files.geoff.ai/output/vocals_abc123.wav",
    "vocals_b64": "...",
    "instrumental_url": "https://files.geoff.ai/output/instrumental_abc123.wav",
    "instrumental_b64": "...",
    "format": "wav",
    "sample_rate": 44100,
    "duration_s": 187.4
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081"
}
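A minimal sketch of consuming this response in Python is shown below. It assumes each *_b64 field holds standard base64 of the complete output file, and that the instrumental fields are absent or empty when return_instrumental was false. Neither point is stated explicitly above. decode_stems and save_stems are hypothetical helper names.

```python
import base64
from pathlib import Path


def decode_stems(response: dict) -> dict:
    """Return {"vocals": bytes, "instrumental": bytes} for whichever stems are present.

    Assumption: *_b64 fields are standard base64 of the whole audio file, and
    the instrumental fields are missing/empty unless return_instrumental=true.
    """
    data = response["data"]
    stems = {}
    for stem in ("vocals", "instrumental"):
        b64 = data.get(f"{stem}_b64")
        if b64:  # skip stems that were not requested or not inlined
            stems[stem] = base64.b64decode(b64)
    return stems


def save_stems(response: dict, out_dir: str = ".") -> dict:
    """Write decoded stems as <stem>.<format> files and return their paths."""
    ext = response["data"].get("format", "wav")  # documented default is wav
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for stem, audio in decode_stems(response).items():
        path = out / f"{stem}.{ext}"
        path.write_bytes(audio)
        paths[stem] = path
    return paths
```

If you only need the files on disk, downloading vocals_url and instrumental_url works just as well. The inline base64 copies save a round trip at the cost of a larger JSON body.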
Extract the vocal track from a music or mixed-audio file. Returns the vocals as the primary payload and, optionally, the no-vocals instrumental stem in separate fields. Typical uses: a pre-processing step before voice convert when you want to swap the singer in a song, a pre-processing step before voice clone when extracting a clean reference from a noisy podcast, or stem-based remixing.

Authorization

Authorization
string
required
Bearer token, sent as: Bearer <API_key>.

Request Body

audio_url
string
required
Source audio URL (mp3 / wav / m4a / mp4).
return_instrumental
boolean
When true, also returns the no-vocals instrumental stem in the response. Default: false.
format
string
Output audio format. Options: wav, mp3. Default: wav.
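The request body above can be assembled and sent with Python's standard library as follows. This is a sketch, not an official client. build_isolate_request and isolate are hypothetical names, and the 300-second timeout is an arbitrary choice. The defaults and the allowed format values mirror the parameter list.

```python
import json
import urllib.request

# Endpoint and parameters as documented for POST /v1/voice/isolate.
ISOLATE_URL = "https://geoff.ai/api/v1/voice/isolate"
ALLOWED_FORMATS = {"wav", "mp3"}  # documented output formats


def build_isolate_request(api_key: str, audio_url: str,
                          return_instrumental: bool = False,
                          fmt: str = "wav") -> urllib.request.Request:
    """Build (but do not send) the POST request.

    Defaults mirror the documented ones: return_instrumental=false, format=wav.
    """
    if not audio_url:
        raise ValueError("audio_url is required")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    body = {
        "audio_url": audio_url,
        "return_instrumental": return_instrumental,
        "format": fmt,
    }
    return urllib.request.Request(
        ISOLATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def isolate(api_key: str, audio_url: str, **kwargs) -> dict:
    """Send the request and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    req = build_isolate_request(api_key, audio_url, **kwargs)
    # Timeout is an arbitrary choice; long tracks may take a while to separate.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

Usage: isolate(API_KEY, "https://files.geoff.ai/music/track.mp3", return_instrumental=True) should return an object shaped like the response example. Validating the format value client-side gives an immediate error instead of a failed round trip.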

Notes

  • Best with clean masters: heavily compressed sources (low-bitrate lossy encodes, heavy broadcast compression) yield noisier separation.
  • No vocal at all? If no vocal is detected, vocals_url is still returned, but the audio will be near-silent. Check the duration and RMS level before downstream processing.
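The RMS check suggested in the last note can be sketched with Python's wave module. This assumes you requested format "wav" and that the output is 16-bit PCM, which is not documented above. The -50 dBFS threshold is a hypothetical starting point and should be tuned on your own material. wav_rms_dbfs and has_vocals are illustrative names.

```python
import array
import io
import math
import sys
import wave


def wav_rms_dbfs(wav_bytes: bytes) -> float:
    """RMS level of a 16-bit PCM WAV in dBFS; -inf for digital silence."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM samples are stored little-endian
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_sq) / 32768)


def has_vocals(wav_bytes: bytes, threshold_dbfs: float = -50.0) -> bool:
    """Heuristic: a vocals stem at or below the threshold is treated as 'no vocal'."""
    return wav_rms_dbfs(wav_bytes) > threshold_dbfs
```

A whole-file RMS can hide a short vocal passage in an otherwise instrumental track. If that matters, compute the level over short windows and check the loudest one instead.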