Voice Extend - Geoff API

curl --request POST \
  --url https://geoff.ai/api/v1/voice/extend \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "reference_audio_url": "https://files.geoff.ai/audio/intro_segment.wav",
    "text": "And now, back to the main story.",
    "format": "wav"
  }'

{
  "data": {
    "audio_url": "https://files.geoff.ai/output/extended_abc123.wav",
    "audio_b64": "...",
    "format": "wav",
    "sample_rate": 24000,
    "duration_s": 3.4
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081"
}
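
A minimal Python sketch for reading a response shaped like the example above. It assumes `audio_b64` holds standard base64 of the audio file bytes; the sample elides the value, so this is an assumption rather than documented behavior. The helper name `extract_audio` is hypothetical.

```python
import base64
import json


def extract_audio(response_text: str) -> dict:
    """Parse a voice/extend response body and decode its inline audio.

    Assumption: `audio_b64` is standard base64 of the audio file bytes.
    """
    payload = json.loads(response_text)
    data = payload["data"]
    return {
        "audio_url": data["audio_url"],
        "audio_bytes": base64.b64decode(data["audio_b64"]),
        "format": data["format"],
        "sample_rate": data["sample_rate"],
        "duration_s": data["duration_s"],
        # trace_id sits beside "data" in the sample response
        "trace_id": payload.get("trace_id"),
    }
```

The decoded `audio_bytes` could then be written to a file whose extension matches `format`.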

POST /voice/extend

Continue an audio clip in the same voice: pass a short reference clip plus the new text, and the endpoint returns audio that sounds like the original speaker saying the new content. This is lower-friction than cloning and synthesizing in two separate calls when the goal is a single clip that feels contiguous.

Authorization

string

required

Bearer token. Send the header value as Bearer <API key>.

Request Body

reference_audio_url

string

required

URL of the source audio that supplies the voice and speaking style. Clips of 5–30 seconds work best.

text

string

required

New text to render in the source voice.

format

string

Output audio format. Options: wav, mp3. Default: wav.
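
The request described above can be sketched in Python using only the standard library. The function name `build_extend_request` and the client-side format check are illustrative assumptions, not part of the API; the URL, headers, and field names come from the curl example.

```python
import json
import urllib.request

API_URL = "https://geoff.ai/api/v1/voice/extend"
ALLOWED_FORMATS = ("wav", "mp3")  # options listed for the `format` field


def build_extend_request(token, reference_audio_url, text, audio_format="wav"):
    """Build (but do not send) a POST request for the voice extend endpoint."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    body = json.dumps({
        "reference_audio_url": reference_audio_url,
        "text": text,
        "format": audio_format,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request would be a matter of passing the result to `urllib.request.urlopen`; that call is left out so the sketch runs without network access or a real token.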

When to use what

| Goal | Tool |
| --- | --- |
| Continue an existing clip in the same voice | `voice_extend` (this endpoint) |
| Render arbitrary text in a saved catalog voice | T2A with `voice_id` |
| Quick single-clip clone + render in one call | One Shot Voice |
| Persist a voice for repeated use | Voice Clone → T2A |

Tips

Reference length matters: longer reference clips capture more of the speaker’s prosody. References of 15 seconds or more substantially improve long-form fidelity compared with 5-second references.
Prosody fidelity: voice extend preserves the cadence and emotional register of the reference, which is useful for podcast-style continuations where consistency matters more than a literal, word-for-word voice match.
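
As a rough illustration of the length guidance, the sketch below measures a local PCM WAV file before it is uploaded as a reference. It assumes the clip is available locally as WAV; the function names and advice strings are hypothetical, and the 5, 15, and 30 second thresholds are taken from the guidance on this page rather than from any limit the API is known to enforce.

```python
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_length_advice(duration_s: float) -> str:
    """Map a clip duration to the length guidance given on this page."""
    if duration_s < 5:
        return "below the suggested 5-30 s range"
    if duration_s > 30:
        return "above the suggested 5-30 s range"
    if duration_s < 15:
        return "in range; 15+ s may improve long-form fidelity"
    return "in range; 15+ s reference"
```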
