POST /v1/voice/one-shot
curl --request POST \
  --url https://geoff.ai/api/v1/voice/one-shot \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Hello, this is a one-shot rendered clip.",
    "reference_audio_url": "https://files.geoff.ai/audio/sample.wav",
    "language": "en",
    "format": "wav",
    "sample_rate": 24000
  }'
{
  "data": {
    "audio_b64": "UklGRiQAAABXQVZFZm10...",
    "audio_url": "https://files.geoff.ai/output/oneshot_abc123.wav",
    "audio_cid": "bafy...",
    "url": "https://files.geoff.ai/output/oneshot_abc123.wav",
    "type": "audio",
    "format": "wav",
    "sample_rate": 24000,
    "duration_s": 3.4,
    "voice_id": "one_shot_1716200000",
    "voice": "one_shot_1716200000",
    "provider": "stacknet"
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081"
}
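The response above returns the rendered clip inline (audio_b64) and by URL (audio_url, mirrored in url). As a minimal sketch of consuming it, the Python below maps the body onto a small dataclass and decodes the inline audio. The names OneShotResult, parse_one_shot_response and wav_duration_s are hypothetical helpers, not part of the API. The sketch assumes that audio_b64 is always present and that url can stand in for audio_url, neither of which the reference states explicitly.

```python
import base64
import io
import wave
from dataclasses import dataclass


@dataclass
class OneShotResult:
    audio: bytes        # decoded from data.audio_b64
    audio_url: str      # data.audio_url, falling back to data.url (assumed equivalent)
    audio_format: str   # data.format
    sample_rate: int    # data.sample_rate, in Hz
    duration_s: float   # data.duration_s
    voice_id: str       # data.voice_id
    trace_id: str       # top-level trace_id


def parse_one_shot_response(body: dict) -> OneShotResult:
    """Map a response body shaped like the sample above onto a typed result."""
    data = body["data"]
    return OneShotResult(
        audio=base64.b64decode(data["audio_b64"]),
        audio_url=data.get("audio_url") or data["url"],
        audio_format=data["format"],
        sample_rate=int(data["sample_rate"]),
        duration_s=float(data["duration_s"]),
        voice_id=data["voice_id"],
        trace_id=body["trace_id"],
    )


def wav_duration_s(audio: bytes) -> float:
    """Duration of a WAV payload, computed from its own header (wav format only)."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For format "wav", comparing wav_duration_s(result.audio) with result.duration_s is a cheap sanity check before writing the bytes to disk. For "pcm16" the payload presumably has no header, so the wave module would not apply.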
Render text in the voice of a supplied reference clip in a single call. Internally, the endpoint clones the reference and synthesizes the text in that voice. This is equivalent to chaining voice clone and T2A, but without threading a voice_id between requests. Best for one-off “say this in this voice” interactions where the caller doesn’t need to persist the cloned voice for reuse.

Authorization

Authorization
string
required
Bearer token, sent in the form: Bearer <API_key>.
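As a rough Python equivalent of the curl request, the sketch below builds the POST with the Authorization and Content-Type headers using only the standard library. The URL is taken from the curl example. build_one_shot_request and send_one_shot are hypothetical helper names, and the 60-second timeout is an arbitrary assumption rather than a documented limit.

```python
import json
import urllib.request

# Endpoint URL as it appears in the curl example.
ONE_SHOT_URL = "https://geoff.ai/api/v1/voice/one-shot"


def build_one_shot_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request with a Bearer token."""
    if not token:
        raise ValueError("an API key is required for the Authorization header")
    return urllib.request.Request(
        ONE_SHOT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_one_shot(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body (timeout is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the header logic testable without network access. urllib raises urllib.error.HTTPError on non-2xx responses, so callers would wrap send_one_shot in their own error handling.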

Request Body

prompt
string
required
Text to render in the cloned voice. Long inputs are chunked automatically.
reference_audio_url
string
URL of the reference clip (3–10 s recommended; mp3 or wav). Mutually exclusive with reference_audio_b64.
reference_audio_b64
string
Base64-encoded reference audio. Use when uploading directly without a URL. Mutually exclusive with reference_audio_url.
name
string
Optional human-readable name for the cloned voice. A timestamped name is generated when omitted.
language
string
ISO 639-1 language code. Default: en.
prompt_text
string
Advanced: the exact words spoken in the reference clip. Auto-detected via transcription when omitted.
format
string
Output audio format. Options: wav, pcm16. Default: wav.
sample_rate
integer
Audio sample rate in Hz. Options: 16000, 22050, 24000, 44100, 48000. Default: 24000.
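The constraints in the parameter list above can be checked client-side before the request is sent. The following Python sketch builds the JSON body, base64-encodes raw reference bytes when no URL is given, and rejects values outside the documented options. build_one_shot_payload is a hypothetical helper. Requiring at least one reference field is an assumption: the reference only says the two are mutually exclusive.

```python
import base64
from typing import Optional

# Option lists copied from the parameter descriptions.
ALLOWED_FORMATS = ("wav", "pcm16")
ALLOWED_SAMPLE_RATES = (16000, 22050, 24000, 44100, 48000)


def build_one_shot_payload(
    prompt: str,
    *,
    reference_audio_url: Optional[str] = None,
    reference_audio: Optional[bytes] = None,  # raw clip bytes, sent as reference_audio_b64
    name: Optional[str] = None,
    language: str = "en",
    prompt_text: Optional[str] = None,
    audio_format: str = "wav",
    sample_rate: int = 24000,
) -> dict:
    """Validate the documented constraints and return the request body."""
    if not prompt:
        raise ValueError("prompt is required")
    if reference_audio_url is not None and reference_audio is not None:
        raise ValueError("reference_audio_url and reference_audio_b64 are mutually exclusive")
    if reference_audio_url is None and reference_audio is None:
        # Assumption: the endpoint needs a reference clip in one of the two forms.
        raise ValueError("supply reference_audio_url or reference audio bytes")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}")

    payload = {
        "prompt": prompt,
        "language": language,
        "format": audio_format,
        "sample_rate": sample_rate,
    }
    if reference_audio_url is not None:
        payload["reference_audio_url"] = reference_audio_url
    else:
        payload["reference_audio_b64"] = base64.b64encode(reference_audio).decode("ascii")
    # Optional fields are included only when set, so server-side defaults apply.
    if name is not None:
        payload["name"] = name
    if prompt_text is not None:
        payload["prompt_text"] = prompt_text
    return payload
```

Calling build_one_shot_payload("Hello, this is a one-shot rendered clip.", reference_audio_url="https://files.geoff.ai/audio/sample.wav") yields the same body as the curl example, with the defaults spelled out.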

When to use what

| Goal | Tool |
| --- | --- |
| One-off “say this in this voice” | one_shot_voice (this endpoint) |
| Render many lines in the same voice | Voice Clone → T2A |
| Brand voice that needs distribution-quality consistency | Voice Model Training → T2A |
| No reference clip available; design from parameters | Voice Design |