POST /v1/voice/design
curl --request POST \
  --url https://geoff.ai/api/v1/voice/design \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Grand day! [laughter] this is so wonderful.",
    "gender": "female",
    "age": "young-adult",
    "style": "warm",
    "emotion": "excited",
    "accent": "british",
    "description": "a friendly radio host",
    "format": "wav",
    "sample_rate": 24000
  }' \
  --output response.json
{
  "data": {
    "audio_b64": "UklGRiQAAABXQVZFZm10IBAAAAAB...",
    "format": "wav",
    "sample_rate": 24000,
    "duration_s": 4.3,
    "voice": "british_radio_host_1",
    "voice_id": "british_radio_host_1",
    "provider": "stacknet",
    "instruction": "A young adult female speaker, in a warm style, with a excited tone, speaking with a british accent. a friendly radio host.",
    "saved": true,
    "fallback": false
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081",
  "extra_info": {
    "audio_megabytes": 0.21
  }
}
Design a custom voice character by setting profile parameters (gender, age, pitch, style, emotion, accent, dialect) and, optionally, a free-form prose description, then render the prompt text in that voice. Useful when the caller wants a specific voice character but has no audio sample to clone from. Pass save_as to persist the designed voice for reuse: subsequent calls can render arbitrary text in the saved voice via the standard text-to-audio endpoint.
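The example response carries the rendered audio as a base64 string in data.audio_b64, so a client has to decode it before it can be played. The following is a minimal sketch of that step, assuming the response body has already been parsed from JSON into a dict shaped like the example; the function name save_audio and the output path are illustrative, not part of the API.

```python
import base64
from pathlib import Path


def save_audio(response: dict, path: str) -> int:
    """Decode data.audio_b64 from a parsed response and write it to `path`.

    Assumes the response has the shape shown in the example response.
    Returns the number of bytes written.
    """
    audio_b64 = response["data"]["audio_b64"]
    audio = base64.b64decode(audio_b64)
    Path(path).write_bytes(audio)
    return len(audio)
```

A caller might use it as save_audio(json.loads(body), "designed.wav") after reading the HTTP response body.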

Authorization

Authorization
string
required
Bearer token, sent as Bearer API_key. The API key can be found in Settings > API Keys.

Request Body

prompt
string
required
Text to render in the designed voice.
description
string
Free-form prose description of the voice — e.g. "warm female radio host with a soft mid-Atlantic accent". Combined with the structured slots below to compose the engine instruction.
gender
string
Voice gender. Options: auto, male, female. Default: auto.
age
string
Apparent speaker age. Options: auto, child, teenager, young-adult, middle-aged, elderly. Default: auto.
pitch
string
Pitch register. Options: auto, very-low, low, moderate, high, very-high. Default: auto.
style
string
Delivery style. Options: auto, neutral, whisper, authoritative, excited, calm, narrator, warm, cheerful. Default: auto.
emotion
string
Emotional tone. Options: neutral, happy, sad, angry, fearful, surprised, calm, excited. Default: neutral.
accent
string
Free-form accent label, e.g. american, british, australian, southern, scottish, indian.
dialect
string
Optional regional dialect refinement, e.g. cantonese, sichuanese, andalusian, received-pronunciation.
speed
number
Playback speed multiplier. Range: 0.5 to 2.0. Default: 1.0.
steps
integer
Sampling steps. Higher values trade latency for fidelity. Range: 8 to 64. Default: 16.
language
string
ISO 639-1 language code. Default: en.
format
string
Output audio format. Options: wav, pcm16. Default: wav.
sample_rate
integer
Audio sample rate in Hz. Options: 16000, 22050, 24000, 44100, 48000. Default: 24000.
save_as
string
Optional human-readable name. When supplied, the designed voice is persisted to the catalog with a stable voice_id so it can be reused via the standard text-to-audio endpoint.
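The options and ranges listed above can be checked on the client before a request is sent. The sketch below copies the allowed values from the parameter list and builds the request with the Python standard library; the helper names build_design_payload and make_request are hypothetical, and the server's own validation behaviour is not described here, so treat this as a convenience check rather than a mirror of what the API enforces.

```python
import json
import urllib.request

API_URL = "https://geoff.ai/api/v1/voice/design"  # URL from the example request

# Allowed values copied from the parameter list above.
OPTIONS = {
    "gender": {"auto", "male", "female"},
    "age": {"auto", "child", "teenager", "young-adult", "middle-aged", "elderly"},
    "pitch": {"auto", "very-low", "low", "moderate", "high", "very-high"},
    "style": {"auto", "neutral", "whisper", "authoritative", "excited", "calm",
              "narrator", "warm", "cheerful"},
    "emotion": {"neutral", "happy", "sad", "angry", "fearful", "surprised",
                "calm", "excited"},
    "format": {"wav", "pcm16"},
    "sample_rate": {16000, 22050, 24000, 44100, 48000},
}
RANGES = {"speed": (0.5, 2.0), "steps": (8, 64)}
FREE_FORM = {"description", "accent", "dialect", "language", "save_as"}


def build_design_payload(prompt: str, **slots) -> dict:
    """Validate slots client-side and return the JSON body as a dict."""
    if not prompt:
        raise ValueError("prompt is required")
    for name, value in slots.items():
        if name in OPTIONS:
            if value not in OPTIONS[name]:
                raise ValueError(f"{name}={value!r} is not one of {sorted(OPTIONS[name])}")
        elif name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
        elif name not in FREE_FORM:
            raise ValueError(f"unknown parameter: {name}")
    return {"prompt": prompt, **slots}


def make_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the sketch stops short of that so it can be run without network access or a real token.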

Non-verbal effects

Inline tokens inside prompt are recognised as non-verbal cues and rendered as the named effect rather than spoken aloud. Use them sparingly inside the spoken text.
Token            Effect
[laughter]       Laughter
[sigh]           Sigh
[breath]         Audible breath
[gasp]           Sharp inhale
[chuckle]        Light chuckle
[clear-throat]   Throat clear
[question]       Inflected question lift
[surprise]       Surprised reaction
[whisper]        Whisper segment
[shouted]        Raised voice
[crying]         Tearful delivery
Example: "Grand day! [laughter] this is so wonderful. [clear-throat] you know?"
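Because a misspelled token is easy to miss, it can help to scan a prompt for bracketed tokens before sending it. The sketch below is a client-side check only: the token names come from the list above, while the regular expression and the function name scan_tokens are assumptions, and how the engine treats an unrecognised bracketed token is not documented here.

```python
import re

# Token names copied from the list of non-verbal effects.
KNOWN_TOKENS = {
    "laughter", "sigh", "breath", "gasp", "chuckle", "clear-throat",
    "question", "surprise", "whisper", "shouted", "crying",
}
# Assumed token shape: lowercase letters and hyphens inside square brackets.
TOKEN_RE = re.compile(r"\[([a-z-]+)\]")


def scan_tokens(prompt: str) -> tuple[list[str], list[str]]:
    """Return (recognised, unrecognised) bracketed tokens in order of appearance."""
    known, unknown = [], []
    for name in TOKEN_RE.findall(prompt):
        (known if name in KNOWN_TOKENS else unknown).append(name)
    return known, unknown
```

A non-empty second list suggests a typo such as [laugh] in place of [laughter].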

Auto behaviour

Any slot left as auto (or omitted) is filled by the engine based on the rest of the profile. Setting all four primary axes (gender, age, pitch, style) to auto and supplying only a free-form description is a valid pattern — the engine infers a coherent voice from the prose.
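Since an omitted slot and a slot set to auto are described as equivalent, a client can send a smaller body by dropping auto-valued slots. The following sketch illustrates the description-only pattern under that assumption; drop_auto_slots is a hypothetical helper, not part of the API.

```python
def drop_auto_slots(profile: dict) -> dict:
    """Remove slots set to "auto" or None, relying on omitted and auto being equivalent."""
    return {k: v for k, v in profile.items() if v is not None and v != "auto"}


body = drop_auto_slots({
    "prompt": "Grand day! [laughter] this is so wonderful.",
    "gender": "auto",
    "age": "auto",
    "pitch": "auto",
    "style": "auto",
    "description": "warm female radio host with a soft mid-Atlantic accent",
})
# body now holds only "prompt" and "description".
```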

Notes

  • voice_id is only present in the response when save_as was supplied. Without it, the audio is rendered and returned, but no catalog entry is created.
  • instruction echoes the composed instruction string sent to the engine — useful for debugging which slot combinations the model actually saw.
  • fallback: true indicates the dispatcher routed through a profile-matched zero-shot path rather than the native instruct path (transparent to the caller; quality is comparable).
  • Designed voices saved via save_as round-trip into the catalog and can be enumerated via the Voice List endpoint.
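
The notes above imply that client code should not assume voice_id is present. Here is a small sketch of reading the response defensively, assuming a parsed JSON body shaped like the example response; summarize_result is an illustrative name, and whether saved is absent or false when save_as is omitted is not stated, so the code defaults it to False.

```python
def summarize_result(response: dict) -> dict:
    """Pull the fields discussed in the notes out of a parsed response."""
    data = response["data"]
    return {
        "voice_id": data.get("voice_id"),          # None when save_as was not supplied
        "saved": data.get("saved", False),         # assumed default, see lead-in
        "used_fallback": data.get("fallback", False),
        "instruction": data.get("instruction", ""),  # useful when debugging slot combinations
    }
```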