Voice Design - Geoff API

curl --request POST \
  --url https://geoff.ai/api/v1/voice/design \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Grand day! [laughter] this is so wonderful.",
    "gender": "female",
    "age": "young-adult",
    "style": "warm",
    "emotion": "excited",
    "accent": "british",
    "description": "a friendly radio host",
    "format": "wav",
    "sample_rate": 24000,
    "save_as": "british radio host"
  }' \
  --output response.json

{
  "data": {
    "audio_b64": "UklGRiQAAABXQVZFZm10IBAAAAAB...",
    "format": "wav",
    "sample_rate": 24000,
    "duration_s": 4.3,
    "voice": "british_radio_host_1",
    "voice_id": "british_radio_host_1",
    "provider": "stacknet",
    "instruction": "A young adult female speaker, in a warm style, with a excited tone, speaking with a british accent. a friendly radio host.",
    "saved": true,
    "fallback": false
  },
  "trace_id": "04ede0ab069fb1ba8be5156a24b1e081",
  "extra_info": {
    "audio_megabytes": 0.21
  }
}

POST /api/v1/voice/design

Design a custom voice character by setting profile parameters (gender, age, pitch, style, emotion, accent, dialect) and, optionally, a free-form prose description, then render the prompt text in that voice. This is useful when the caller wants a specific voice character but has no audio sample to clone from. Pass save_as to persist the designed voice for reuse: subsequent calls can render arbitrary text in the saved voice via the standard text-to-audio endpoint.
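As a rough illustration of the flow described above, the following Python sketch builds the request and decodes the returned audio. It assumes the response has the JSON shape shown in the example on this page, with the audio base64-encoded in data.audio_b64. The helper names (build_design_request, decode_audio, design_voice) are hypothetical and not part of the API.

```python
import base64
import json
import urllib.request

# URL taken from the curl example on this page.
API_URL = "https://geoff.ai/api/v1/voice/design"


def build_design_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the voice design endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(response_body: dict) -> bytes:
    """Return the raw audio bytes, assuming base64 audio in data.audio_b64."""
    return base64.b64decode(response_body["data"]["audio_b64"])


def design_voice(token: str, payload: dict, out_path: str) -> dict:
    """Send the request and write the decoded audio to out_path (network I/O)."""
    with urllib.request.urlopen(build_design_request(token, payload)) as resp:
        body = json.load(resp)
    with open(out_path, "wb") as fh:
        fh.write(decode_audio(body))
    return body["data"]
```

A call such as design_voice(token, {"prompt": "Grand day!", "description": "a friendly radio host"}, "designed.wav") would then leave the rendered audio in designed.wav, provided the assumptions above hold.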

Authorization

string

required

Bearer token, sent in the form Bearer API_key. The API key can be found in Settings > API Keys.

Request Body

prompt

string

required

Text to render in the designed voice.

description

string

Free-form prose description of the voice — e.g. "warm female radio host with a soft mid-Atlantic accent". Combined with the structured slots below to compose the engine instruction.

gender

string

Voice gender. Options: auto, male, female. Default: auto.

age

string

Apparent speaker age. Options: auto, child, teenager, young-adult, middle-aged, elderly. Default: auto.

pitch

string

Pitch register. Options: auto, very-low, low, moderate, high, very-high. Default: auto.

style

string

Delivery style. Options: auto, neutral, whisper, authoritative, excited, calm, narrator, warm, cheerful. Default: auto.

emotion

string

Emotional tone. Options: neutral, happy, sad, angry, fearful, surprised, calm, excited. Default: neutral.

accent

string

Free-form accent label, e.g. american, british, australian, southern, scottish, indian.

dialect

string

Optional regional dialect refinement, e.g. cantonese, sichuanese, andalusian, received-pronunciation.

speed

number

Playback speed multiplier. Range: 0.5 to 2.0. Default: 1.0.

steps

integer

Sampling steps. Higher values trade latency for fidelity. Range: 8 to 64. Default: 16.

language

string

ISO 639-1 language code. Default: en.

format

string

Output audio format. Options: wav, pcm16. Default: wav.

sample_rate

integer

Audio sample rate in Hz. Options: 16000, 22050, 24000, 44100, 48000. Default: 24000.

save_as

string

Optional human-readable name. When supplied, the designed voice is persisted to the catalog with a stable voice_id so it can be reused via the standard text-to-audio endpoint.
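The options and ranges listed above can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python. The function name validate_design_payload is hypothetical, the tables simply mirror the values documented on this page, and the server's actual validation behaviour is not described here, so treat this as a convenience check only.

```python
# Documented options, copied from the Request Body section.
ENUM_OPTIONS = {
    "gender": ("auto", "male", "female"),
    "age": ("auto", "child", "teenager", "young-adult", "middle-aged", "elderly"),
    "pitch": ("auto", "very-low", "low", "moderate", "high", "very-high"),
    "style": ("auto", "neutral", "whisper", "authoritative", "excited",
              "calm", "narrator", "warm", "cheerful"),
    "emotion": ("neutral", "happy", "sad", "angry", "fearful",
                "surprised", "calm", "excited"),
    "format": ("wav", "pcm16"),
    "sample_rate": (16000, 22050, 24000, 44100, 48000),
}

# Documented numeric ranges (inclusive bounds are an assumption).
RANGES = {"speed": (0.5, 2.0), "steps": (8, 64)}


def validate_design_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    if not isinstance(payload.get("prompt"), str):
        errors.append("prompt is required and must be a string")
    for field, allowed in ENUM_OPTIONS.items():
        if field in payload and payload[field] not in allowed:
            errors.append(f"{field}: {payload[field]!r} is not a documented option")
    for field, (low, high) in RANGES.items():
        if field in payload:
            value = payload[field]
            if not isinstance(value, (int, float)) or not (low <= value <= high):
                errors.append(f"{field}: {value!r} is outside {low} to {high}")
    return errors
```

Free-form fields (description, accent, dialect, language, save_as) are deliberately left unchecked, since the page does not restrict their values.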

Non-verbal effects

Inline tokens inside prompt are recognised as non-verbal cues and rendered as the named effect rather than spoken aloud. Use them sparingly inside the spoken text.

Token	Effect
`[laughter]`	Laughter
`[sigh]`	Sigh
`[breath]`	Audible breath
`[gasp]`	Sharp inhale
`[chuckle]`	Light chuckle
`[clear-throat]`	Throat clear
`[question]`	Inflected question lift
`[surprise]`	Surprised reaction
`[whisper]`	Whisper segment
`[shouted]`	Raised voice
`[crying]`	Tearful delivery

Example: "Grand day! [laughter] this is so wonderful. [clear-throat] you know?"
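Because a mistyped token might end up being read aloud or ignored (the page does not say which), it can help to check a prompt against the token table before sending it. The following Python sketch does that. The function name split_tokens and the regular expression are illustrative assumptions, and the token set is copied from the table above.

```python
import re

# Tokens listed in the Non-verbal effects table.
KNOWN_TOKENS = {
    "laughter", "sigh", "breath", "gasp", "chuckle", "clear-throat",
    "question", "surprise", "whisper", "shouted", "crying",
}

# Assumed token shape: lowercase letters and hyphens inside square brackets.
TOKEN_RE = re.compile(r"\[([a-z-]+)\]")


def split_tokens(prompt: str):
    """Return (known, unknown) bracketed tokens in order of appearance."""
    found = TOKEN_RE.findall(prompt)
    known = [t for t in found if t in KNOWN_TOKENS]
    unknown = [t for t in found if t not in KNOWN_TOKENS]
    return known, unknown
```

A non-empty second list is a hint that the prompt contains a bracketed word that is not among the documented effects.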

Auto behaviour

Any slot left as auto (or omitted) is filled by the engine based on the rest of the profile. Setting all four primary axes (gender, age, pitch, style) to auto and supplying only a free-form description is a valid pattern — the engine infers a coherent voice from the prose.
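Since a slot set to auto and an omitted slot are described as equivalent, a client can drop auto values and send a smaller body. The Python sketch below shows one way to do this for the four primary axes. The helper name drop_auto_slots is hypothetical.

```python
# The four primary axes named in the Auto behaviour section.
AUTO_SLOTS = ("gender", "age", "pitch", "style")


def drop_auto_slots(profile: dict) -> dict:
    """Return a copy of profile without primary-axis slots that are set to 'auto'."""
    return {
        key: value
        for key, value in profile.items()
        if not (key in AUTO_SLOTS and value == "auto")
    }


# Description-only pattern: every primary axis is left to the engine.
payload = drop_auto_slots({
    "prompt": "Grand day! [laughter] this is so wonderful.",
    "description": "a friendly radio host",
    "gender": "auto",
    "age": "auto",
    "pitch": "auto",
    "style": "auto",
})
```

Here payload ends up holding only prompt and description, which matches the description-only pattern mentioned above.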

Notes

voice_id is only present in the response when save_as was supplied. Without it, the audio is rendered and returned, but no catalog entry is created.
instruction echoes the composed instruction string sent to the engine — useful for debugging which slot combinations the model actually saw.
fallback: true indicates the dispatcher routed through a profile-matched zero-shot path rather than the native instruct path (transparent to the caller; quality is comparable).
Designed voices saved via save_as round-trip into the catalog and can be enumerated via the Voice List endpoint.
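The notes above suggest a few checks a client may want to run on each response. The Python sketch below gathers them in one place. It assumes the response shape shown in the example on this page, and the function name summarize_design_response is hypothetical.

```python
def summarize_design_response(body: dict) -> dict:
    """Pull out the fields the notes above describe, tolerating absent ones."""
    data = body["data"]
    voice_id = data.get("voice_id")  # absent unless save_as was supplied
    return {
        "voice_id": voice_id,
        "reusable": voice_id is not None,
        "used_fallback": bool(data.get("fallback", False)),
        "instruction": data.get("instruction", ""),
    }
```

Logging the instruction value alongside the request is a simple way to see which slot combination the engine received.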
