Text-to-Audio (HTTP)
Generate speech from text with a single HTTP request.
Async Speech Generation
For longer content, use the async endpoint.
WebSocket Streaming
For real-time audio streaming, use the WebSocket endpoint. See the T2A WebSocket API reference.
Voice Design
When you want a specific voice character but don’t have an audio sample to clone from, design one by setting profile parameters. Pair the structured slots (gender, age, pitch, style, emotion, accent, dialect) with an optional free-form description to build a
voice from scratch.
Any slot set to auto (or omitted) is filled in by the engine. To
let the prose description drive everything, leave the structured
slots out entirely:
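As a rough illustration of the two modes described above, the following Python sketch assembles a request body with and without structured slots. The slot names come from the text above, and prompt, description and save_as are field names used on this page; the flat JSON layout, the example slot values, and the helper name build_voice_design_body are assumptions made for illustration, not the documented schema.

```python
import json

# Slot names are taken from the text above; the body layout is an assumption.
STRUCTURED_SLOTS = ("gender", "age", "pitch", "style", "emotion", "accent", "dialect")


def build_voice_design_body(prompt, description=None, save_as=None, **slots):
    """Assemble a hypothetical request body for a voice-design call."""
    unknown = set(slots) - set(STRUCTURED_SLOTS)
    if unknown:
        raise ValueError(f"unknown slot(s): {sorted(unknown)}")

    body = {"prompt": prompt}
    # Slots left as "auto" or None are dropped so the engine fills them in.
    body.update({k: v for k, v in slots.items() if v not in (None, "auto")})
    if description:
        body["description"] = description
    if save_as:
        body["save_as"] = save_as
    return body


# Structured slots plus prose; pitch is left to the engine.
mixed = build_voice_design_body(
    "Welcome aboard.",
    description="warm, unhurried narrator",
    gender="female",  # example value, assumed
    pitch="auto",
)

# Prose description only: no structured slots at all.
prose_only = build_voice_design_body(
    "Welcome aboard.",
    description="gravelly old sea captain",
)

print(json.dumps(mixed, indent=2))
print(json.dumps(prose_only, indent=2))
```

Dropping auto slots on the client is equivalent to sending them, since the text above says both are filled in by the engine. Check the API reference for the actual field names and accepted values before relying on this shape.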
Non-verbal effects
Inline tokens inside prompt are recognised as non-verbal cues and
rendered as the named effect rather than spoken aloud:
[laughter], [sigh], [breath], [gasp],
[chuckle], [clear-throat], [question], [surprise],
[whisper], [shouted], [crying].
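A small client-side check can catch misspelled cues before a prompt is sent. The Python sketch below scans a prompt for bracketed tokens and separates those in the list above from any others. The helper name find_effects is hypothetical, and how the service treats a bracketed token outside the list is not stated here, so this is only a local lint.

```python
import re

# Effect names copied from the list above.
KNOWN_EFFECTS = {
    "laughter", "sigh", "breath", "gasp", "chuckle", "clear-throat",
    "question", "surprise", "whisper", "shouted", "crying",
}

# Matches any lowercase bracketed token such as [sigh] or [clear-throat].
TOKEN_RE = re.compile(r"\[([a-z-]+)\]")


def find_effects(prompt):
    """Return (known, unknown) bracketed tokens in order of appearance."""
    known, unknown = [], []
    for name in TOKEN_RE.findall(prompt):
        (known if name in KNOWN_EFFECTS else unknown).append(name)
    return known, unknown


known, unknown = find_effects(
    "Well [sigh] that was close. [laughter] Try again [giggle]."
)
print(known)    # ['sigh', 'laughter']
print(unknown)  # ['giggle']
```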
Reusing a designed voice
After saving with save_as, the voice appears in
List Voices under scope=custom
with the designed tag. Pass its voice_id to the standard T2A
endpoint to render new text in the same voice:
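The Python sketch below prepares, without sending, a POST request that passes a saved voice_id along with new text. Only voice_id comes from this page; the URL, the text field name, the Bearer authorization header, and the helper name build_t2a_request are placeholders assumed for illustration, so substitute the values from the T2A reference.

```python
import json
import urllib.request

# Placeholder: the real T2A endpoint URL is not given on this page.
T2A_URL = "https://api.example.com/t2a"


def build_t2a_request(text, voice_id, api_key):
    """Prepare (but do not send) a POST that renders text in a saved voice."""
    # "text" is an assumed field name; "voice_id" is named in the docs above.
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        T2A_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_t2a_request(
    "Chapter two.",
    voice_id="my-designed-voice",  # the id returned for the saved voice
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would send it. The response format (raw audio bytes or a JSON envelope) is not described here, so handling it is left out.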