Voice Training
Train Voice Model
Train a custom voice from multiple reference audio clips. For single-clip cloning, use Voice Clone instead.
POST
Train a custom voice by averaging across multiple reference clips.
This is heavier than single-clip Voice Clone but more robust for
distribution use (e.g. a brand voice that needs to sound consistent
across hundreds of utterances).
The trained voice is persisted to the catalog and appears in List Voices under scope=custom.
Use the returned name as the voice_id on subsequent T2A calls.
Authorization
Bearer token.
Bearer API_key.
Request Body
Array of URLs pointing to reference audio clips. 3–10 clips of
5–30 seconds each is the sweet spot. Each clip should be clean
speech with consistent room tone. Pre-process noisy field
recordings with
voice denoise first.
Stable snake_case identifier for the trained voice. Becomes the
voice’s permanent voice_id for T2A calls. Lowercase letters,
digits, and underscores only.
Single-clip alternative
For a quick clone from a single reference clip, use Voice Clone instead. It skips the multi-sample averaging step and returns a usable voice_id in seconds.
For parameter-driven voices with no reference audio at all, use
Voice Design.
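The request-body rules for Train Voice Model described earlier can be checked client-side before submitting. The following is a minimal sketch, not the official client. The endpoint URL and the JSON field names (`audio_urls`, `name`) are assumptions, since this page does not show them; substitute the real values from the API reference. The request is built but not sent.

```python
import json
import re
import urllib.request

# Hypothetical placeholders: the endpoint path and JSON field names are not
# shown on this page. Replace them with the values from the API reference.
TRAIN_URL = "https://api.example.com/voice/train"
CLIPS_FIELD = "audio_urls"
NAME_FIELD = "name"

# Documented rule: lowercase letters, digits, and underscores only.
NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def is_valid_voice_name(name):
    """Return True if the name uses only lowercase letters, digits, underscores."""
    return NAME_PATTERN.fullmatch(name) is not None


def clip_count_advice(count):
    """Return a hint if the clip count is outside the recommended 3-10 range.

    The range is described as a sweet spot, not a hard limit, so this
    returns advice instead of raising an error.
    """
    if 3 <= count <= 10:
        return None
    return f"{count} clips supplied; 3-10 clips is the recommended range"


def build_train_request(api_key, clip_urls, name):
    """Build (but do not send) the POST request with Bearer authorization."""
    if not is_valid_voice_name(name):
        raise ValueError(
            "voice name must contain only lowercase letters, digits, and underscores"
        )
    payload = {CLIPS_FIELD: list(clip_urls), NAME_FIELD: name}
    return urllib.request.Request(
        TRAIN_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the result to `urllib.request.urlopen`. Clip duration (5–30 seconds each) is not checked here because it would require downloading and decoding the audio.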