Voice Extend
Continue an audio clip in the same voice. Pass a short reference plus the new text; returns audio that sounds like the original speaker speaking the new content.
POST
This involves less friction than cloning a voice and then synthesizing in two separate calls when the goal is a single clip that sounds contiguous.
Authorization
Bearer token, sent as the header Authorization: Bearer API_key.
Request Body
Source audio URL providing the voice and style. References of 5–30 seconds work best.
New text to render in the source voice.
Output audio format. Options: wav, mp3. Default: wav.
When to use what
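The request described above can be sketched with Python's standard library. This is an illustration under stated assumptions, not the documented client. The endpoint URL is a placeholder because this page does not give the path. The JSON field names audio_url, text and output_format are hypothetical stand-ins, because the page describes the three body parameters without naming them. The sketch only builds the request. It does not send it.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): the real URL is not given on this page.
ENDPOINT = "https://api.example.com/v1/voice_extend"

# Output formats listed on this page; wav is the documented default.
ALLOWED_FORMATS = {"wav", "mp3"}


def build_voice_extend_request(api_key: str, audio_url: str, text: str,
                               output_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST request for voice extend.

    The JSON field names are hypothetical stand-ins for the three
    request-body parameters described on this page.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    payload = {
        "audio_url": audio_url,          # hypothetical name: source audio URL (voice + style)
        "text": text,                    # hypothetical name: new text to render
        "output_format": output_format,  # hypothetical name: "wav" or "mp3"
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer-token auth, as documented above
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_voice_extend_request(
        "API_key",
        "https://example.com/reference.wav",
        "And that brings us to the end of today's episode.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The format check runs on the client so that an unsupported value such as flac fails before any network call. The server presumably enforces the same list.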
| Goal | Tool |
|---|---|
| Continue an existing clip in the same voice | voice_extend (this endpoint) |
| Render arbitrary text in a saved catalog voice | T2A with voice_id |
| Quick single-clip clone + render in one call | One Shot Voice |
| Persist a voice for repeated use | Voice Clone → T2A |
Tips
- Reference length matters: longer reference clips capture more of the speaker’s prosody, and references of 15 seconds or more substantially improve long-form fidelity compared with 5-second references.
- Prosody fidelity: voice extend preserves the cadence and emotional register of the reference. This makes it useful for podcast-style continuations, where consistent delivery matters more than an exact match of the voice.
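The source audio is passed as a URL, so you can act on the length guidance by measuring a local clip before uploading it. The sketch below uses Python's standard wave module and assumes the reference is an uncompressed PCM WAV file. The 5, 15 and 30 second thresholds come from this page. The function names and the status strings are hypothetical.

```python
import io
import wave

# Thresholds taken from this page: 5-30 s recommended, 15 s+ for long-form fidelity.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0
LONG_FORM_SECONDS = 15.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(wav_bytes: bytes) -> str:
    """Classify a reference clip against the length guidance (hypothetical helper)."""
    seconds = wav_duration_seconds(wav_bytes)
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "longer than recommended"
    if seconds < LONG_FORM_SECONDS:
        return "ok (consider 15 s+ for long-form)"
    return "ok"


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV so the check can be tried without real audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    for s in (3, 10, 20, 45):
        print(s, check_reference(make_silent_wav(s)))
```

The check warns instead of rejecting clips outside the range. This page only describes 5–30 seconds as a recommendation, not a hard limit.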