Speech Generation - Geoff API

Text-to-Audio (HTTP)

Generate speech from text with a single HTTP request.

curl --request POST \
  --url https://geoff.ai/api/v1/t2a \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "Hello, welcome to Geoff AI. Let me show you what I can do.",
    "voice_id": "default",
    "format": "mp3"
  }' \
  --output speech.mp3
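
The same request can be sketched in Python. This version uses only the standard library so it runs without extra dependencies. The function names are hypothetical, and it assumes the response body is the raw audio, which is what `--output speech.mp3` in the curl example implies.

```python
import json
import urllib.request

T2A_URL = "https://geoff.ai/api/v1/t2a"  # endpoint from the curl example


def build_t2a_request(text, api_key, voice_id="default", audio_format="mp3"):
    """Build (but do not send) the POST request for the T2A endpoint."""
    body = json.dumps(
        {"text": text, "voice_id": voice_id, "format": audio_format}
    ).encode("utf-8")
    return urllib.request.Request(
        T2A_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(text, api_key, path="speech.mp3"):
    """Send the request and write the response body to `path`.

    Assumption: the body is the audio itself, not a JSON envelope.
    """
    request = build_t2a_request(text, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return path
```

Calling `synthesize_to_file("Hello, welcome to Geoff AI.", "YOUR_API_KEY")` would perform the request. Keeping request construction in its own function makes the payload easy to inspect before anything is sent.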

Async Speech Generation

For longer content, use the async endpoint: create a task, then poll it until the audio URL is available.

import requests
import time

HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Create the task
response = requests.post(
    "https://geoff.ai/api/v1/t2a/async",
    headers=HEADERS,
    json={
        "text": "Your long-form content here...",
        "voice_id": "default",
        "format": "mp3",
    },
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors before reading the body
task_id = response.json()["data"]["task_id"]

# 2. Poll for completion, giving up after 10 minutes (an arbitrary limit)
deadline = time.monotonic() + 600
while True:
    poll = requests.get(
        f"https://geoff.ai/api/v1/t2a/async/{task_id}",
        headers=HEADERS,
        timeout=30,
    )
    poll.raise_for_status()
    status = poll.json()

    if status["data"]["status"] == "completed":
        audio_url = status["data"]["audio_url"]
        break
    if time.monotonic() >= deadline:
        raise TimeoutError(f"Task {task_id} did not complete in time")
    time.sleep(2)

print(f"Audio ready: {audio_url}")
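
Once `audio_url` is known, the audio still has to be fetched. The following is a minimal sketch of that step, assuming the URL can be retrieved with a plain GET and no extra headers (the source does not say whether authentication is required). `download_audio` is a hypothetical helper name.

```python
import shutil
import urllib.request


def download_audio(audio_url, path, timeout=30):
    """Stream the finished audio from `audio_url` into a local file.

    Assumption: the URL is directly fetchable without an Authorization header.
    """
    with urllib.request.urlopen(audio_url, timeout=timeout) as response, \
            open(path, "wb") as handle:
        # Copy in chunks so long recordings are not held in memory at once.
        shutil.copyfileobj(response, handle)
    return path
```

For example, `download_audio(audio_url, "speech.mp3")` would save the result of the polling loop to disk.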

WebSocket Streaming

For real-time audio streaming, use the WebSocket endpoint. See the T2A WebSocket API reference.

Voice Design

When you want a specific voice character but don’t have an audio sample to clone from, design one by dialing in profile parameters. Pair the structured slots (gender, age, pitch, style, emotion, accent, dialect) with an optional free-form description to build a voice from scratch.

import requests

response = requests.post(
    "https://geoff.ai/api/v1/voice/design",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Welcome to the show — let's get started.",
        "gender": "female",
        "age": "young-adult",
        "style": "warm",
        "emotion": "excited",
        "accent": "british",
        "description": "a friendly radio host",
        # Persist the designed voice so subsequent T2A calls can
        # reuse it without re-supplying the parameter profile.
        "save_as": "british_radio_host_1",
    },
)

result = response.json()["data"]
print("Voice id:", result["voice_id"])

Any axis left as auto (or omitted) is filled in by the engine. To let the prose description drive everything, leave the structured slots out entirely:

requests.post(
    "https://geoff.ai/api/v1/voice/design",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "The quick brown fox jumps over the lazy dog.",
        "description": "an elderly Scottish fisherman with a gravelly voice",
    },
)
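
Because an omitted slot and a slot set to auto are described as equivalent, a client can drop both before sending. The helper below is a hypothetical sketch of that idea, not part of the API. The slot names come from the list above, and the check for unknown slots is a client-side convenience only.

```python
STRUCTURED_SLOTS = ("gender", "age", "pitch", "style", "emotion", "accent", "dialect")


def build_design_payload(prompt, description=None, save_as=None, **slots):
    """Assemble a voice-design request body, skipping unset or "auto" slots."""
    unknown = set(slots) - set(STRUCTURED_SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")

    payload = {"prompt": prompt}
    for name in STRUCTURED_SLOTS:
        value = slots.get(name)
        # Leaving a slot out lets the engine fill it in, same as "auto".
        if value is not None and value != "auto":
            payload[name] = value
    if description:
        payload["description"] = description
    if save_as:
        payload["save_as"] = save_as
    return payload
```

The returned dictionary can be passed as the `json=` argument of the `requests.post` calls shown earlier.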

Non-verbal effects

Bracketed tokens inside the prompt field are recognised as non-verbal cues and rendered as the named effect rather than spoken aloud:

Grand day! [laughter] this is so wonderful. [clear-throat] you know?

Available tokens: [laughter], [sigh], [breath], [gasp], [chuckle], [clear-throat], [question], [surprise], [whisper], [shouted], [crying].
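
A misspelled token is easy to miss in a long prompt, so a client-side check before sending can help. The sketch below flags bracketed tokens that are not in the list above. The names are hypothetical, and the source does not say how the engine treats unknown tokens or whether matching is case-sensitive, so this check assumes exact lowercase matches.

```python
import re

KNOWN_TOKENS = {
    "laughter", "sigh", "breath", "gasp", "chuckle", "clear-throat",
    "question", "surprise", "whisper", "shouted", "crying",
}

# Matches any bracketed run without nested brackets, e.g. "[laughter]".
TOKEN_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tokens(prompt):
    """Return bracketed tokens in `prompt` that are not in KNOWN_TOKENS."""
    return [t for t in TOKEN_PATTERN.findall(prompt) if t not in KNOWN_TOKENS]
```

An empty list means every bracketed token in the prompt is one of the documented effects.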

Reusing a designed voice

After saving with save_as, the voice appears in List Voices under scope=custom with the designed tag. Pass its voice_id to the standard T2A endpoint to render new text in the same voice:

requests.post(
    "https://geoff.ai/api/v1/t2a",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text": "And we're back with more from the studio.",
        "voice_id": "british_radio_host_1",
        "format": "mp3",
    },
)

See the Voice Design API reference for the full parameter list, including pitch / steps / speed and the language-specific dialect refinements.
