Training Custom Models

Geoff supports training lightweight customizations on top of the built-in models. There are four training endpoints, each producing a named asset you can pass back into the matching generation tool:

Endpoint	Produces	Use with
Image LoRA	Style or subject adapter for image generation	Image generation tools that accept `lora_name`
Video LoRA	Motion or style adapter for video generation	Video tools that accept `lora_name`
Music LoRA	Genre / style adapter for music generation	Music generation with `lora_name`
Voice Model	Cloned voice persisted to the catalog	T2A with the returned `voice_id`

Training is asynchronous — submit the job, then poll for completion. The response from each create call includes a stable name (the one you supplied) that becomes the asset’s identifier for downstream calls.

Image LoRA

Train a style or subject adapter from a dataset of images.

import requests

# Submit the training job; training itself runs asynchronously.
response = requests.post(
    "https://geoff.ai/api/v1/training/lora/image",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "dataset": "https://files.geoff.ai/datasets/my_dataset.zip",
        "name": "my_brand_style",
    },
)
response.raise_for_status()  # surface HTTP errors before parsing the body
task_id = response.json()["data"]["task_id"]

The `dataset` field accepts either a URL to a zip of training images or a file ID returned from the file upload endpoint. Once training completes, pass `lora_name: "my_brand_style"` to any image generation tool that supports LoRA stacking.

Video LoRA

Same shape as image LoRA, on a video dataset:

curl --request POST \
  --url https://geoff.ai/api/v1/training/lora/video \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "dataset": "https://files.geoff.ai/datasets/my_motion_clips.zip",
    "name": "my_motion_style"
  }'

Video LoRAs capture motion patterns or stylistic treatments. Pair one with `generate_video_from_text` or `generate_video_from_image` by passing `lora_name: "my_motion_style"`.

Music LoRA

Train a music style adapter — genre, instrumentation, or production treatment — from a corpus of reference tracks.

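import requests

response = requests.post(
    "https://geoff.ai/api/v1/training/lora/music",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "dataset": "https://files.geoff.ai/datasets/my_genre_pack.zip",
        "name": "lofi_2026",
    },
)
response.raise_for_status()  # surface HTTP errors before parsing the body
task_id = response.json()["data"]["task_id"]  # keep this for status polling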

Once trained, reference the LoRA in music generation by passing `lora_name: "lofi_2026"`.
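Because the three LoRA endpoints take the same request body, submission can be wrapped in one helper. The following is a minimal sketch that assumes only the URLs and fields shown above. The names `build_lora_job` and `submit_lora_job` are hypothetical and not part of the Geoff API, and the `data.task_id` response shape is assumed to match the image example for all three endpoints.

```python
BASE_URL = "https://geoff.ai/api/v1/training/lora"
LORA_KINDS = ("image", "video", "music")


def build_lora_job(kind, dataset, name):
    """Return (url, json_body) for a LoRA training request (hypothetical helper)."""
    if kind not in LORA_KINDS:
        raise ValueError(f"unknown LoRA kind: {kind!r}")
    return f"{BASE_URL}/{kind}", {"dataset": dataset, "name": name}


def submit_lora_job(kind, dataset, name, api_key):
    """Submit the job and return its task_id.

    Assumes the response carries data.task_id, as in the image example.
    """
    import requests  # imported here so build_lora_job has no third-party dependency

    url, body = build_lora_job(kind, dataset, name)
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        json=body,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["data"]["task_id"]
```

Keeping the request construction separate from the network call makes it possible to inspect or log the payload before any metered training job is started.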

Voice Model

Voice training is the heaviest of the four — it captures a speaker’s timbre across many samples rather than a single clip. For quick single-clip cloning, use Voice Clone instead.

import requests

response = requests.post(
    "https://geoff.ai/api/v1/training/voice",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        # Several clips of the same speaker
        "reference_urls": [
            "https://files.geoff.ai/voices/sample1.wav",
            "https://files.geoff.ai/voices/sample2.wav",
            "https://files.geoff.ai/voices/sample3.wav",
        ],
        "name": "my_brand_voice",
    },
)
response.raise_for_status()  # surface HTTP errors before parsing the body

The trained voice appears in the catalog under List Voices and can be passed as `voice_id` to any text-to-audio call.

Checking training status

All four endpoints return a `task_id`. Poll status the same way as for other long-running jobs:

import time
import requests

# task_id comes from the create call that started the job.
while True:
    response = requests.get(
        f"https://geoff.ai/api/v1/training/status/{task_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    response.raise_for_status()  # stop on HTTP errors instead of looping on them
    status = response.json()

    state = status["data"]["status"]
    print(f"progress: {status['data'].get('progress', 0)}%  state: {state}")
    if state in ("completed", "failed"):
        break
    time.sleep(30)  # wait before polling again

if state == "completed":
    print("Asset name:", status["data"]["name"])
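The loop above runs until the job reaches a terminal state, so a stuck job would block forever. Below is a minimal sketch of the same polling pattern with an overall timeout. `wait_for_training` and its parameters are hypothetical names, not part of the Geoff API. The status lookup is passed in as a callable so that the waiting logic stays independent of the HTTP client; only the `completed` and `failed` states from the example above are treated as terminal.

```python
import time


def wait_for_training(fetch_status, task_id, interval=30.0, timeout=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is completed or failed, or raise TimeoutError.

    fetch_status(task_id) must return the "data" dict of the status response.
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status(task_id)
        if data["status"] in ("completed", "failed"):
            return data
        if clock() >= deadline:
            raise TimeoutError(
                f"training task {task_id} still {data['status']!r} after {timeout}s"
            )
        sleep(interval)
```

In practice `fetch_status` would wrap the `requests.get` call to the status URL shown above and return `response.json()["data"]`. The one-hour default timeout is an arbitrary placeholder and should be chosen to suit the size of the job.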

Tips

Dataset size: 10–30 reference items is the sweet spot for style adapters. More isn’t always better — diversity matters more than count.
Naming: choose stable snake_case names; the name you supply is the identifier you will pass to every later generation call.
Voice samples: 3–10 clips of 5–30 seconds each, clean audio, consistent room tone. Pre-process noisy field recordings with voice denoise first.
Costs: training jobs are metered per step. Watch the cost_estimate field in the status response while the job runs.
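The naming and voice-sample tips can be checked locally before submitting a job. The sketch below is a client-side convenience only: `check_asset_name` and `check_voice_samples` are hypothetical helpers, the snake_case pattern is an assumed reading of the naming tip rather than a rule documented for the API, and the thresholds are the recommendations from the tips above.

```python
import re

# Lowercase words of letters and digits joined by single underscores (assumed convention).
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def check_asset_name(name):
    """Return True if the name follows the snake_case convention."""
    return SNAKE_CASE.fullmatch(name) is not None


def check_voice_samples(durations_seconds):
    """Return warnings for a list of clip durations, based on the tips above."""
    warnings = []
    if not 3 <= len(durations_seconds) <= 10:
        warnings.append(f"expected 3-10 clips, got {len(durations_seconds)}")
    for index, duration in enumerate(durations_seconds, start=1):
        if not 5 <= duration <= 30:
            warnings.append(f"clip {index} is {duration}s, outside 5-30s")
    return warnings
```

For example, `check_asset_name("my_brand_style")` passes, while a name such as `"My-Brand"` does not. An empty list from `check_voice_samples` means the clip set matches the recommended count and lengths; it says nothing about audio quality.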
