
Preview
Fast, lightweight model for text and vision tasks.

Duce
High-performance multimodal model with SOTA music generation and Mixture of Models (MoM).

Magma
Maximum-capability model with a 1M context window, SOTA video generation, and Mixture of Models (MoM).
Preview

The fastest model layer, optimized for text and vision tasks.

| Property | Value |
|---|---|
| Model ID | preview |
| Context Window | 128K+ |
| Input Modalities | Text, Image |
| Output Modalities | Text |

- Function calling
- Structured output
- Reasoning
Duce

High-performance multimodal model with state-of-the-art music generation and broad media capabilities.

| Property | Value |
|---|---|
| Model ID | duce |
| Context Window | 128K+ |
| Input Modalities | Text, Image, Music |
| Output Modalities | Text, Image, Video, Music, Speech |

- SOTA music generation
- Function calling
- Structured output
- Reasoning
- Mixture of Models (MoM)
- Text-to-Speech (TTS)
- Image-to-Image
- Text/Image-to-Video
- Video-to-Video
- Text-to-Music / Music-to-Music
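Some entries in the list above combine several modes. For example, "Text/Image-to-Video" covers both text-to-video and image-to-video. The minimal sketch below expands the listed conversion modes into individual source-to-target pairs and checks a requested conversion against them. `DUCE_MODES` and `supports_mode` are hypothetical names used only for illustration; they are not part of any documented API. The set covers only the conversions listed here, not plain text output.

```python
# Hypothetical encoding of the conversion modes listed for Duce on this page.
# Each pair is (source modality, target modality); names are illustrative only.
DUCE_MODES: set[tuple[str, str]] = {
    ("text", "speech"),   # Text-to-Speech (TTS)
    ("image", "image"),   # Image-to-Image
    ("text", "video"),    # Text/Image-to-Video (text source)
    ("image", "video"),   # Text/Image-to-Video (image source)
    ("video", "video"),   # Video-to-Video
    ("text", "music"),    # Text-to-Music
    ("music", "music"),   # Music-to-Music
}


def supports_mode(source: str, target: str) -> bool:
    """Return True if the listed modes include a source -> target conversion."""
    return (source.lower(), target.lower()) in DUCE_MODES


print(supports_mode("Image", "Video"))  # True
print(supports_mode("Music", "Video"))  # False
```

Normalizing to lowercase lets callers pass modality names in the same capitalization the tables use.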
Magma

The most capable model layer with the largest context window and state-of-the-art video generation.

| Property | Value |
|---|---|
| Model ID | magma |
| Context Window | 1M |
| Input Modalities | Text, Image, Music |
| Output Modalities | Text, Image, Video, Music, Speech |

- SOTA video generation
- Function calling
- Structured output
- Reasoning
- Mixture of Models (MoM)
- Text-to-Speech (TTS)
- Image-to-Image
- Text/Image-to-Video
- Video-to-Video
- Text-to-Music / Music-to-Music
Comparison
| Feature | Preview | Duce | Magma |
|---|---|---|---|
| Context Window | 128K+ | 128K+ | 1M |
| Text Generation | Yes | Yes | Yes |
| Image Analysis | Yes | Yes | Yes |
| Image Generation | — | Yes | Yes |
| Video Generation | — | Yes | Yes (SOTA) |
| Music Generation | — | Yes (SOTA) | Yes |
| Text-to-Speech | — | Yes | Yes |
| Function Calling | Yes | Yes | Yes |
| Structured Output | Yes | Yes | Yes |
| Reasoning | Yes | Yes | Yes |
| Mixture of Models | — | Yes | Yes |
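The comparison table can also serve as a lookup for choosing a model ID. The sketch below takes the layers in the order this page lists them, from the fastest (`preview`) to the most capable (`magma`). It returns the first layer that has every required feature and a large enough context window. Several choices here are assumptions, not guidance from this page:

- The feature keys and `pick_model` are hypothetical names.
- Context windows are read as token counts, with "128K+" taken as a lower bound of 128,000 and "1M" as 1,000,000.
- Preferring the earliest matching layer favors speed over capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLayer:
    model_id: str
    context_tokens: int  # assumption: token count; "128K+" read as a 128,000 lower bound
    features: frozenset[str]


# Feature keys are hypothetical labels for the rows of the comparison table.
CORE = frozenset({"text_generation", "image_analysis", "function_calling",
                  "structured_output", "reasoning"})
MEDIA = CORE | {"image_generation", "video_generation", "music_generation",
                "text_to_speech", "mixture_of_models"}

# Ordered as listed on this page: fastest layer first, most capable last.
LAYERS = (
    ModelLayer("preview", 128_000, CORE),
    ModelLayer("duce", 128_000, MEDIA),
    ModelLayer("magma", 1_000_000, MEDIA),
)


def pick_model(required: set[str], context_tokens: int = 0) -> str:
    """Return the first listed model ID that covers all features and the context size."""
    for layer in LAYERS:
        if required <= layer.features and context_tokens <= layer.context_tokens:
            return layer.model_id
    raise ValueError("no model layer satisfies the requirements")


print(pick_model({"function_calling", "structured_output"}))  # preview
print(pick_model({"music_generation"}))                       # duce
print(pick_model({"text_generation"}, context_tokens=500_000))  # magma
```

The sketch ignores the SOTA distinctions in the table, which mark Duce as the music specialist and Magma as the video specialist. A caller who cares about output quality for those media might route music to `duce` and video to `magma` explicitly rather than taking the first match.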