Realtime TTS-2
Our flagship, top-ranked model, combining our best quality with steerability
- Natural language steering for more contextually aware speech
- Support for 200+ languages and locales
- Optimized for real-time use
- High quality instant voice cloning
- Enhanced timestamps with phonetic details and visemes
Realtime TTS 1.5 Max
Rich, expressive speech with maximum stability
- Support for 15 languages
- Optimized for real-time use (<200ms median latency)
- High quality instant voice cloning
Realtime TTS 1.5 Mini
Our ultra-fast model — for when latency is the top priority
- Ultra-low latency (~120ms median)
- Support for 15 languages
- High quality instant voice cloning
Models overview
| Name | Model ID | Description | Supported languages |
|---|---|---|---|
| Llama Realtime TTS-2 | inworld-tts-2 | Our newest, most powerful model with natural language steering and stronger multilingual capabilities | 200+ languages and locales — see Languages |
| Llama Realtime TTS 1.5 Max | inworld-tts-1.5-max | High-quality, maximum-stability model with enhanced timestamps | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Llama Realtime TTS 1.5 Mini | inworld-tts-1.5-mini | Ultra-fast, lowest-latency model with enhanced timestamps | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
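
As a rough illustration of how the table might be used when choosing a model ID in client code, here is a minimal sketch. The helper `pick_model_id` and its `prefer_low_latency` flag are hypothetical and not part of any SDK. The selection rule is only one possible policy, and the sketch assumes that any language outside the 15 codes listed for the 1.5 models should go to `inworld-tts-2`. Actual coverage should be checked against the Languages page.

```python
# Language codes listed in the table for both 1.5 models.
TTS_1_5_LANGUAGES = frozenset(
    ["en", "zh", "ja", "ko", "ru", "it", "es", "pt",
     "fr", "de", "pl", "nl", "hi", "he", "ar"]
)


def pick_model_id(language: str, prefer_low_latency: bool = False) -> str:
    """Hypothetical helper: choose a model ID from the overview table.

    Assumption: languages not covered by the 1.5 models fall back to
    inworld-tts-2. This sketch does not verify that inworld-tts-2
    actually supports the requested language.
    """
    code = language.strip().lower()
    if prefer_low_latency and code in TTS_1_5_LANGUAGES:
        # The 1.5 Mini model is described as the lowest-latency option.
        return "inworld-tts-1.5-mini"
    return "inworld-tts-2"
```

For example, `pick_model_id("ja", prefer_low_latency=True)` selects the Mini model, while a language code outside the 15 listed falls through to `inworld-tts-2`.
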
Looking for inworld-tts-1 or inworld-tts-1-max? These previous-generation models were discontinued on June 15, 2026. Requests to them are now automatically routed to their 1.5 successors (inworld-tts-1 → inworld-tts-1.5-mini, inworld-tts-1-max → inworld-tts-1.5-max). We recommend migrating to inworld-tts-2 to improve quality and latency.
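
The routing described above happens on the service side, but it can be useful to mirror it in client code, for example to log which model will actually serve a request or to find stale IDs in configuration. A minimal sketch follows. The names `LEGACY_MODEL_SUCCESSORS` and `resolve_model_id` are hypothetical, and only the two mappings stated above are encoded.

```python
# Successor mapping for the discontinued models, as stated in the note above.
LEGACY_MODEL_SUCCESSORS = {
    "inworld-tts-1": "inworld-tts-1.5-mini",
    "inworld-tts-1-max": "inworld-tts-1.5-max",
}


def resolve_model_id(model_id: str) -> str:
    """Hypothetical helper: return the model ID a request is routed to.

    Discontinued IDs map to their 1.5 successors; any other ID is
    returned unchanged.
    """
    return LEGACY_MODEL_SUCCESSORS.get(model_id, model_id)


def is_legacy_model(model_id: str) -> bool:
    """Return True if the ID refers to a discontinued model."""
    return model_id in LEGACY_MODEL_SUCCESSORS
```

A configuration check could call `is_legacy_model` on each configured ID and flag any hits for migration.
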