Inworld provides a family of state-of-the-art TTS models, optimized for different use cases, quality levels, and performance requirements.

Realtime TTS-2

Our flagship, top-ranked model — the best quality plus steerability

  • Natural language steering for more contextually aware speech
  • Support for 200+ languages and locales
  • Optimized for real-time use
  • High quality instant voice cloning
  • Enhanced timestamps with phonetic details and visemes

Realtime TTS 1.5 Max

Rich, expressive speech with maximum stability

  • Support for 15 languages
  • Optimized for real-time use (<200ms median latency)
  • High quality instant voice cloning

Realtime TTS 1.5 Mini

Our ultra-fast model — for when latency is the top priority

  • Ultra-low latency (~120ms median)
  • Support for 15 languages
  • High quality instant voice cloning

Models overview

| Name | Model ID | Description | Supported languages |
| --- | --- | --- | --- |
| Llama Realtime TTS-2 | inworld-tts-2 | Our newest, most powerful model with natural language steering and stronger multilingual capabilities | 200+ languages and locales (see Languages) |
| Llama Realtime TTS 1.5 Max | inworld-tts-1.5-max | High-quality, maximum-stability model with enhanced timestamps | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Llama Realtime TTS 1.5 Mini | inworld-tts-1.5-mini | Ultra-fast, lowest-latency model, with enhanced timestamps | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
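
As an illustration of how the overview might drive model selection in client code, here is a minimal sketch. The `pick_model` helper, its `priority` values, and the fallback policy are assumptions made for this example and are not part of any Inworld SDK. Only the model IDs and the 15 language codes come from the table above.

```python
# Language codes listed in the overview for both 1.5 models.
LANGS_1_5 = frozenset("en zh ja ko ru it es pt fr de pl nl hi he ar".split())


def pick_model(language: str, priority: str = "quality") -> str:
    """Hypothetical helper: map a language tag and a priority to a model ID.

    priority is one of "quality", "stability", or "latency" (assumed names).
    """
    # Reduce a locale tag such as "pt-BR" or "pt_BR" to its base code "pt".
    code = language.replace("_", "-").split("-")[0].lower()

    if code in LANGS_1_5:
        if priority == "latency":
            return "inworld-tts-1.5-mini"
        if priority == "stability":
            return "inworld-tts-1.5-max"

    # Assumed policy: everything else goes to the flagship model, which
    # lists the broadest language coverage. Whether a specific language is
    # covered still needs to be checked against the Languages page.
    return "inworld-tts-2"
```

For example, `pick_model("pt-BR", "latency")` selects `inworld-tts-1.5-mini`, while a language outside the 15 listed codes falls back to `inworld-tts-2` regardless of priority.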
Looking for inworld-tts-1 or inworld-tts-1-max? These previous-generation models were discontinued on June 15, 2026. Requests to them are now automatically routed to their 1.5 successors (inworld-tts-1 → inworld-tts-1.5-mini, inworld-tts-1-max → inworld-tts-1.5-max). We recommend migrating to inworld-tts-2 to improve quality and latency.
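
The routing in the note happens on the service side. If you want your own configuration and logs to name the model that actually serves a request, one option is to mirror the mapping in client code. The sketch below is a hypothetical helper (`resolve_model_id` is a name made up for this example), not an official migration tool.

```python
# Discontinued model IDs and the 1.5 successors named in the note above.
LEGACY_SUCCESSORS = {
    "inworld-tts-1": "inworld-tts-1.5-mini",
    "inworld-tts-1-max": "inworld-tts-1.5-max",
}


def resolve_model_id(model_id: str) -> str:
    """Return the successor for a discontinued ID, or the ID unchanged."""
    return LEGACY_SUCCESSORS.get(model_id, model_id)


def find_legacy_ids(configured_ids: list[str]) -> list[str]:
    """List the configured IDs that still point at discontinued models."""
    return [m for m in configured_ids if m in LEGACY_SUCCESSORS]
```

Running `find_legacy_ids` over the model IDs in a configuration file gives a quick inventory of call sites to update, whether you move them to the 1.5 successors or to inworld-tts-2.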