Inworld’s platform provides access to a wide variety of state-of-the-art models. They differ in capabilities, performance, price, and deployment options, so you can select and customize the models that best match your use case and application needs.
This section gives a high-level overview of Inworld’s model offerings and how you can use them in your application.
TTS: Text-to-Speech models can be used to generate high-quality audio for your application, such as powering a character’s voice.
LLM: Large Language Models accept inputs (typically text, though certain models also support other modalities) and generate text outputs. They can be used to determine in-game actions, power conversations, generate dynamic narratives, and more.
Embeddings: Embeddings models convert text into high-dimensional vectors, which can be used to power intent detection, text similarity comparison, and retrieval-augmented generation (RAG).
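To make the text similarity and intent detection use case concrete, here is a minimal sketch that compares embedding vectors with cosine similarity and picks the closest intent. The three-dimensional vectors and the intent names are toy values invented for illustration. Real embeddings would come from an embeddings model and have far more dimensions. The helper names `cosine_similarity` and `detect_intent` are hypothetical and not part of any Inworld SDK.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def detect_intent(query_vec, intent_vecs):
    """Return the intent name whose vector is most similar to the query."""
    return max(
        intent_vecs,
        key=lambda name: cosine_similarity(query_vec, intent_vecs[name]),
    )


# Toy vectors (assumption): stand-ins for embeddings of example phrases.
INTENT_VECTORS = {
    "greet": [0.9, 0.1, 0.0],
    "trade": [0.1, 0.9, 0.2],
    "attack": [0.0, 0.2, 0.9],
}

# Toy query vector (assumption): stand-in for the embedding of a player line.
best_intent = detect_intent([0.2, 0.8, 0.1], INTENT_VECTORS)
```

The same similarity function can rank documents for retrieval-augmented generation: embed the query, score each stored chunk, and pass the top-scoring chunks to the LLM.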
Inworld’s Agent Runtime and API offer access to Inworld’s family of state-of-the-art TTS models, optimized for different use cases, quality levels, and performance requirements.
Description: High-quality, maximum-stability model with enhanced timestamps
Languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar

Model: Llama Realtime TTS 1.5 Mini
Model ID: inworld-tts-1.5-mini
Description: Ultra-fast, lowest-latency model with enhanced timestamps
Languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
Looking for inworld-tts-1 or inworld-tts-1-max? These previous-generation models were discontinued on June 15, 2026. Requests to them are now automatically routed to their 1.5 successors (inworld-tts-1 → inworld-tts-1.5-mini, inworld-tts-1-max → inworld-tts-1.5-max). We recommend migrating to inworld-tts-2 to improve quality and latency.
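The routing described above happens on Inworld’s side, but it can still help to know which model a stored configuration will actually resolve to, for example when auditing settings or logging. The sketch below mirrors the mapping from the note in a small client-side lookup. The names `LEGACY_SUCCESSORS` and `resolve_model_id` are hypothetical helpers, not part of any Inworld SDK.

```python
# Mapping copied from the discontinuation note: legacy ID -> 1.5 successor.
LEGACY_SUCCESSORS = {
    "inworld-tts-1": "inworld-tts-1.5-mini",
    "inworld-tts-1-max": "inworld-tts-1.5-max",
}


def resolve_model_id(model_id: str) -> str:
    """Return the model ID a request is expected to be served by.

    Legacy IDs are translated to their successors; any other ID is
    returned unchanged.
    """
    return LEGACY_SUCCESSORS.get(model_id, model_id)


def is_legacy(model_id: str) -> bool:
    """Report whether a configured model ID refers to a discontinued model."""
    return model_id in LEGACY_SUCCESSORS
```

A configuration check could call `is_legacy` on every stored model ID and flag the ones that should be updated explicitly instead of relying on automatic routing.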
You may not use Inworld’s platform to violate the terms of service or policies of third-party model providers. Accounts that do so are subject to deactivation.