Lexicon - Inworld AI Documentation

Use this lexicon to quickly understand common acronyms and AI terms used in the Runtime.

Speech recognition

Term	Definition
STT (Speech-to-Text)	Converts spoken audio into text. Used to understand player speech.
VAD (Voice Activity Detection)	Detects when speech is present in an audio stream, so the client knows when to start and stop sending audio.
AEC (Acoustic Echo Cancellation)	Reduces game audio leaking into the microphone input by cancelling echoes.
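
To make the VAD entry concrete, here is a minimal sketch of energy-based voice activity detection. It is an illustration only, not how the Runtime detects speech: the function names, the 160-sample frame size, and the 0.05 threshold are assumptions, and production VADs usually rely on trained models instead of a fixed energy threshold.

```python
import math

def frame_rms(frame):
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_speech_frames(samples, frame_size=160, threshold=0.05):
    """Return one flag per full frame: True when the frame energy exceeds threshold.

    frame_size and threshold are illustrative values, not Runtime defaults.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags

# Synthetic input: silence, then a 440 Hz tone sampled at 16 kHz, then silence.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
flags = detect_speech_frames(silence + tone + silence)
print(flags)  # [False, False, True, True, False, False]
```

A client could start sending audio at the first True flag and stop after a run of False flags.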

Language models and reasoning

Term	Definition
LLM (Large Language Model)	An AI model trained on massive amounts of text to understand and generate human-like language. LLMs can power capabilities like dialog generation, game state changes, reasoning, and more.
Intent	The inferred meaning or purpose behind an input. Typically used to decide whether a user message matches a predefined intent, so that a corresponding action can be triggered.
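
The intent entry describes a two-step flow: infer the intent, then trigger an action. The sketch below shows that flow with a deliberately simple keyword matcher. The intent names, keywords, and action results are all hypothetical, and a real system would more likely use an LLM or embedding-based classifier than keyword overlap.

```python
import re

# Hypothetical intents, each with a few trigger keywords (illustrative only).
INTENT_KEYWORDS = {
    "open_shop": {"buy", "shop", "purchase", "sell"},
    "start_quest": {"quest", "mission", "task"},
}

# Hypothetical actions keyed by intent name.
ACTIONS = {
    "open_shop": lambda: "shop_opened",
    "start_quest": lambda: "quest_started",
}

def infer_intent(message):
    """Return the intent whose keywords overlap the message most, or None."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent

def handle_message(message):
    """Trigger the action for the inferred intent, if any."""
    intent = infer_intent(message)
    if intent is None:
        return "no_action"
    return ACTIONS[intent]()

print(handle_message("I want to buy a sword"))        # shop_opened
print(handle_message("Do you have a quest for me?"))  # quest_started
print(handle_message("Nice weather today"))           # no_action
```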

Knowledge and retrieval

Term	Definition
Embedding	A numeric vector representation of text used to measure semantic similarity.
Knowledge	Structured or unstructured information the character can reference during conversation.
RAG (Retrieval-Augmented Generation)	A technique that retrieves relevant knowledge (via embeddings) to ground LLM responses.
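
The embedding and RAG entries fit together as a small pipeline: embed the query, rank knowledge records by similarity, and place the best matches in the LLM prompt. The sketch below assumes a toy word-count `embed` function as a stand-in for a real embedding model; the knowledge records, function names, and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical knowledge records a character could reference.
KNOWLEDGE = [
    "The blacksmith sells swords and shields in the village square.",
    "The old bridge north of town collapsed last winter.",
    "Healing potions are brewed by the herbalist near the river.",
]

def retrieve(query, records, top_k=1):
    """Return the top_k records most similar to the query."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine_similarity(q, embed(r)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, records):
    """Ground the prompt in retrieved knowledge before it goes to an LLM."""
    context = "\n".join(f"- {r}" for r in retrieve(query, records))
    return f"Use only these facts:\n{context}\n\nPlayer: {query}"

print(build_prompt("Where can I buy swords?", KNOWLEDGE))
```

A real embedding model would also match paraphrases such as "weapons" to the blacksmith record, which word counts cannot do.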

Speech synthesis

Term	Definition
TTS (Text-to-Speech)	Converts generated text into spoken audio. Often paired with an LLM to voice a character’s generated dialog.
