Speech recognition
| Term | Definition |
|---|---|
| STT (Speech-to-Text) | Converts spoken audio into text. Used to transcribe what the player says. |
| VAD (Voice Activity Detection) | Detects when speech is present in an audio stream, which determines when to start and stop sending audio (for example, to STT). |
| AEC (Acoustic Echo Cancellation) | Removes game audio that leaks from the speakers back into the microphone input, so it is not mistaken for player speech. |
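To make VAD concrete, here is a minimal energy-threshold sketch. It assumes mono float samples in [-1.0, 1.0] at 16 kHz. The frame size, threshold, and "hangover" length (how long speech is considered ongoing after the level drops) are illustrative values, not recommendations. Production VADs often use trained models instead, which hold up better in noisy scenes.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    samples: Sequence[float],
    frame_size: int = 160,     # 10 ms per frame at an assumed 16 kHz sample rate
    threshold: float = 0.02,   # hypothetical RMS level treated as speech
    hangover_frames: int = 30, # quiet frames tolerated before speech is considered over
) -> List[bool]:
    """Return one speech/no-speech flag per frame."""
    flags: List[bool] = []
    silence_run = hangover_frames  # start in the "not speaking" state
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) >= threshold:
            silence_run = 0
        else:
            silence_run += 1
        # Speech ends once `hangover_frames` quiet frames have been seen in a row.
        flags.append(silence_run < hangover_frames)
    return flags


# Usage: 0.5 s of silence, 0.5 s of a 220 Hz tone standing in for speech, 1 s of silence.
SR = 16000
signal = (
    [0.0] * (SR // 2)
    + [0.3 * math.sin(2 * math.pi * 220 * n / SR) for n in range(SR // 2)]
    + [0.0] * SR
)
flags = detect_speech(signal)
```

A client would start streaming audio when a flag turns `True` and stop when it turns back to `False`. The hangover keeps short pauses between words from cutting the stream.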
Language models and reasoning
| Term | Definition |
|---|---|
| LLM (Large Language Model) | An AI model trained on massive amounts of text to understand and generate human-like language. LLMs can power capabilities like dialog generation, game state changes, reasoning, and more. |
| Intent | The inferred meaning or purpose behind an input. Commonly used to classify whether a player message matches a predefined intent, which then triggers a corresponding action. |
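The intent-to-action flow can be sketched as follows. In practice an LLM or an embedding model would usually do the classification. Here, a hypothetical keyword table stands in so the sketch runs without a model. The intent names and actions are invented for illustration.

```python
import re
from typing import Callable, Dict, Optional, Set

# Hypothetical intents for a shopkeeper NPC, each with trigger keywords.
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "open_shop": {"buy", "sell", "shop", "trade", "wares"},
    "ask_directions": {"where", "way", "directions", "find"},
    "farewell": {"bye", "goodbye", "farewell", "later"},
}

# Hypothetical game actions triggered by each intent.
ACTIONS: Dict[str, Callable[[], str]] = {
    "open_shop": lambda: "UI: open shop window",
    "ask_directions": lambda: "UI: show map marker",
    "farewell": lambda: "NPC: end conversation",
}


def classify_intent(message: str, min_hits: int = 1) -> Optional[str]:
    """Return the intent whose keywords overlap the message most, or None."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent if best_hits >= min_hits else None


def handle(message: str) -> str:
    """Trigger the matched intent's action, or fall back to normal dialog."""
    intent = classify_intent(message)
    if intent is None:
        return "NPC: free-form reply"  # no intent matched
    return ACTIONS[intent]()
```

Messages that match no intent fall through to ordinary dialog generation. This is a common way to combine scripted game actions with open-ended conversation.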
Knowledge and retrieval
| Term | Definition |
|---|---|
| Embedding | A numeric vector representation of text used to measure semantic similarity. |
| Knowledge | Structured or unstructured information the character can reference during conversation. |
| RAG (Retrieval-Augmented Generation) | A technique that retrieves relevant knowledge (typically by embedding similarity) and adds it to the LLM's prompt to ground its responses. |
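Embeddings, knowledge, and RAG fit together as sketched below: embed each knowledge snippet once, embed the player's message, rank snippets by cosine similarity, and place the best matches in the prompt. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, which would return a dense float vector. The knowledge snippets and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return {w: float(c) for w, c in Counter(re.findall(r"[a-z]+", text.lower())).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical character knowledge, embedded once ahead of time.
KNOWLEDGE = [
    "The blacksmith's forge is east of the town square.",
    "The harvest festival begins on the first full moon of autumn.",
    "Healing potions cost ten gold at the apothecary.",
]
INDEX: List[Tuple[str, Dict[str, float]]] = [(doc, embed(doc)) for doc in KNOWLEDGE]


def retrieve(query: str, k: int = 1) -> List[str]:
    """Return the k knowledge snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(player_message: str, k: int = 1) -> str:
    """Ground the LLM by putting retrieved facts ahead of the player's message."""
    facts = "\n".join(f"- {doc}" for doc in retrieve(player_message, k))
    return (
        "Answer in character, using the facts below if they are relevant.\n"
        f"Facts:\n{facts}\n\n"
        f"Player: {player_message}\nCharacter:"
    )
```

For example, `retrieve("How much does a healing potion cost?")` returns the apothecary snippet, because it shares the most weighted words with the query. A real embedding model would also match paraphrases that share no words.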
Speech synthesis
| Term | Definition |
|---|---|
| TTS (Text-to-Speech) | Converts generated text into spoken audio. Often paired with an LLM so that the character's generated lines are voiced. |
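The terms above usually chain into one conversational turn: STT transcribes the player, an LLM writes the reply, and TTS voices it. The sketch below shows only that wiring. The stage signatures are assumptions, and the `fake_*` functions are placeholders so it runs without real models. Actual engines and SDKs will have different interfaces and typically stream audio rather than passing whole buffers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real STT/LLM/TTS APIs will differ.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, player_audio: bytes) -> bytes:
        """One turn: player audio in, character audio out."""
        transcript = self.stt(player_audio)  # STT: what did the player say?
        reply_text = self.llm(transcript)    # LLM: what should the character say?
        return self.tts(reply_text)          # TTS: voice the reply


# Placeholder stages so the sketch runs without any models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already text


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.upper().encode("utf-8")  # stand-in for synthesized audio


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
```

Keeping each stage behind a plain callable makes it easy to swap providers or insert VAD and RAG steps without changing the turn logic.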