Intro to Realtime API (Speech-to-Speech)

Inworld’s Realtime API (Speech-to-Speech) enables low-latency, speech-to-speech interactions with voice agents. The API follows the OpenAI Realtime protocol, extended to enable additional customization.

WebSocket Quickstart

Build a voice agent with WebSocket, mic input, and audio playback.

WebRTC Quickstart

Build a voice agent with browser-native WebRTC — no manual audio encoding.

API reference

See the full event schemas for the Realtime API.

JS examples

JavaScript examples for the Realtime API.

Python examples

Python examples for the Realtime API.

Using AI to code? Paste https://docs.inworld.ai/llms.txt into your assistant so it knows every page on this site. Want live search? Add the MCP server.

Key Features

WebSocket and WebRTC transports: Connect over WebSocket or WebRTC with a standard event schema.
Automatic interruption handling and turn-taking: The agent manages conversational turns naturally and stays resilient to user barge-in.
Conversational awareness: With Realtime TTS-2, the model conditions on the audio of prior conversational turns. A line delivered after a joke lands differently than the same line delivered after bad news. The model hears the difference and adjusts how it speaks based on how it was spoken to.
Router support: Use Realtime Router to let a single agent dynamically handle different user cohorts, or to run A/B tests.
OpenAI compatibility: Drop-in replacement for the OpenAI Realtime API with a simple migration path.
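
Because the API follows the OpenAI Realtime protocol, client messages are JSON events identified by a `type` field. The following is a minimal Python sketch of what building and inspecting such events might look like. The event names (`session.update`, `input_audio_buffer.append`, `input_audio_buffer.speech_started`) come from the OpenAI Realtime protocol; the helper function names are hypothetical, and any Inworld-specific fields or extensions are not shown here and should be checked against the API reference.

```python
import base64
import json


def session_update(instructions: str) -> str:
    """Serialize a `session.update` client event carrying agent instructions.

    Assumption: only the `instructions` field is set; other session fields
    are left to server defaults.
    """
    return json.dumps(
        {"type": "session.update", "session": {"instructions": instructions}}
    )


def audio_append(pcm_chunk: bytes) -> str:
    """Serialize an `input_audio_buffer.append` event.

    The OpenAI Realtime protocol carries audio as base64 text inside JSON.
    """
    return json.dumps(
        {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm_chunk).decode("ascii"),
        }
    )


def should_stop_playback(raw_server_event: str) -> bool:
    """Return True when a server event signals that the user started speaking.

    A client would typically stop local audio playback at this point so the
    user's barge-in is not talked over.
    """
    event = json.loads(raw_server_event)
    return event.get("type") == "input_audio_buffer.speech_started"
```

These helpers only produce and parse JSON strings, so they can be used with whichever WebSocket client the quickstart recommends; the connection URL and authentication are deliberately left out because they are not specified on this page.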

Guides

Using realtime models

Configure sessions, send input, and orchestrate responses.

Managing conversations

Session lifecycle and conversation events.

OpenAI migration

Step-by-step guide to switch from OpenAI to Inworld.

See the API reference for full event schemas.
