WebRTC is ideal for low-latency voice apps that run in the browser. For server-side integrations, see the WebSocket Quickstart.
Get Started
Create an API key
Create an Inworld account. In Inworld Portal, generate an API key by going to Settings > API Keys. Copy the Base64 credentials.
Create a .env file and store the copied credentials in it.
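Because the server reads the key from the environment, a startup check can catch a missing or mis-pasted value early. The following is a minimal sketch, not part of the official quickstart: the variable name `INWORLD_API_KEY` is hypothetical, and the check only confirms that the value is canonical, padded standard Base64.

```javascript
// Sketch: INWORLD_API_KEY is a hypothetical variable name used for illustration.

// True when `value` is a non-empty, canonically padded standard Base64 string.
function isBase64(value) {
  if (typeof value !== "string" || value.length === 0) return false;
  // Decoding and re-encoding must reproduce the input exactly.
  return Buffer.from(value, "base64").toString("base64") === value;
}

// Return the credentials from the environment, or fail fast with a clear message.
function readApiKey(env = process.env, name = "INWORLD_API_KEY") {
  const value = (env[name] || "").trim();
  if (!isBase64(value)) {
    throw new Error(`${name} is missing or is not valid Base64`);
  }
  return value;
}
```

On Node.js 20.6 or later, starting the server with `node --env-file=.env server.js` is one way to load the file without an extra dependency.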
Create the server
Create server.js. It serves the page and provides a /api/config endpoint that fetches ICE servers from the WebRTC proxy while keeping the API key server-side.
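As a rough illustration of that shape, here is a sketch built on Node's built-in `http` module. It is not the official server.js: the upstream URL (`iceUrl`), the `Basic` authorization scheme, the `ice_servers` field name, and the environment variable names in the closing comment are all assumptions made for the example.

```javascript
// Sketch only: the upstream URL, auth scheme, and JSON field names are
// assumptions for illustration, not the documented Inworld API.
const http = require("node:http");
const fs = require("node:fs");

// Ask the WebRTC proxy for ICE servers. The key is sent from the server only.
async function fetchIceServers(iceUrl, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(iceUrl, {
    headers: { Authorization: `Basic ${apiKey}` }, // assumed scheme
  });
  if (!res.ok) throw new Error(`ICE server request failed: ${res.status}`);
  const body = await res.json();
  return Array.isArray(body.ice_servers) ? body.ice_servers : []; // assumed field
}

// Build the request handler: /api/config returns JSON, any other path gets the page.
function createHandler({ iceUrl, apiKey, fetchImpl, pagePath }) {
  return async (req, res) => {
    if (req.url === "/api/config") {
      try {
        const iceServers = await fetchIceServers(iceUrl, apiKey, fetchImpl);
        res.writeHead(200, { "Content-Type": "application/json" });
        res.end(JSON.stringify({ ice_servers: iceServers }));
      } catch (err) {
        res.writeHead(502, { "Content-Type": "application/json" });
        res.end(JSON.stringify({ error: err.message }));
      }
      return;
    }
    fs.readFile(pagePath, (err, html) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end(html);
    });
  };
}

// To run it (all values are placeholders):
// http.createServer(createHandler({
//   iceUrl: process.env.ICE_SERVERS_URL,
//   apiKey: process.env.INWORLD_API_KEY,
//   pagePath: "./index.html",
// })).listen(3000);
```

Passing `fetchImpl` in as a parameter is a design choice for the sketch: it lets the handler be exercised with a stub instead of a live proxy.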
Create the frontend
Create index.html in the same directory. It connects via WebRTC, streams mic audio automatically, and plays agent audio through an RTP track.
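The connection logic inside such a page might look roughly like the sketch below, which uses only standard browser WebRTC calls. It is not the official index.html: `exchangeSdp` is a hypothetical callback that sends the SDP offer to the proxy and resolves with the answer SDP, the data channel label `"events"` is an assumption, and the `PeerConnection` and `getMic` parameters exist only so the function can be tried outside a browser.

```javascript
// Browser-side sketch. `exchangeSdp` is a hypothetical callback that sends the
// SDP offer to the proxy and resolves with the answer SDP.
async function connect({
  iceServers,
  exchangeSdp,
  audioEl,
  onEvent,
  PeerConnection = RTCPeerConnection, // injectable for testing
  getMic = () => navigator.mediaDevices.getUserMedia({ audio: true }),
}) {
  const pc = new PeerConnection({ iceServers });

  // Play the agent's audio as soon as its RTP track arrives.
  pc.ontrack = (e) => {
    audioEl.srcObject = e.streams[0];
  };

  // Stream the microphone; the browser handles the audio encoding.
  const mic = await getMic();
  mic.getTracks().forEach((track) => pc.addTrack(track, mic));

  // JSON events travel over a data channel (the label is an assumption).
  const channel = pc.createDataChannel("events");
  channel.onmessage = (e) => onEvent(JSON.parse(e.data));

  // Standard offer/answer exchange.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const answerSdp = await exchangeSdp(pc.localDescription.sdp);
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });

  return { pc, channel };
}
```

In a page, `audioEl` would typically be an `<audio autoplay>` element, and `iceServers` would come from the /api/config response.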
Install and run
Option 2: Using OpenAI Agents SDK
If you’re building a more advanced voice agent with features like agent handoffs, tool calling, and guardrails, you can use the OpenAI Agents SDK with Inworld’s WebRTC proxy. We provide a ready-to-run playground based on OpenAI’s realtime agents demo.
Clone the playground
If you are unable to access this repository, please contact support@inworld.ai for access.
Configure the API key
Open .env and set OPENAI_API_KEY to your Inworld API key (the same Base64 credentials from Inworld Portal).
Run
- Chat-Supervisor: A realtime chat agent handles basic conversation while a more capable text model (e.g. gpt-4.1) handles tool calls and complex responses.
- Sequential Handoff: Specialized agents transfer the user between them to handle specific intents (e.g. authentication → returns → sales).
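To make the Sequential Handoff idea concrete, the sketch below models it as a small table of allowed transfers in plain JavaScript. It does not use the OpenAI Agents SDK API; the agent names and the `handoffs` table are hypothetical and only mirror the example path in the list above.

```javascript
// Plain-JavaScript illustration of the handoff idea. This is not the
// OpenAI Agents SDK API, and the agent names are hypothetical.
const agents = {
  authentication: { handoffs: ["returns", "sales"] },
  returns: { handoffs: ["sales"] },
  sales: { handoffs: [] },
};

// Move the conversation to `next` only if the current agent may hand off to it.
function handoff(current, next) {
  if (!agents[current]) throw new Error(`Unknown agent: ${current}`);
  if (!agents[current].handoffs.includes(next)) {
    throw new Error(`${current} cannot hand off to ${next}`);
  }
  return next;
}

// Walk a sequence of agents, validating each transfer along the way.
function runPath(path) {
  return path.reduce((current, next) => handoff(current, next));
}

// Follows authentication → returns → sales and ends on "sales".
console.log(runPath(["authentication", "returns", "sales"]));
```

Restricting each agent to an explicit list of targets keeps transfers predictable: a request to move from sales back to returns is rejected here rather than silently honored.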
How It Works
| Component | Role |
|---|---|
| Browser | Captures mic audio via WebRTC, plays agent audio from RTP track |
| Node.js server | Serves the page and /api/config (ICE servers + API key) |
| WebRTC proxy | Bridges WebRTC ↔ WebSocket, transcodes OPUS ↔ PCM16 |
| Inworld Realtime API | Handles speech-to-text, LLM processing, and text-to-speech |
- Audio flows via RTP tracks (no base64 encoding)
- Events flow via DataChannel (same JSON schema)
- Browser handles OPUS codec natively
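To put the "no base64 encoding" point in numbers, the sketch below compares the raw size of one PCM16 audio frame with the size of the same frame as Base64 text. The audio format (24 kHz mono, 20 ms frames) is an assumption chosen for the arithmetic, not a documented setting.

```javascript
// Assumed audio format for illustration: 24 kHz mono PCM16 in 20 ms frames.
const SAMPLE_RATE_HZ = 24000;
const FRAME_MS = 20;
const BYTES_PER_SAMPLE = 2; // PCM16 uses two bytes per sample

// Raw size of one frame in bytes.
function frameBytes(sampleRate = SAMPLE_RATE_HZ, frameMs = FRAME_MS) {
  return ((sampleRate * frameMs) / 1000) * BYTES_PER_SAMPLE;
}

// Length of the Base64 text needed to carry `byteCount` raw bytes.
function base64Length(byteCount) {
  return Buffer.alloc(byteCount).toString("base64").length;
}

const raw = frameBytes(); // 960
const encoded = base64Length(raw); // 1280
const overhead = Math.round((encoded / raw - 1) * 100);
console.log(`${raw} bytes -> ${encoded} Base64 characters (+${overhead}%)`);
// prints "960 bytes -> 1280 Base64 characters (+33%)"
```

Base64 emits four characters for every three bytes, so the roughly one-third overhead holds for any frame size. Sending audio over RTP tracks avoids that cost.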