Connect via WebRTC for browser-native, low-latency voice. A WebRTC proxy bridges your peer connection to the same realtime service used by the WebSocket transport, transcoding OPUS ↔ PCM16 and forwarding events transparently.
## Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/realtime/calls` | POST | SDP offer/answer exchange |
| `/v1/realtime/ice-servers` | GET | STUN/TURN server configuration |
## Authentication

Pass your Inworld API key as a Bearer token. The proxy forwards it to the realtime service.

```
Authorization: Bearer <base64-api-key>
```

Keep the API key server-side. Serve it to the browser via a backend endpoint (see examples below).
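To sanity-check a key before wiring up the browser flow, the ICE-server endpoint can be called directly (a sketch; the `ice_servers` response field matches what the server example below reads):

```shell
# Assumes INWORLD_API_KEY is set in the environment.
# Expected response shape: {"ice_servers": [...]}
curl -s https://api.inworld.ai/v1/realtime/ice-servers \
  -H "Authorization: Bearer $INWORLD_API_KEY"
```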
## Flow

- Fetch config from your server (API key + ICE servers)
- Create an `RTCPeerConnection` with the ICE servers
- Create the `oai-events` data channel and add the microphone track
- Create an SDP offer → POST it to `/v1/realtime/calls` → set the SDP answer
- Data channel opens → send `session.update` → start the conversation

Audio flows via RTP tracks (no manual encode/decode). Events flow over the data channel using the same JSON schema as the WebSocket transport.
## Session Config

Send the same `session.update` as over WebSocket, but through the data channel. See model, voice, and TTS configuration for details.

```javascript
dc.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'openai/gpt-4o-mini',
    instructions: 'You are a concise concierge.',
    output_modalities: ['audio', 'text'],
    audio: {
      input: {
        turn_detection: {
          type: 'semantic_vad',
          eagerness: 'medium',
          create_response: true,
          interrupt_response: true
        }
      },
      output: {
        voice: 'Clive',
        model: 'inworld-tts-1.5-mini',
        speed: 1.0
      }
    }
  }
}));
```
## Audio

Unlike WebSocket (manual base64 PCM), WebRTC handles audio natively:

- Input: the browser captures the mic and sends OPUS over RTP automatically
- Output: the proxy sends AI audio back as an RTP track — attach it to an `<audio>` element to play

```javascript
pc.ontrack = (e) => {
  const audio = document.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = new MediaStream([e.track]);
  document.body.appendChild(audio);
};
```

`response.output_audio.delta` events are not sent through the data channel — audio is delivered via the RTP track instead.
## Text & Responses

Same as WebSocket, but sent through the data channel:

```javascript
dc.send(JSON.stringify({
  type: 'conversation.item.create',
  item: { type: 'message', role: 'user', content: [{ type: 'input_text', text: 'Hello!' }] }
}));
dc.send(JSON.stringify({ type: 'response.create' }));
```
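The two sends can be wrapped in a small helper (a sketch; `sendUserText` is an illustrative name, not part of the API):

```javascript
// Queue a user message on the data channel, then request a response.
// `dc` is any object with a send(string) method, e.g. an RTCDataChannel.
function sendUserText(dc, text) {
  dc.send(JSON.stringify({
    type: 'conversation.item.create',
    item: { type: 'message', role: 'user', content: [{ type: 'input_text', text }] },
  }));
  dc.send(JSON.stringify({ type: 'response.create' }));
}
```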
## Events

Same event types as WebSocket, received on the data channel.
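A minimal dispatcher for `dc.onmessage` might look like this (handler names are illustrative; the event types are the ones used throughout this page):

```javascript
// Parse a data-channel message and route it by event type.
// `handlers` is an optional bag of callbacks; unknown types fall through to onOther.
function routeEvent(raw, handlers = {}) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case 'response.output_text.delta':
      handlers.onTextDelta?.(msg.delta);
      break;
    case 'error':
      handlers.onError?.(msg.error?.message);
      break;
    default:
      handlers.onOther?.(msg);
  }
  return msg;
}
```

Wire it up with `dc.onmessage = (e) => routeEvent(e.data, { onTextDelta: console.log })`.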
## Option 1: Direct WebRTC

Server — serves the page and a `/api/config` endpoint that fetches ICE servers and keeps the API key out of the static page source:

```javascript
import 'dotenv/config';
import { readFileSync } from 'fs';
import { createServer } from 'http';

const html = readFileSync('index.html');
const API_KEY = process.env.INWORLD_API_KEY || '';
const PROXY = 'https://api.inworld.ai';

const server = createServer(async (req, res) => {
  if (req.url === '/api/config') {
    let ice = [];
    try {
      const r = await fetch(`${PROXY}/v1/realtime/ice-servers`, {
        headers: { Authorization: `Bearer ${API_KEY}` },
      });
      if (r.ok) ice = (await r.json()).ice_servers || [];
    } catch {}
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ api_key: API_KEY, ice_servers: ice, url: `${PROXY}/v1/realtime/calls` }));
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(html);
});

let port = 3000;
server.on('error', (e) => {
  if (e.code === 'EADDRINUSE') { console.warn(`Port ${port} in use, trying ${++port}…`); server.listen(port); }
  else throw e;
});
server.listen(port, () => console.log(`http://localhost:${port}`));
```
Client — full WebRTC flow in the browser:

```javascript
const cfg = await (await fetch('/api/config')).json();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

const pc = new RTCPeerConnection({ iceServers: cfg.ice_servers });
const dc = pc.createDataChannel('oai-events', { ordered: true });
stream.getAudioTracks().forEach(t => pc.addTrack(t, stream));

pc.ontrack = (e) => {
  const audio = document.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = new MediaStream([e.track]);
  document.body.appendChild(audio);
};

dc.onopen = () => {
  dc.send(JSON.stringify({
    type: 'session.update',
    session: {
      type: 'realtime',
      model: 'openai/gpt-4o-mini',
      instructions: 'You are a helpful voice assistant.',
      output_modalities: ['audio', 'text'],
      audio: {
        input: { turn_detection: { type: 'semantic_vad', eagerness: 'medium', create_response: true, interrupt_response: true } },
        output: { voice: 'Clive', model: 'inworld-tts-1.5-mini' }
      }
    }
  }));
};

dc.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'response.output_text.delta') console.log(msg.delta);
  if (msg.type === 'error') console.error(msg.error?.message);
};

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// Wait for ICE gathering to finish so the POSTed SDP includes all candidates
await new Promise((resolve) => {
  if (pc.iceGatheringState === 'complete') return resolve();
  pc.addEventListener('icegatheringstatechange', () => {
    if (pc.iceGatheringState === 'complete') resolve();
  });
});

const res = await fetch(cfg.url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/sdp', Authorization: `Bearer ${cfg.api_key}` },
  body: pc.localDescription.sdp,
});
await pc.setRemoteDescription({ type: 'answer', sdp: await res.text() });
```
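To end a call, close what the example opened (a hypothetical teardown helper; this page does not prescribe a shutdown sequence):

```javascript
// Close the data channel, stop local capture, then tear down the peer connection.
// `pc`, `dc`, and `stream` are the objects created in the client example above.
function hangUp(pc, dc, stream) {
  dc.close();
  stream.getTracks().forEach((t) => t.stop());
  pc.close();
}
```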
## Option 2: OpenAI Agents SDK

The OpenAI Agents SDK manages the full WebRTC lifecycle — peer connection, SDP exchange, mic capture, and audio playback:

```javascript
import { RealtimeSession, RealtimeAgent, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'assistant',
  instructions: 'You are a helpful voice assistant.',
  model: 'openai/gpt-4o-mini',
});

const cfg = await (await fetch('/api/config')).json();
const audioEl = document.createElement('audio');
audioEl.autoplay = true;

const session = new RealtimeSession(agent, {
  transport: new OpenAIRealtimeWebRTC({
    useInsecureApiKey: true,
    audioElement: audioEl,
    changePeerConnection: async (pc) => {
      if (cfg.ice_servers?.length) pc.setConfiguration({ iceServers: cfg.ice_servers });
      return pc;
    },
  }),
  model: 'gpt-4o-realtime-preview-2025-06-03',
});

await session.connect({ url: cfg.url, apiKey: cfg.api_key });
session.sendMessage('Hello!');
```

The server-side `/api/config` endpoint is identical to Option 1.
## WebSocket vs WebRTC

| | WebSocket | WebRTC |
|---|---|---|
| Audio | PCM16 base64 (manual) | OPUS via RTP (native) |
| Latency | Higher | Lower (UDP) |
| NAT traversal | Not needed | ICE (STUN/TURN) |
| Events | WS messages | DataChannel (same schema) |
| Best for | Server-side / Node.js | Browser voice apps |
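For contrast, this is roughly the manual step the WebSocket transport requires and WebRTC removes: packing Float32 mic samples into base64-encoded PCM16 (a sketch, assuming Node's `Buffer`):

```javascript
// Convert Float32 audio samples (range -1..1) to little-endian PCM16, base64-encoded.
// With WebRTC this never runs: the browser encodes OPUS and sends it over RTP.
function floatToPcm16Base64(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before scaling
    buf.writeInt16LE(Math.round(s * 32767), i * 2);
  }
  return buf.toString('base64');
}
```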
See the API reference for full event schemas.