Connect via WebRTC for browser-native, low-latency voice. A WebRTC proxy bridges your peer connection to the same realtime service used by the WebSocket transport, transcoding OPUS ↔ PCM16 and forwarding events transparently.

Endpoint

https://api.inworld.ai
Endpoint                    Method  Description
/v1/realtime/calls          POST    SDP offer/answer exchange
/v1/realtime/ice-servers    GET     STUN/TURN server configuration

Authentication

Pass your Inworld API key as a Bearer token. The proxy forwards it to the realtime service.
Authorization: Bearer <base64-api-key>
Keep the API key server-side. Serve it to the browser via a backend endpoint (see examples below).

Flow

  1. Fetch config from your server (API key + ICE servers)
  2. Create RTCPeerConnection with ICE servers
  3. Create data channel oai-events + add microphone track
  4. Create SDP offer → POST to /v1/realtime/calls → set SDP answer
  5. Data channel opens → send session.update → start conversation
Audio flows via RTP tracks (no manual encode/decode). Events flow via data channel using the same JSON schema as WebSocket.

Session Config

Same session.update as WebSocket, sent through the data channel. See model, voice, and TTS configuration for details.
dc.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'openai/gpt-4o-mini',
    instructions: 'You are a concise concierge.',
    output_modalities: ['audio', 'text'],
    audio: {
      input: {
        turn_detection: {
          type: 'semantic_vad',
          eagerness: 'medium',
          create_response: true,
          interrupt_response: true
        }
      },
      output: {
        voice: 'Clive',
        model: 'inworld-tts-1.5-mini',
        speed: 1.0
      }
    }
  }
}));

Audio

Unlike the WebSocket transport, where you manually encode and decode base64 PCM16, WebRTC handles audio natively:
  • Input: browser captures mic and sends OPUS over RTP automatically
  • Output: proxy sends AI audio back as an RTP track — attach to <audio> to play
pc.ontrack = (e) => {
  const audio = document.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = new MediaStream([e.track]);
  document.body.appendChild(audio);
};
response.output_audio.delta events are not sent through the data channel — audio is delivered via the RTP track instead.
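Because input audio is a live RTP track, muting the microphone is a matter of disabling the local track rather than pausing an encoder. A minimal sketch (setMicEnabled is a hypothetical helper, not part of the API; stream is the getUserMedia stream):

```javascript
// Disable or re-enable the outgoing microphone track(s).
// A disabled local track keeps the RTP sender alive but transmits silence,
// so muting/unmuting requires no renegotiation with the proxy.
function setMicEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = enabled;
  }
}

// Usage: setMicEnabled(stream, false) to mute, setMicEnabled(stream, true) to resume.
```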

Text & Responses

Same as WebSocket, but sent through the data channel:
dc.send(JSON.stringify({
  type: 'conversation.item.create',
  item: { type: 'message', role: 'user', content: [{ type: 'input_text', text: 'Hello!' }] }
}));
dc.send(JSON.stringify({ type: 'response.create' }));

Events

Same event types as WebSocket, received on the data channel.
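Since the schema is shared, a small dispatcher keeps the onmessage handler tidy as you listen for more event types. A minimal sketch (routeEvent and its handler map are illustrative, not part of the API):

```javascript
// Parse a data-channel message and dispatch it by event type.
// Unknown types fall through to an optional catch-all handler.
function routeEvent(raw, handlers, fallback) {
  const msg = JSON.parse(raw);
  const handler = handlers[msg.type] || fallback || (() => {});
  handler(msg);
  return msg.type;
}

// Usage:
// dc.onmessage = (e) => routeEvent(e.data, {
//   'response.output_text.delta': (m) => console.log(m.delta),
//   'error': (m) => console.error(m.error?.message),
// });
```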

Option 1: Direct WebRTC

Server — serves the page and a /api/config endpoint that fetches ICE servers and keeps the API key hidden:
import 'dotenv/config';
import { readFileSync } from 'fs';
import { createServer } from 'http';

const html = readFileSync('index.html');
const API_KEY = process.env.INWORLD_API_KEY || '';
const PROXY = 'https://api.inworld.ai';

const server = createServer(async (req, res) => {
  if (req.url === '/api/config') {
    let ice = [];
    try {
      const r = await fetch(`${PROXY}/v1/realtime/ice-servers`, {
        headers: { Authorization: `Bearer ${API_KEY}` },
      });
      if (r.ok) ice = (await r.json()).ice_servers || [];
    } catch { /* network failure: fall back to an empty ICE server list */ }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ api_key: API_KEY, ice_servers: ice, url: `${PROXY}/v1/realtime/calls` }));
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(html);
});
let port = 3000;
server.on('error', (e) => {
  if (e.code === 'EADDRINUSE') { console.warn(`Port ${port} in use, trying ${++port}…`); server.listen(port); }
  else throw e;
});
server.listen(port, () => console.log(`http://localhost:${port}`));
Client — full WebRTC flow in the browser:
const cfg = await (await fetch('/api/config')).json();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

const pc = new RTCPeerConnection({ iceServers: cfg.ice_servers });
const dc = pc.createDataChannel('oai-events', { ordered: true });
stream.getAudioTracks().forEach(t => pc.addTrack(t, stream));

pc.ontrack = (e) => {
  const audio = document.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = new MediaStream([e.track]);
  document.body.appendChild(audio);
};

dc.onopen = () => {
  dc.send(JSON.stringify({
    type: 'session.update',
    session: {
      type: 'realtime',
      model: 'openai/gpt-4o-mini',
      instructions: 'You are a helpful voice assistant.',
      output_modalities: ['audio', 'text'],
      audio: {
        input: { turn_detection: { type: 'semantic_vad', eagerness: 'medium', create_response: true, interrupt_response: true } },
        output: { voice: 'Clive', model: 'inworld-tts-1.5-mini' }
      }
    }
  }));
};

dc.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'response.output_text.delta') console.log(msg.delta);
  if (msg.type === 'error') console.error(msg.error?.message);
};

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// Wait for ICE gathering to complete so the offer below carries all candidates
await new Promise((resolve) => {
  if (pc.iceGatheringState === 'complete') return resolve();
  pc.addEventListener('icegatheringstatechange', () => {
    if (pc.iceGatheringState === 'complete') resolve();
  });
});
const res = await fetch(cfg.url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/sdp', Authorization: `Bearer ${cfg.api_key}` },
  body: pc.localDescription.sdp,
});
await pc.setRemoteDescription({ type: 'answer', sdp: await res.text() });

Option 2: OpenAI Agents SDK

The OpenAI Agents SDK manages the full WebRTC lifecycle — peer connection, SDP exchange, mic, and audio playback:
import { RealtimeSession, RealtimeAgent, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'assistant',
  instructions: 'You are a helpful voice assistant.',
  model: 'openai/gpt-4o-mini',
});

const cfg = await (await fetch('/api/config')).json();
const audioEl = document.createElement('audio');
audioEl.autoplay = true;

const session = new RealtimeSession(agent, {
  transport: new OpenAIRealtimeWebRTC({
    useInsecureApiKey: true,
    audioElement: audioEl,
    changePeerConnection: async (pc) => {
      if (cfg.ice_servers?.length) pc.setConfiguration({ iceServers: cfg.ice_servers });
      return pc;
    },
  }),
  model: 'gpt-4o-realtime-preview-2025-06-03',
});

await session.connect({ url: cfg.url, apiKey: cfg.api_key });
session.sendMessage('Hello!');
The server-side /api/config endpoint is identical to Option 1.

WebSocket vs WebRTC

            WebSocket               WebRTC
Audio       PCM16 base64 (manual)   OPUS via RTP (native)
Latency     Higher                  Lower (UDP)
NAT         Not needed              ICE (STUN/TURN)
Events      WS messages             DataChannel (same schema)
Best for    Server-side / Node.js   Browser voice apps
See the API reference for full event schemas.