Twilio Media Streams forward live call audio to your server over a WebSocket. Because the Realtime API natively accepts G.711 μ-law (audio/pcmu) at 8 kHz, you can pipe Twilio audio straight through without transcoding. A single Realtime connection handles STT, LLM, and TTS, so the bridge server is mostly glue.
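To make the "no transcoding" point concrete, here is a small sketch of the arithmetic behind the format. G.711 μ-law uses one byte per sample, so at 8 kHz one second of audio is 8,000 bytes, and a base64 payload can be sized without decoding the audio itself. The helper name `pcmuDurationMs` is illustrative and is not part of the reference implementation.

```javascript
// G.711 mu-law: 8-bit samples at 8,000 samples per second, so 1 byte = 1 sample.
const PCMU_SAMPLE_RATE_HZ = 8000;

// Hypothetical helper: playback duration of a base64-encoded mu-law payload.
function pcmuDurationMs(base64Payload) {
  const byteLength = Buffer.from(base64Payload, "base64").length;
  return (byteLength * 1000) / PCMU_SAMPLE_RATE_HZ;
}
```

For example, a 160-byte payload corresponds to 20 ms of audio. Because both sides use the same encoding, the bridge can forward the base64 string untouched.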

Prerequisites

  • Node.js v18 or later
  • ngrok account with a reserved static domain (the free tier is sufficient)
  • Twilio account with a phone number that has Voice capability
  • Inworld account with a Realtime API key

Setup

The steps below walk through the reference implementation in inworld-ai/inworld-api-examples.

1. Clone the example repo

Clone the examples repo and change into the Twilio integration directory:
git clone https://github.com/inworld-ai/inworld-api-examples.git
cd inworld-api-examples/integrations/twilio
The remaining steps are run from this directory.

2. Get your Inworld API key

Sign in to the Inworld Portal, open your workspace, and create an API key with Realtime scope.

3. Get a Twilio phone number

In the Twilio Console, buy a phone number with Voice capability. This is the number callers will dial.

4. Reserve an ngrok static domain

Install ngrok and reserve a free static domain in the ngrok dashboard. A static domain matters here because Twilio’s webhook URL needs to stay stable between restarts. Without one, every new ngrok session changes the tunnel URL and you have to update the Twilio webhook by hand.

5. Configure environment

Copy the example env file and fill in the two required variables:
cp .env.example .env
Set these values in .env:
INWORLD_API_KEY=your_inworld_api_key
SERVER_URL=https://your-ngrok-domain.ngrok-free.app
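As a sketch of how these two variables relate to the URLs used in the rest of this guide, the helper below validates them and derives the webhook and Media Stream URLs. The function name `loadBridgeConfig` and the returned field names are illustrative assumptions, not taken from the reference implementation. The `/voice` and `/media-stream` paths are the ones described under "How it works".

```javascript
// Hypothetical helper, not part of the reference implementation.
// Reads the two required variables and derives the URLs Twilio will call.
function loadBridgeConfig(env = process.env) {
  const apiKey = (env.INWORLD_API_KEY || "").trim();
  const serverUrl = (env.SERVER_URL || "").trim();
  if (!apiKey) throw new Error("INWORLD_API_KEY is not set");
  if (!serverUrl) throw new Error("SERVER_URL is not set");

  const base = new URL(serverUrl); // throws on a malformed URL
  // Assumption: SERVER_URL uses https://, as in the .env example above.
  if (base.protocol !== "https:") {
    throw new Error("SERVER_URL must start with https://");
  }

  return {
    apiKey,
    // Webhook that Twilio requests when a call comes in.
    voiceWebhookUrl: new URL("/voice", base).toString(),
    // Media Streams connect over secure WebSocket on the same host.
    mediaStreamUrl: "wss://" + base.host + "/media-stream",
  };
}
```

Failing fast at startup on a missing or malformed value is usually easier to debug than a call that connects and then goes silent.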

6. Install and run

Install dependencies:
npm install
Then start ngrok and the dev server in two separate terminals:
ngrok http 3000 --url=your-ngrok-domain.ngrok-free.app
npm run dev

7. Point your Twilio number at the webhook

In the Twilio Console, go to Phone Numbers → your number → Voice Configuration. Under "A call comes in", set the webhook URL to https://your-ngrok-domain.ngrok-free.app/voice and the method to HTTP POST.
ngrok is only needed for local development so Twilio can reach a server running on your machine. Once you deploy the bridge server to production, update the Twilio webhook to point at your server’s public URL (for example, https://voice.yourdomain.com/voice) and you can drop ngrok entirely.

How it works

  • An inbound call hits /voice, and the server responds with TwiML instructing Twilio to open a Media Stream.
  • Twilio opens a WebSocket to /media-stream and begins forwarding call audio.
  • The server relays G.711 μ-law frames at 8 kHz between Twilio and Inworld in both directions. No format conversion is required.
  • On detected user speech, the server clears Twilio’s audio buffer and cancels the in-flight Inworld response so barge-in feels natural.
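The relay and barge-in steps above can be sketched as a small, transport-agnostic bridge. The Twilio message shapes (`start`, `media`, `clear`, `streamSid`) follow Twilio's Media Streams documentation. The Realtime event names and the `audio` and `delta` field names in `REALTIME` are assumptions used for illustration only, so check them against the WebSocket Protocol Reference before relying on them. `createBridge` and its callbacks are hypothetical names, not the reference implementation's API.

```javascript
// ASSUMED Realtime event names: placeholders, verify against the protocol reference.
const REALTIME = {
  appendAudio: "input_audio_buffer.append",
  audioDelta: "response.output_audio.delta",
  speechStarted: "input_audio_buffer.speech_started",
  cancelResponse: "response.cancel",
};

// sendToTwilio / sendToRealtime are callbacks that take a plain object;
// in a real server they would JSON.stringify it onto the matching WebSocket.
function createBridge(sendToTwilio, sendToRealtime) {
  let streamSid = null; // set once Twilio sends its "start" message

  return {
    // Handle one raw JSON message from the Twilio Media Stream socket.
    onTwilioMessage(raw) {
      const msg = JSON.parse(raw);
      if (msg.event === "start") {
        streamSid = msg.start.streamSid;
      } else if (msg.event === "media") {
        // The payload is already base64 mu-law, so it is forwarded untouched.
        sendToRealtime({ type: REALTIME.appendAudio, audio: msg.media.payload });
      }
    },

    // Handle one raw JSON event from the Realtime socket.
    onRealtimeEvent(raw) {
      const evt = JSON.parse(raw);
      if (!streamSid) return; // nothing to address on the Twilio side yet
      if (evt.type === REALTIME.audioDelta) {
        sendToTwilio({ event: "media", streamSid, media: { payload: evt.delta } });
      } else if (evt.type === REALTIME.speechStarted) {
        // Barge-in: drop audio Twilio has buffered, then stop the in-flight response.
        sendToTwilio({ event: "clear", streamSid });
        sendToRealtime({ type: REALTIME.cancelResponse });
      }
    },
  };
}
```

Keeping the bridge free of socket code makes the routing easy to exercise with plain arrays before any real call is placed.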
The TwiML returned from /voice looks like this:
<Response>
  <Connect>
    <Stream url="wss://your-ngrok-domain.ngrok-free.app/media-stream"/>
  </Connect>
</Response>
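A minimal sketch of generating that TwiML in Node.js, assuming the stream URL is built from `SERVER_URL`. The function names `escapeXmlAttr` and `buildVoiceTwiml` are illustrative and not taken from the reference implementation, which may build its response differently.

```javascript
// Escape characters that are not safe inside a double-quoted XML attribute.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the TwiML that tells Twilio to open a Media Stream.
function buildVoiceTwiml(mediaStreamUrl) {
  // Twilio only accepts secure WebSocket URLs for <Stream>.
  if (!/^wss:\/\//.test(mediaStreamUrl)) {
    throw new Error("Media Stream URL must use wss://");
  }
  return [
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXmlAttr(mediaStreamUrl)}"/>`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

The `/voice` handler would return this string with a `Content-Type` of `text/xml`.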

Test your integration

Call your Twilio number. The bot should greet you and hold a conversation.
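Before placing a call, it can help to confirm that the webhook itself responds as expected. The sketch below posts to `/voice` the way Twilio would and checks that the response contains a `<Stream>` element with a `wss://` URL. `checkVoiceWebhook` is a hypothetical helper, not part of the reference implementation, and it assumes the server answers a bare POST without validating Twilio request signatures.

```javascript
// Hypothetical smoke test for the /voice webhook.
// fetchImpl defaults to the global fetch available in Node.js 18+.
async function checkVoiceWebhook(baseUrl, fetchImpl = fetch) {
  const url = new URL("/voice", baseUrl).toString();
  const res = await fetchImpl(url, { method: "POST" });
  const body = await res.text();

  const problems = [];
  if (!res.ok) problems.push(`unexpected HTTP status ${res.status}`);
  if (!body.includes("<Stream")) problems.push("response has no <Stream> element");
  if (!/url="wss:\/\//.test(body)) problems.push("Stream url is not a wss:// URL");

  return { ok: problems.length === 0, problems, body };
}
```

Call it with your `SERVER_URL`, for example `checkVoiceWebhook(process.env.SERVER_URL).then(console.log)`, while ngrok and the dev server are both running.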

Example implementation

Twilio integration example

A complete Node.js reference implementation that bridges Twilio Media Streams to the Realtime API.

Further reading

WebSocket Protocol Reference

Event shapes, audio formats, and session configuration for the Realtime WebSocket API.

Twilio Media Streams

Twilio’s documentation on streaming call audio over WebSockets.
If a call connects but audio never flows, the issue is almost always on the Twilio side. Check the Twilio Media Streams documentation and your webhook configuration first.