audio/pcmu) at 8 kHz, you can pipe Twilio audio straight through without transcoding. A single Realtime connection handles STT, LLM, and TTS, so the bridge server is mostly glue.
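At one byte per sample and 8,000 samples per second, mu-law frame sizes are easy to check by hand. For example, 160 bytes is 20 ms of audio. The sketch below computes this from a base64 payload like the one Twilio puts in media.payload, which the bridge can forward without decoding. The helper names are hypothetical and are not taken from the reference repo.

```javascript
// G.711 mu-law (audio/pcmu) at 8 kHz: one byte per sample, 8,000 samples per second.
const SAMPLE_RATE_HZ = 8000;
const BYTES_PER_SAMPLE = 1;
const BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE;

// Duration in milliseconds of a base64-encoded mu-law payload,
// e.g. the media.payload field of a Twilio Media Streams message.
function mulawDurationMs(base64Payload) {
  const byteLength = Buffer.from(base64Payload, "base64").length;
  return (byteLength * 1000) / BYTES_PER_SECOND;
}

// Number of mu-law bytes needed to hold `ms` milliseconds of audio.
function mulawBytesForMs(ms) {
  return (ms * BYTES_PER_SECOND) / 1000;
}

// A 20 ms frame of placeholder bytes, base64-encoded the way Twilio sends it.
const frame = Buffer.alloc(mulawBytesForMs(20)).toString("base64");
console.log(mulawBytesForMs(20), mulawDurationMs(frame)); // 160 20
```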
Prerequisites
- Node.js v18 or later
- ngrok account with a reserved static domain (the free tier is sufficient)
- Twilio account with a phone number that has Voice capability
- Inworld account with a Realtime API key
Setup
The steps below walk through the reference implementation in inworld-ai/inworld-api-examples.
1. Clone the example repo
Clone the examples repo and change into the Twilio integration directory.
2. Get your Inworld API key
Sign in to the Inworld Portal, open your workspace, and create an API key with Realtime scope.
3. Get a Twilio phone number
In the Twilio Console, buy a phone number with Voice capability. This is the number callers will dial.
4. Reserve an ngrok static domain
Install ngrok and reserve a free static domain in the ngrok dashboard. A static domain matters here because Twilio’s webhook URL needs to stay stable between restarts. Without one, every new ngrok session changes the tunnel URL, and you have to update the Twilio webhook by hand.
5. Configure environment
Copy the example env file to .env and fill in the two required variables.
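The two variable names are not listed here, so the sketch below uses hypothetical ones (INWORLD_API_KEY, NGROK_DOMAIN). Check the repo's example env file for the real names. It shows one way to fail fast at startup if either value is missing, regardless of how the server loads .env.

```javascript
// HYPOTHETICAL variable names; use the ones from the repo's example env file.
const REQUIRED_ENV_VARS = ["INWORLD_API_KEY", "NGROK_DOMAIN"];

// Returns the required settings, or throws one error naming every missing
// variable, so a misconfigured server fails at startup rather than mid-call.
function loadConfig(env = process.env, required = REQUIRED_ENV_VARS) {
  const missing = required.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(required.map((name) => [name, env[name].trim()]));
}
```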
6. Install and run
Install dependencies and start the server.
7. Point your Twilio number at the webhook
In the Twilio Console, go to Phone Numbers → your number → Voice Configuration. Set "A call comes in" to https://your-ngrok-domain.ngrok-free.app/voice with HTTP POST.
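The webhook (/voice) and the Media Stream (/media-stream) live on the same host. That means both URLs can be derived from a single base URL, so moving from an ngrok domain to a production host only changes one setting. Here is a minimal sketch, with a hypothetical helper name:

```javascript
// Derives both public endpoints from one base URL, such as
// https://your-ngrok-domain.ngrok-free.app in development or
// https://voice.yourdomain.com in production.
function publicEndpoints(baseUrl) {
  const voiceWebhook = new URL("/voice", baseUrl);
  const mediaStream = new URL("/media-stream", baseUrl);
  // Same host, but Twilio connects to the stream over a secure WebSocket.
  mediaStream.protocol = "wss:";
  return { voiceWebhook: voiceWebhook.href, mediaStream: mediaStream.href };
}

console.log(publicEndpoints("https://voice.yourdomain.com").mediaStream);
// wss://voice.yourdomain.com/media-stream
```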
ngrok is only needed for local development, so that Twilio can reach a server running on your machine. Once you deploy the bridge server to production, update the Twilio webhook to point at your server’s public URL (for example, https://voice.yourdomain.com/voice), and you can drop ngrok entirely.
How it works
- An inbound call hits /voice, and the server responds with TwiML instructing Twilio to open a Media Stream.
- Twilio opens a WebSocket to /media-stream and begins forwarding call audio.
- The server shuttles mulaw 8 kHz frames between Twilio and Inworld in both directions. No format conversion is required.
- When user speech is detected, the server clears Twilio’s audio buffer and cancels the in-flight Inworld response, so barge-in feels natural.
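The frame relay and barge-in steps can be sketched as a small per-call object. The Twilio message shapes used below (start, media, clear, streamSid, media.payload) follow Twilio's Media Streams documentation. The Inworld event names are assumptions modeled on OpenAI-style Realtime events, so check them against the WebSocket Protocol Reference. Socket handling is left to the caller, which means the sketch only deals with parsed objects.

```javascript
// ASSUMED Inworld event names (OpenAI-style Realtime events); verify them in
// the WebSocket Protocol Reference before relying on this sketch.
const INWORLD_EVENTS = {
  appendAudio: "input_audio_buffer.append",
  audioDelta: "response.audio.delta",
  speechStarted: "input_audio_buffer.speech_started",
  cancelResponse: "response.cancel",
};

// Creates a relay for one call. `sendToTwilio` and `sendToInworld` receive
// plain objects; JSON encoding and socket writes are up to the caller.
function createCallRelay(sendToTwilio, sendToInworld) {
  let streamSid = null;

  return {
    // Handle a parsed message from the Twilio Media Stream WebSocket.
    onTwilioMessage(msg) {
      if (msg.event === "start") {
        streamSid = msg.start.streamSid;
      } else if (msg.event === "media") {
        // mu-law 8 kHz on both sides: forward the base64 payload unchanged.
        sendToInworld({ type: INWORLD_EVENTS.appendAudio, audio: msg.media.payload });
      }
    },

    // Handle a parsed event from the Inworld Realtime WebSocket.
    onInworldEvent(evt) {
      if (!streamSid) return; // Twilio has not sent "start" yet.
      if (evt.type === INWORLD_EVENTS.audioDelta) {
        sendToTwilio({ event: "media", streamSid, media: { payload: evt.delta } });
      } else if (evt.type === INWORLD_EVENTS.speechStarted) {
        // Barge-in: drop audio Twilio has already buffered, then stop the reply.
        sendToTwilio({ event: "clear", streamSid });
        sendToInworld({ type: INWORLD_EVENTS.cancelResponse });
      }
    },
  };
}
```

A fuller bridge might also track whether a response is actually in progress before sending the cancel.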
The TwiML that /voice returns is short. It tells Twilio to connect the call to the /media-stream WebSocket.
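Here is a minimal sketch of such a response. It assumes Twilio's standard Connect/Stream pattern for bidirectional streams and is not necessarily the reference repo's exact markup. The function names are hypothetical.

```javascript
// Escapes the five XML special characters so the URL is safe in an attribute.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds TwiML that connects the call to a bidirectional Media Stream.
// <Connect><Stream> (rather than <Start><Stream>) lets the server send audio back.
function buildVoiceTwiml(mediaStreamUrl) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(mediaStreamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

The /voice handler would send this string with a Content-Type of text/xml in reply to Twilio's POST.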
Test your integration
Call your Twilio number. The bot should greet you and hold a conversation.
Example implementation
Twilio integration example
A complete Node.js reference implementation that bridges Twilio Media Streams to the Realtime API.
Further reading
WebSocket Protocol Reference
Event shapes, audio formats, and session configuration for the Realtime WebSocket API.
Twilio Media Streams
Twilio’s documentation on streaming call audio over WebSockets.
If a call connects but audio never flows, the issue is almost always on the Twilio side. Check the Twilio Media Streams documentation and your webhook configuration first.
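One way to narrow this down is to count Media Stream events per call and see which ones never arrive. The helper below is hypothetical and is not part of the reference repo. Treat its hint strings as starting points rather than definitive diagnoses.

```javascript
// Hypothetical debugging aid: tallies Twilio Media Stream events for one call.
function createStreamDiagnostics() {
  const inbound = { connected: 0, start: 0, media: 0, stop: 0 };
  let outboundMedia = 0;

  return {
    // Call with each parsed message received from Twilio.
    recordInbound(msg) {
      if (Object.hasOwn(inbound, msg.event)) inbound[msg.event] += 1;
    },
    // Call each time the bridge sends an audio frame back to Twilio.
    recordOutbound() {
      outboundMedia += 1;
    },
    snapshot() {
      return { ...inbound, outboundMedia };
    },
    // Points at the first leg where expected events are missing.
    diagnose() {
      if (inbound.start === 0) {
        return "No 'start' event: check the webhook, the TwiML <Stream> URL, and that /media-stream accepts WebSocket upgrades.";
      }
      if (inbound.media === 0) {
        return "Stream started but no caller audio arrived: check the Twilio side of the call.";
      }
      if (outboundMedia === 0) {
        return "Caller audio is arriving but nothing was sent back: check the Inworld session.";
      }
      return "Audio is flowing in both directions.";
    },
  };
}
```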