Overview
The Realtime WebRTC API has three parts:
Signaling endpoint: used to negotiate the WebRTC connection via an SDP offer/answer exchange.
POST https://api.inworld.ai/v1/realtime/calls
ICE servers: STUN/TURN server configurations for NAT traversal and reliable connectivity.
GET https://api.inworld.ai/v1/realtime/ice-servers
Data channel events: once the WebRTC peer connection is established, all Realtime events flow over a data channel named oai-events.
Examples
Get started quickly with these reference implementations: JavaScript and Python.
Authentication
Use your API key for authentication. See Authentication for details.
Authorization: Bearer <API_KEY>
Signaling Endpoints
Create Call
Creates a WebRTC call by posting an SDP offer and an optional session configuration. Returns the server's SDP answer, a call ID, and ICE server configurations.
Request body:
Field | Type | Required | Description
sdp | string | Yes | SDP offer generated by the client RTCPeerConnection.
session | Session | No | Initial session configuration.
Response body:
Field | Type | Description
id | string | Server-assigned call identifier.
sdp | string | SDP answer returned by the server.
ice_servers | object[] | Array of ICE server configurations (same schema as the Get ICE Servers response).
Example request:
{
  "sdp": "v=0\r\no=- 4611731400430051336 2 IN IP4 127.0.0.1\r\n...",
  "session": {
    "model": "llama-3.3-70b-versatile",
    "instructions": "You are a helpful assistant.",
    "output_modalities": ["audio", "text"],
    "audio": {
      "input": {
        "transcription": {
          "model": "assemblyai/universal-streaming-multilingual"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium"
        }
      },
      "output": {
        "voice": "Dennis",
        "speed": 1.0
      }
    }
  }
}
Example response:
{
  "id": "call_abc123",
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 0.0.0.0\r\n...",
  "ice_servers": [
    {
      "urls": ["stun:stun.l.google.com:19302"]
    },
    {
      "urls": ["turn:turn.example.com:3478"],
      "username": "<TURN_USERNAME>",
      "credential": "<TURN_CREDENTIAL>"
    }
  ]
}
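The Create Call exchange can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `build_create_call_request` and `create_call` are hypothetical helper names, the `Content-Type: application/json` header is an assumption based on the JSON body above, and producing the SDP offer itself (for example from a browser RTCPeerConnection) is outside its scope.

```python
import json
import urllib.request
from typing import Any, Optional

CALLS_URL = "https://api.inworld.ai/v1/realtime/calls"


def build_create_call_request(
    api_key: str, sdp_offer: str, session: Optional[dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) the Create Call request. Hypothetical helper."""
    body: dict[str, Any] = {"sdp": sdp_offer}
    if session is not None:
        # session is optional; leave it out to start with the default configuration.
        body["session"] = session
    return urllib.request.Request(
        CALLS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint accepts a JSON body as in the example above.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_call(
    api_key: str, sdp_offer: str, session: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Send the request and return the parsed response (id, sdp, ice_servers)."""
    request = build_create_call_request(api_key, sdp_offer, session)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

The `sdp` field of the result is the SDP answer, which the client applies as the remote description of its peer connection.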
Get ICE Servers
Returns STUN and TURN server configurations for WebRTC connectivity. Use these ICE servers when creating the RTCPeerConnection to ensure reliable connections across NATs and firewalls.
Response body:
Field | Type | Description
ice_servers | object[] | Array of ICE server configurations.
ice_servers[].urls | string[] | STUN or TURN server URLs.
ice_servers[].username | string | TURN credential username (only for TURN servers).
ice_servers[].credential | string | TURN credential (only for TURN servers).
{
  "ice_servers": [
    {
      "urls": [
        "stun:stun.l.google.com:19302",
        "stun:stun1.l.google.com:19302"
      ]
    },
    {
      "urls": [
        "turn:34.41.153.85:3478",
        "turn:34.41.153.85:3479?transport=tcp"
      ],
      "username": "1772761055:6e3fa6ea-2ed0-4306-971e-aa5092cb3736",
      "credential": "6eBPxGW2nsktPFzjjbJSF5PK8ow="
    }
  ]
}
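The ice_servers array maps almost one-to-one onto the iceServers member of a W3C RTCConfiguration (the object passed to `new RTCPeerConnection(...)`), since both use urls, username, and credential. A Python sketch, assuming a backend that fetches the list and forwards it to a browser client as JSON; `fetch_ice_servers`, `to_rtc_configuration`, and `has_turn` are hypothetical names, and the TURN check simply looks at the `turn:`/`turns:` URL schemes.

```python
import json
import urllib.request
from typing import Any

ICE_URL = "https://api.inworld.ai/v1/realtime/ice-servers"


def fetch_ice_servers(api_key: str) -> list[dict[str, Any]]:
    """GET the ICE server list using the Bearer API key."""
    request = urllib.request.Request(ICE_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["ice_servers"]


def to_rtc_configuration(ice_servers: list[dict[str, Any]]) -> dict[str, Any]:
    """Reshape the API response into an RTCConfiguration-style dict."""
    entries = []
    for server in ice_servers:
        entry: dict[str, Any] = {"urls": list(server["urls"])}
        # username and credential are only present on TURN entries.
        for key in ("username", "credential"):
            if key in server:
                entry[key] = server[key]
        entries.append(entry)
    return {"iceServers": entries}


def has_turn(ice_servers: list[dict[str, Any]]) -> bool:
    """True if any entry offers a TURN relay (useful behind strict firewalls)."""
    return any(url.startswith(("turn:", "turns:")) for s in ice_servers for url in s["urls"])
```

The TURN username in the example starts with what looks like a Unix timestamp, which suggests time-limited credentials; fetching the list shortly before each call is therefore a cautious default, though the expiry policy is not documented here.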
Data Channel Events
Once the WebRTC connection is established, events are exchanged as JSON messages over the oai-events data channel. The event protocol is the same as the Realtime WebSocket API.
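Because every event is a single JSON text message on oai-events, client code mostly needs two pieces: a serializer for outgoing events and a dispatcher keyed on the type field. The Python sketch assumes you pass the output of `encode_client_event` to your WebRTC library's data channel send call and feed each received text message to `EventRouter.dispatch`; both names, and the `evt_` prefix on generated event IDs, are hypothetical.

```python
import json
import uuid
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


def encode_client_event(event_type: str, **fields: Any) -> str:
    """Serialize a client event as one JSON text message with a fresh event_id."""
    event = {"type": event_type, "event_id": f"evt_{uuid.uuid4().hex}", **fields}
    return json.dumps(event)


class EventRouter:
    """Dispatch decoded server events to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, message: str) -> dict[str, Any]:
        event = json.loads(message)
        # Events with no registered handler are decoded and returned, not dropped silently.
        for handler in self._handlers.get(event.get("type", ""), []):
            handler(event)
        return event
```

Sending your own event_id makes it easier to match an error event back to the request that caused it through error.event_id.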
Client Events
Events sent from the client to the server.
session.update
Update the session configuration. The server responds with a session.updated event.
Field | Type | Required | Description
type | session.update | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
session | Session | Yes | Session configuration to apply.
{
  "type": "session.update",
  "session": {
    "instructions": "You are a friendly voice assistant.",
    "audio": {
      "input": {
        "transcription": {
          "model": "assemblyai/universal-streaming-multilingual"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium",
          "create_response": true,
          "interrupt_response": true
        }
      },
      "output": {
        "voice": "Dennis",
        "speed": 1.0
      }
    }
  }
}
conversation.item.create
Add a conversation item (message, function call result, etc.).
Field | Type | Required | Description
type | conversation.item.create | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
previous_item_id | string | No | Insert the new item after the item with this ID.
item | ConversationItem | Yes | The item to add.
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      { "type": "input_text", "text": "Hello, how are you?" }
    ]
  }
}
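A typed user turn is two events: the item itself and a response.create to trigger the reply. The sketch assumes that, as in the WebSocket protocol, adding an item does not by itself start a response; `user_text_item_event` and `send_text_turn` are hypothetical helpers, and `send` stands for whatever function writes a text message to the oai-events channel.

```python
import json
from typing import Any, Callable, Optional


def user_text_item_event(text: str, previous_item_id: Optional[str] = None) -> dict[str, Any]:
    """Build a conversation.item.create event carrying one input_text part."""
    event: dict[str, Any] = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    if previous_item_id is not None:
        event["previous_item_id"] = previous_item_id
    return event


def send_text_turn(send: Callable[[str], None], text: str) -> None:
    """Add the user message, then explicitly ask the model to respond."""
    send(json.dumps(user_text_item_event(text)))
    send(json.dumps({"type": "response.create"}))
```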
conversation.item.truncate
Truncate an assistant message’s audio.
Field | Type | Required | Description
type | conversation.item.truncate | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
item_id | string | Yes | The ID of the assistant message item to truncate.
content_index | integer | Yes | Index of the content part to truncate.
audio_end_ms | integer | Yes | Offset, in milliseconds, at which to truncate the audio.
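A common reason to truncate is barge-in: when the user interrupts, the assistant item is cut at the point playback actually reached, so the stored conversation matches what was heard. That use case and the client-side playback counter are assumptions rather than something this table prescribes. The sketch converts a count of played samples into audio_end_ms, assuming 24 kHz output (the AudioFormat rate listed under Schemas); `truncate_event` is a hypothetical helper.

```python
from typing import Any

OUTPUT_SAMPLE_RATE = 24000  # assumption: matches the AudioFormat rate listed under Schemas


def truncate_event(item_id: str, content_index: int, samples_played: int,
                   sample_rate: int = OUTPUT_SAMPLE_RATE) -> dict[str, Any]:
    """Build conversation.item.truncate for the audio the listener actually heard."""
    # Integer division rounds down, so the event never claims more audio was played.
    audio_end_ms = samples_played * 1000 // sample_rate
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": audio_end_ms,
    }
```

For example, 36,000 samples played at 24,000 Hz gives an audio_end_ms of 1500.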
conversation.item.delete
Delete a conversation item by ID.
Field | Type | Required | Description
type | conversation.item.delete | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
item_id | string | Yes | The ID of the conversation item to delete.
conversation.item.retrieve
Retrieve a conversation item by ID.
Field | Type | Required | Description
type | conversation.item.retrieve | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
item_id | string | Yes | The ID of the conversation item to retrieve.
response.create
Trigger a model response. The server streams back response events.
Field | Type | Required | Description
type | response.create | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
response | ResponseConfig | No | Override session defaults for this response.
{
  "type": "response.create",
  "response": {
    "output_modalities": ["audio", "text"],
    "instructions": "Respond in a cheerful tone."
  }
}
response.cancel
Cancel an in-progress response.
Field | Type | Required | Description
type | response.cancel | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
response_id | string | No | Cancel a specific response by ID. If omitted, cancels the active response.
input_audio_buffer.append
Append audio bytes to the input buffer.
Field | Type | Required | Description
type | input_audio_buffer.append | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
audio | string | Yes | Base64-encoded audio chunk (~100–200 ms) matching the configured input format.
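Splitting raw audio into appropriately sized chunks is simple arithmetic once the format is fixed. This sketch assumes "audio/pcm" means 16-bit little-endian mono samples (2 bytes each) at the documented 24000 Hz rate, so 100 ms is 24000 × 2 × 0.1 = 4800 bytes; the sample width and the `append_events` helper are assumptions, not part of the reference.

```python
import base64
from typing import Iterator

SAMPLE_RATE = 24000    # AudioFormat.rate: "Currently 24000"
BYTES_PER_SAMPLE = 2   # assumption: audio/pcm is 16-bit little-endian mono


def append_events(pcm: bytes, chunk_ms: int = 100) -> Iterator[dict[str, str]]:
    """Yield input_audio_buffer.append events of roughly chunk_ms each."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]  # the final chunk may be shorter
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
```

One second of audio (48,000 bytes) yields ten 100 ms events, or five with `chunk_ms=200`. If you manage turns yourself rather than relying on turn detection, follow the appends with an input_audio_buffer.commit event.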
input_audio_buffer.commit
Commit the buffered audio as a user message.
Field | Type | Required
type | input_audio_buffer.commit | Yes
event_id | string | No
input_audio_buffer.clear
Discard all audio in the input buffer.
Field | Type | Required
type | input_audio_buffer.clear | Yes
event_id | string | No
output_audio_buffer.clear
Clear the server’s output audio buffer, stopping playback.
Field | Type | Required
type | output_audio_buffer.clear | Yes
event_id | string | No
Server Events
Events emitted by the server to the client.
session.created
Not currently supported. The session starts immediately with default configuration. Send a session.update to configure the session.
Field | Type | Description
type | session.created | Event type.
event_id | string | Server-generated event ID.
session | Session | Current session configuration.
session.updated
Confirms a session.update was applied.
Field | Type | Description
type | session.updated | Event type.
event_id | string | Server-generated event ID.
session | Session | Updated session configuration.
error
Indicates an error occurred.
Field | Type | Description
type | error | Event type.
event_id | string | Server-generated event ID.
error.type | string | Error category.
error.code | string | Error code.
error.message | string | Human-readable error description.
error.param | string | Related parameter, if applicable.
error.event_id | string | The client event ID that caused the error, if applicable.
conversation.item.added
A new item was added to the conversation.
Field | Type | Description
type | conversation.item.added | Event type.
event_id | string | Server-generated event ID.
previous_item_id | string | The ID of the preceding conversation item, or null.
item | ConversationItem | The item that was added.
conversation.item.done
An item finished being populated.
Field | Type | Description
type | conversation.item.done | Event type.
event_id | string | Server-generated event ID.
previous_item_id | string | The ID of the preceding conversation item, or null.
item | ConversationItem | The completed item.
Other conversation item events
Event type | Description
conversation.item.retrieved | Response to conversation.item.retrieve.
conversation.item.deleted | An item was deleted from the conversation.
conversation.item.truncated | An assistant audio item was truncated.
conversation.item.input_audio_transcription.delta
Streaming partial transcription for user audio.
Field | Type | Description
type | conversation.item.input_audio_transcription.delta | Event type.
event_id | string | Server-generated event ID.
item_id | string | The conversation item being transcribed.
content_index | integer | Index of the content part being transcribed.
delta | string | Partial transcription text.
conversation.item.input_audio_transcription.completed
Final transcription for a user audio item.
Field | Type | Description
type | conversation.item.input_audio_transcription.completed | Event type.
event_id | string | Server-generated event ID.
item_id | string | The conversation item that was transcribed.
content_index | integer | Index of the content part that was transcribed.
transcript | string | Complete transcription text.
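A client that shows live captions typically concatenates delta text per item and content part, then replaces it with the completed transcript. A Python sketch under that assumption; the `InputTranscripts` class is hypothetical, and treating the completed text as authoritative (rather than trusting the concatenated deltas) is a design choice, not a documented guarantee.

```python
from collections import defaultdict
from typing import Any, Optional

DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"


class InputTranscripts:
    """Collect user transcriptions keyed by (item_id, content_index)."""

    def __init__(self) -> None:
        self.partial: dict[tuple[Optional[str], int], str] = defaultdict(str)
        self.final: dict[tuple[Optional[str], int], str] = {}

    def handle(self, event: dict[str, Any]) -> None:
        key = (event.get("item_id"), event.get("content_index", 0))
        if event["type"] == DELTA:
            self.partial[key] += event["delta"]
        elif event["type"] == COMPLETED:
            # Keep the server's final text and discard the in-progress caption.
            self.final[key] = event["transcript"]
            self.partial.pop(key, None)
```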
response.created
A new response was created. Contains the full response object in its initial state.
Field | Type | Description
type | response.created | Event type.
event_id | string | Server-generated event ID.
response.id | string | Response identifier.
response.object | realtime.response | Object type.
response.status | string | "in_progress".
response.status_details | object or null | Status details, if any.
response.output | array | Output items (empty at creation).
response.conversation_id | string | Conversation this response belongs to.
response.output_modalities | string[] | "text", "audio", or both.
response.max_output_tokens | integer or "inf" | Token limit for this response.
response.audio | object | Audio output config echoed from the session.
response.usage | object or null | Token usage (populated in response.done).
response.metadata | object or null | Response metadata.
response.done
The response finished. Contains the completed response object with final status and output.
Field | Type | Description
type | response.done | Event type.
event_id | string | Server-generated event ID.
response.id | string | Response identifier.
response.object | realtime.response | Object type.
response.status | string | "completed", "cancelled", or "failed".
response.status_details | object | Status details. Its type matches status; for cancelled responses it includes a reason (e.g., "client_cancelled").
response.output | ConversationItem[] | Completed output items with content.
response.conversation_id | string | Conversation this response belongs to.
response.output_modalities | string[] | "text", "audio", or both.
response.max_output_tokens | integer or "inf" | Token limit for this response.
response.audio | object | Audio output config.
response.usage | object or null | Token usage statistics.
response.metadata | object or null | Response metadata.
response.output_item.added
An output item was added to the response.
Field | Type | Description
type | response.output_item.added | Event type.
event_id | string | Server-generated event ID.
response_id | string | The response this item belongs to.
output_index | integer | Index of the output item in the response.
item | ConversationItem | The output item (its content is initially empty).
response.output_item.done
An output item finished.
Field | Type | Description
type | response.output_item.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | The response this item belongs to.
output_index | integer | Index of the output item in the response.
item | ConversationItem | The completed output item with content.
response.content_part.added
A content part was added to an output item.
Field | Type | Description
type | response.content_part.added | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
part | object | The content part. Its type is "audio" or "text"; its transcript is initially an empty string.
response.content_part.done
A content part finished.
Field | Type | Description
type | response.content_part.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
part | object | The completed content part with its final transcript.
response.output_text.delta
Streaming text chunk from the model.
Field | Type | Description
type | response.output_text.delta | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
delta | string | Text chunk.
response.output_text.done
Text output finished.
Field | Type | Description
type | response.output_text.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
text | string | Complete text output.
response.output_audio_transcript.delta
Streaming transcript for generated audio.
Field | Type | Description
type | response.output_audio_transcript.delta | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
delta | string | Transcript chunk.
response.output_audio_transcript.done
Final transcript for generated audio.
Field | Type | Description
type | response.output_audio_transcript.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
transcript | string | Complete transcript.
response.output_audio.done
Audio output for a content part finished.
Field | Type | Description
type | response.output_audio.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
response.function_call_arguments.delta
Streaming function call arguments.
Field | Type | Description
type | response.function_call_arguments.delta | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
delta | string | Arguments chunk (JSON string fragment).
response.function_call_arguments.done
Function call arguments finished.
Field | Type | Description
type | response.function_call_arguments.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
arguments | string | Complete function call arguments (JSON string).
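Argument deltas are JSON fragments, so they generally cannot be parsed until the done event arrives. The Python sketch buffers fragments per item_id (handy for showing progress) and parses only the complete arguments string from the done event. `FunctionCallArguments` is a hypothetical helper; these events carry no function name, so it presumably has to be read from the corresponding output item.

```python
import json
from typing import Any, Optional

ARGS_DELTA = "response.function_call_arguments.delta"
ARGS_DONE = "response.function_call_arguments.done"


class FunctionCallArguments:
    """Buffer streamed function-call arguments per item_id."""

    def __init__(self) -> None:
        self._fragments: dict[str, list[str]] = {}

    def partial(self, item_id: str) -> str:
        """Arguments received so far; usually not valid JSON until done arrives."""
        return "".join(self._fragments.get(item_id, []))

    def handle(self, event: dict[str, Any]) -> Optional[dict[str, Any]]:
        """Return the parsed arguments on the done event, otherwise None."""
        if event["type"] == ARGS_DELTA:
            self._fragments.setdefault(event["item_id"], []).append(event["delta"])
        elif event["type"] == ARGS_DONE:
            self._fragments.pop(event["item_id"], None)
            # The done event carries the complete JSON string, so parse that one.
            return json.loads(event["arguments"])
        return None
```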
input_audio_buffer.speech_started
Voice activity detected: the user started speaking.
Field | Type | Description
type | input_audio_buffer.speech_started | Event type.
event_id | string | Server-generated event ID.
audio_start_ms | integer | Millisecond offset in the audio stream where speech was detected.
item_id | string | The conversation item ID associated with this speech segment.
input_audio_buffer.speech_stopped
Voice activity ended: the user stopped speaking.
Field | Type | Description
type | input_audio_buffer.speech_stopped | Event type.
event_id | string | Server-generated event ID.
audio_end_ms | integer | Millisecond offset in the audio stream where speech ended.
item_id | string | The conversation item ID associated with this speech segment.
input_audio_buffer.committed
Buffered audio was committed as a conversation item.
Field | Type | Description
type | input_audio_buffer.committed | Event type.
event_id | string | Server-generated event ID.
previous_item_id | string | The ID of the preceding conversation item, or null.
item_id | string | The new conversation item ID for the committed audio.
Other audio buffer events
Event type | Description
input_audio_buffer.cleared | Input audio buffer was cleared.
input_audio_buffer.timeout_triggered | An idle timeout was triggered on the input buffer.
output_audio_buffer.started | Server started sending output audio.
output_audio_buffer.stopped | Server stopped sending output audio.
output_audio_buffer.cleared | Output audio buffer was cleared.
rate_limits.updated
Reports current rate limit state.
Field | Type | Description
type | rate_limits.updated | Event type.
event_id | string | Server-generated event ID.
Schemas
Session object
The session object configures model behavior, audio settings, tools, and more. It appears in signaling requests, session.update, session.created, and session.updated events.
Field | Type | Description
object | realtime.session | Object type identifier (read-only).
type | realtime | Fixed value.
id | string | Server-assigned session ID (read-only).
model | string | Model identifier.
instructions | string | System instructions for the model.
output_modalities | string[] | Output types: "text", "audio", or both.
temperature | number | The sampling temperature used for response generation.
max_output_tokens | integer or "inf" | Maximum tokens per response (1–4096 or "inf").
audio | AudioConfig | Audio input/output settings.
tools | Tool[] | Function tools available to the model.
tool_choice | string or ToolChoiceTarget | "none", "auto", "required", or a specific tool target.
truncation | string or object | "auto", "disabled", or a retention_ratio config.
tracing | string or object | "auto" or a tracing config with workflow_name, group_id, and metadata.
include | string[] | Optional data to include, e.g. "item.input_audio_transcription.logprobs".
expires_at | integer | Unix timestamp for session expiration (read-only).
AudioConfig
Field | Type | Description
input.format | AudioFormat | Input audio format.
input.noise_reduction | object | Noise reduction config. type: "near_field" or "far_field".
input.transcription | object | Transcription config. model: transcription model identifier (e.g., assemblyai/universal-streaming-multilingual). language: optional language code. prompt: optional transcription prompt.
input.turn_detection | TurnDetection | Turn detection config.
output.format | AudioFormat | Output audio format.
output.voice | string | Voice preset for audio output (e.g., Dennis). See the List Voices API or the Voice library page in the Inworld Portal for the full list of supported voices.
output.model | string | The TTS model used for audio output.
output.speed | number | Playback speed (0.25–1.5).
AudioFormat
Field | Type | Description
type | string | MIME type: "audio/pcm", "audio/pcmu", or "audio/pcma".
rate | integer | Sample rate in Hz. Currently 24000.
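The ranges in the Session and AudioConfig tables lend themselves to a client-side pre-flight check before sending session.update. This is a hypothetical helper that mirrors only the constraints listed above; the server remains the authority, and whether it rejects or clamps out-of-range values is not documented here.

```python
from typing import Any

MODALITIES = {"text", "audio"}
TOOL_CHOICE_MODES = {"none", "auto", "required"}
READ_ONLY = ("object", "id", "expires_at")


def validate_session(session: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    mods = session.get("output_modalities")
    if mods is not None and (not mods or not set(mods) <= MODALITIES):
        problems.append("output_modalities must be 'text', 'audio', or both")
    limit = session.get("max_output_tokens")
    if limit is not None and limit != "inf":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(limit, int) or isinstance(limit, bool) or not 1 <= limit <= 4096:
            problems.append("max_output_tokens must be 1-4096 or 'inf'")
    speed = session.get("audio", {}).get("output", {}).get("speed")
    if speed is not None and not 0.25 <= speed <= 1.5:
        problems.append("audio.output.speed must be between 0.25 and 1.5")
    choice = session.get("tool_choice")
    if isinstance(choice, str) and choice not in TOOL_CHOICE_MODES:
        problems.append("tool_choice string must be 'none', 'auto', or 'required'")
    for field in READ_ONLY:
        if field in session:
            problems.append(f"{field} is read-only")
    return problems
```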
TurnDetection
Turn detection has two modes, selected by the type field.
Server VAD (type: "server_vad"):
Field | Type | Description
type | server_vad | Mode selector.
threshold | number | VAD sensitivity (0–1).
prefix_padding_ms | integer | Milliseconds of audio to include before speech onset.
silence_duration_ms | integer | Silence duration (ms) before speech is considered ended.
create_response | boolean | Auto-trigger response.create after speech ends.
interrupt_response | boolean | Allow new speech to interrupt active responses.
idle_timeout_ms | integer | Idle timeout in milliseconds.
Semantic VAD (type: "semantic_vad"):
Field | Type | Description
type | semantic_vad | Mode selector.
eagerness | string | "low", "medium", "high", or "auto".
create_response | boolean | Auto-trigger response.create after speech ends.
interrupt_response | boolean | Allow new speech to interrupt active responses.
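Because each mode accepts a different field set, a small builder can catch mixed-up fields (such as threshold on semantic_vad) before they reach the server. This `turn_detection` helper is hypothetical and encodes only the fields and ranges in the two tables above; it does not supply defaults.

```python
from typing import Any

ALLOWED_FIELDS = {
    "server_vad": {"threshold", "prefix_padding_ms", "silence_duration_ms",
                   "create_response", "interrupt_response", "idle_timeout_ms"},
    "semantic_vad": {"eagerness", "create_response", "interrupt_response"},
}
EAGERNESS_VALUES = {"low", "medium", "high", "auto"}


def turn_detection(mode: str, **options: Any) -> dict[str, Any]:
    """Build a TurnDetection object, rejecting fields that belong to the other mode."""
    allowed = ALLOWED_FIELDS.get(mode)
    if allowed is None:
        raise ValueError(f"unknown turn detection type: {mode!r}")
    unexpected = set(options) - allowed
    if unexpected:
        raise ValueError(f"not valid for {mode}: {sorted(unexpected)}")
    threshold = options.get("threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    eagerness = options.get("eagerness")
    if eagerness is not None and eagerness not in EAGERNESS_VALUES:
        raise ValueError("eagerness must be low, medium, high, or auto")
    return {"type": mode, **options}
```

The result goes under session.audio.input.turn_detection in a session.update event, as in the session.update example earlier in this reference.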
ConversationItem
Field | Type | Required | Description
object | realtime.item | No | Object type identifier (read-only, present in server responses).
id | string | No | Item ID.
type | string | Yes | Item type (e.g., "message", "function_call_result").
status | string | No | Item status: "completed" or "in_progress" (read-only, present in server responses).
role | string | No | "system", "user", "assistant", or "tool".
content | ContentPart[] | No | Array of content parts.
ContentPart
Field | Type | Required | Description
type | string | Yes | Content type (e.g., "input_text", "input_audio", "text", "audio").
text | string | No | Text content.
audio | string | No | Base64-encoded audio.
transcript | string | No | Human-readable transcript accompanying audio.
ResponseConfig
Per-response overrides for session defaults.
Field | Type | Description
conversation | string | "auto" or a conversation ID.
output_modalities | string[] | "text", "audio", or both.
instructions | string | Override instructions for this response.
voice | string | Override voice for this response.
max_output_tokens | integer or "inf" | Override max tokens.
tool_choice | string or ToolChoiceTarget | Override tool choice.
tools | Tool[] | Override available tools.
Tool
Field | Type | Required | Description
type | function | Yes | Tool type.
name | string | Yes | Function name.
description | string | No | What the function does.
parameters | object | No | JSON Schema for function parameters.
ToolChoiceTarget
Specifies tool choice behavior. The server always returns this as an object.
Field | Type | Required | Description
type | string | Yes | "auto", "none", "required", "function", or "mcp".
name | string | No | Function name (when type is "function").
server_label | string | No | MCP server label (when type is "mcp").
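Since the server always returns tool_choice as an object, a client that compares its configured value with the echoed session may want to normalize string shorthands first. The sketch assumes that "auto", "none", and "required" correspond to objects with the same type value, which is an inference from the ToolChoiceTarget table rather than documented behavior; `function_tool`, `normalize_tool_choice`, and the get_weather tool are hypothetical.

```python
from typing import Any, Optional, Union

ToolChoice = Union[str, dict[str, Any]]


def function_tool(name: str, description: Optional[str] = None,
                  parameters: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a Tool entry; description and parameters are optional."""
    tool: dict[str, Any] = {"type": "function", "name": name}
    if description is not None:
        tool["description"] = description
    if parameters is not None:
        tool["parameters"] = parameters  # a JSON Schema object
    return tool


def normalize_tool_choice(choice: ToolChoice) -> dict[str, Any]:
    """Return tool_choice in object form (assumed string-to-object mapping)."""
    if isinstance(choice, str):
        if choice not in ("auto", "none", "required"):
            raise ValueError(f"unsupported tool_choice string: {choice!r}")
        return {"type": choice}
    if choice.get("type") == "function" and "name" not in choice:
        raise ValueError("a function tool_choice needs a name")
    if choice.get("type") == "mcp" and "server_label" not in choice:
        raise ValueError("an mcp tool_choice needs a server_label")
    return dict(choice)
```

For example, `function_tool("get_weather", "Look up the weather", {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]})` can be placed in session.tools, and `normalize_tool_choice({"type": "function", "name": "get_weather"})` forces that tool.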