Overview

The Realtime WebRTC API has three parts:
  1. Signaling endpoint: negotiates the WebRTC connection via an SDP offer/answer exchange.
     POST https://api.inworld.ai/v1/realtime/calls
  2. ICE servers: STUN/TURN server configurations for NAT traversal and reliable connectivity.
     GET https://api.inworld.ai/v1/realtime/ice-servers
  3. Data channel events: once the WebRTC peer connection is established, all Realtime events flow over a data channel named oai-events.
     Data Channel: oai-events

Examples

Get started quickly with these reference implementations: JavaScript and Python.

Authentication

Use your API key for authentication. See Authentication for details.
Authorization: Bearer <API_KEY>

Signaling Endpoints

Create Call

Creates a WebRTC call by posting an SDP offer and an optional session configuration. Returns the server’s SDP answer.

Request body:
Field | Type | Required | Description
sdp | string | Yes | SDP offer generated by the client RTCPeerConnection.
session | Session | No | Initial session configuration.

Response body:
Field | Type | Description
id | string | Server-assigned call identifier.
sdp | string | SDP answer returned by the server.
ice_servers | object[] | Array of ICE server configurations (same schema as the Get ICE Servers response).
Request
{
  "sdp": "v=0\r\no=- 4611731400430051336 2 IN IP4 127.0.0.1\r\n...",
  "session": {
    "model": "llama-3.3-70b-versatile",
    "instructions": "You are a helpful assistant.",
    "output_modalities": ["audio", "text"],
    "audio": {
      "input": {
        "transcription": {
          "model": "assemblyai/universal-streaming-multilingual"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium"
        }
      },
      "output": {
        "voice": "Dennis",
        "speed": 1.0
      }
    }
  }
}
Response
{
  "id": "call_abc123",
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 0.0.0.0\r\n...",
  "ice_servers": [
    {
      "urls": ["stun:stun.l.google.com:19302"]
    },
    {
      "urls": ["turn:turn.example.com:3478"],
      "username": "<TURN_USERNAME>",
      "credential": "<TURN_CREDENTIAL>"
    }
  ]
}
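
As a rough sketch of the Create Call request described above, the following Python builds (but does not send) the POST request using only the standard library. The helper name build_create_call_request and the Content-Type: application/json header are assumptions for illustration, not part of this reference; the SDP offer itself would come from whatever WebRTC library the client uses.

```python
import json
import urllib.request
from typing import Optional

CALLS_URL = "https://api.inworld.ai/v1/realtime/calls"


def build_create_call_request(
    api_key: str, sdp_offer: str, session: Optional[dict] = None
) -> urllib.request.Request:
    """Build (without sending) the signaling request for Create Call."""
    body = {"sdp": sdp_offer}  # required: SDP offer from the client
    if session is not None:
        body["session"] = session  # optional initial session configuration
    return urllib.request.Request(
        CALLS_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON request body.
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; the JSON response should then carry the id, sdp, and ice_servers fields listed in the response table.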

Get ICE Servers

Returns STUN and TURN server configurations for WebRTC connectivity. Use these ICE servers when creating the RTCPeerConnection to ensure reliable connections across NATs and firewalls.

Response body:
Field | Type | Description
ice_servers | object[] | Array of ICE server configurations.
ice_servers[].urls | string[] | STUN or TURN server URLs.
ice_servers[].username | string | TURN credential username (only for TURN servers).
ice_servers[].credential | string | TURN credential (only for TURN servers).
Response
{
  "ice_servers": [
    {
      "urls": [
        "stun:stun.l.google.com:19302",
        "stun:stun1.l.google.com:19302"
      ]
    },
    {
      "urls": [
        "turn:34.41.153.85:3478",
        "turn:34.41.153.85:3479?transport=tcp"
      ],
      "username": "1772761055:6e3fa6ea-2ed0-4306-971e-aa5092cb3736",
      "credential": "6eBPxGW2nsktPFzjjbJSF5PK8ow="
    }
  ]
}
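
The response above can be mapped onto the per-server entries that a peer connection configuration typically takes (urls, plus username and credential for TURN). The sketch below is a minimal illustration; the function names to_rtc_ice_servers and has_turn are hypothetical, and how the result is handed to a WebRTC library depends on that library.

```python
def to_rtc_ice_servers(response: dict) -> list:
    """Map a Get ICE Servers response to a list of per-server dicts."""
    servers = []
    for entry in response.get("ice_servers", []):
        server = {"urls": list(entry["urls"])}
        # username/credential are only present for TURN servers.
        for key in ("username", "credential"):
            if key in entry:
                server[key] = entry[key]
        servers.append(server)
    return servers


def has_turn(servers: list) -> bool:
    """True if at least one URL uses the turn: or turns: scheme."""
    return any(
        url.startswith(("turn:", "turns:"))
        for server in servers
        for url in server["urls"]
    )
```

A client might log a warning when has_turn returns False, since connections behind restrictive NATs or firewalls generally need a TURN relay.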

Data Channel Events

Once the WebRTC connection is established, events are exchanged as JSON messages over the oai-events data channel. The event protocol is the same as the Realtime WebSocket API.
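
Since every event is a JSON message with a type field, a thin encode/decode layer is usually enough on the client side. The following is a minimal sketch; the helper names and the evt_ prefix for client-generated event IDs are assumptions, and the actual send call depends on the data channel object your WebRTC library provides.

```python
import json
import uuid


def encode_event(event_type: str, **fields) -> str:
    """Serialize a client event, attaching an optional client-generated event_id."""
    event = {"type": event_type, "event_id": f"evt_{uuid.uuid4().hex}"}
    event.update(fields)
    return json.dumps(event)


def decode_event(raw: str) -> dict:
    """Parse a message received on the data channel into an event dict."""
    event = json.loads(raw)
    if not isinstance(event, dict) or "type" not in event:
        raise ValueError("not a Realtime event: missing 'type'")
    return event
```

For example, encode_event("response.create") produces a string that could be passed to the data channel's send method.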

Client Events

Events sent from the client to the server.

session.update

Update the session configuration. The server responds with a session.updated event.
Field | Type | Required | Description
type | session.update | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
session | Session | Yes | Session configuration to apply.
{
  "type": "session.update",
  "session": {
    "instructions": "You are a friendly voice assistant.",
    "audio": {
      "input": {
        "transcription": {
          "model": "assemblyai/universal-streaming-multilingual"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium",
          "create_response": true,
          "interrupt_response": true
        }
      },
      "output": {
        "voice": "Dennis",
        "speed": 1.0
      }
    }
  }
}

conversation.item.create

Add a conversation item (message, function call result, etc.).
Field | Type | Required | Description
type | conversation.item.create | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
previous_item_id | string | No | Insert after this item ID.
item | ConversationItem | Yes | The item to add.
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      { "type": "input_text", "text": "Hello, how are you?" }
    ]
  }
}

conversation.item.truncate

Truncate an assistant message’s audio.
Field | Type | Required | Description
type | conversation.item.truncate | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
item_id | string | Yes | The ID of the assistant message item to truncate.
content_index | integer | Yes | Index of the content part to truncate.
audio_end_ms | integer | Yes | Millisecond offset to truncate the audio at.
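
A typical use of this event is after an interruption, to tell the server how much of the assistant audio was actually played. The sketch below converts a played-sample count into audio_end_ms. It assumes the client tracks how many samples it has played and that the audio runs at 24000 Hz (the rate listed under AudioFormat); build_truncate_event is a hypothetical helper name.

```python
def build_truncate_event(
    item_id: str,
    samples_played: int,
    sample_rate: int = 24000,
    content_index: int = 0,
) -> dict:
    """Build a conversation.item.truncate event from a played-sample count."""
    if samples_played < 0:
        raise ValueError("samples_played must be non-negative")
    # Integer milliseconds, rounded down so we never claim unplayed audio.
    audio_end_ms = samples_played * 1000 // sample_rate
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": audio_end_ms,
    }
```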

conversation.item.delete

Delete a conversation item by ID.
Field | Type | Required | Description
type | conversation.item.delete | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
item_id | string | Yes | The ID of the conversation item to delete.

conversation.item.retrieve

Retrieve a conversation item by ID.
Field | Type | Required | Description
type | conversation.item.retrieve | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
item_id | string | Yes | The ID of the conversation item to retrieve.

response.create

Trigger a model response. The server streams back response events.
Field | Type | Required | Description
type | response.create | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
response | ResponseConfig | No | Override session defaults for this response.
{
  "type": "response.create",
  "response": {
    "output_modalities": ["audio", "text"],
    "instructions": "Respond in a cheerful tone."
  }
}

response.cancel

Cancel an in-progress response.
Field | Type | Required | Description
type | response.cancel | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
response_id | string | No | Cancel a specific response by ID. If omitted, cancels the active response.

input_audio_buffer.append

Append audio bytes to the input buffer.
Field | Type | Required | Description
type | input_audio_buffer.append | Yes | Event type.
event_id | string | No | Optional client-generated event ID.
audio | string | Yes | Base64-encoded audio chunk (~100–200ms) matching the configured input format.
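
To illustrate the chunk sizing, the sketch below splits raw PCM into roughly 100 ms pieces and wraps each in an input_audio_buffer.append event. It assumes 16-bit mono PCM at 24000 Hz; the sample rate comes from the AudioFormat section, while the sample width and channel count are assumptions that should be checked against the configured input format. The function name audio_append_events is hypothetical.

```python
import base64

SAMPLE_RATE = 24000   # Hz, per the AudioFormat section
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def audio_append_events(pcm: bytes, chunk_ms: int = 100):
    """Yield input_audio_buffer.append events for consecutive PCM chunks."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
```

Under these assumptions a 100 ms chunk is 4800 bytes, and the final chunk may be shorter than the rest.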

input_audio_buffer.commit

Commit the buffered audio as a user message.
Field | Type | Required
type | input_audio_buffer.commit | Yes
event_id | string | No

input_audio_buffer.clear

Discard all audio in the input buffer.
Field | Type | Required
type | input_audio_buffer.clear | Yes
event_id | string | No

output_audio_buffer.clear

Clear the server’s output audio buffer, stopping playback.
Field | Type | Required
type | output_audio_buffer.clear | Yes
event_id | string | No

Server Events

Events emitted by the server to the client.

session.created

Not currently supported. The session starts immediately with default configuration. Send a session.update to configure the session.
Field | Type | Description
type | session.created | Event type.
event_id | string | Server-generated event ID.
session | Session | Current session configuration.

session.updated

Confirms a session.update was applied.
Field | Type | Description
type | session.updated | Event type.
event_id | string | Server-generated event ID.
session | Session | Updated session configuration.

error

Indicates an error occurred.
Field | Type | Description
type | error | Event type.
event_id | string | Server-generated event ID.
error.type | string | Error category.
error.code | string | Error code.
error.message | string | Human-readable error description.
error.param | string | Related parameter, if applicable.
error.event_id | string | The client event ID that caused the error, if applicable.
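
Because error.event_id echoes the client event ID that caused the error, a client that sets event_id on outgoing events can trace an error back to the event that triggered it. The following is a minimal sketch of that bookkeeping; the PendingEvents class and its method names are hypothetical.

```python
class PendingEvents:
    """Remember sent client events so errors can be traced back to them."""

    def __init__(self):
        self._sent = {}

    def track(self, event: dict) -> None:
        """Record an outgoing client event if it carries an event_id."""
        event_id = event.get("event_id")
        if event_id is not None:
            self._sent[event_id] = event

    def match_error(self, error_event: dict):
        """Return the client event that caused this error, or None."""
        error = error_event.get("error", {})
        return self._sent.get(error.get("event_id"))
```

In a long-running session, entries would also need to be dropped once they can no longer fail, which this sketch leaves out.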

conversation.item.added

A new item was added to the conversation.
Field | Type | Description
type | conversation.item.added | Event type.
event_id | string | Server-generated event ID.
previous_item_id | string | The ID of the preceding conversation item, or null.
item | ConversationItem | The item that was added.

conversation.item.done

An item finished being populated.
Field | Type | Description
type | conversation.item.done | Event type.
event_id | string | Server-generated event ID.
previous_item_id | string | The ID of the preceding conversation item, or null.
item | ConversationItem | The completed item.

Other conversation item events

Event type | Description
conversation.item.retrieved | Response to conversation.item.retrieve.
conversation.item.deleted | An item was deleted from the conversation.
conversation.item.truncated | An assistant audio item was truncated.

conversation.item.input_audio_transcription.delta

Streaming partial transcription for user audio.
Field | Type | Description
type | conversation.item.input_audio_transcription.delta | Event type.
event_id | string | Server-generated event ID.
item_id | string | The conversation item being transcribed.
content_index | integer | Index of the content part being transcribed.
delta | string | Partial transcription text.

conversation.item.input_audio_transcription.completed

Final transcription for a user audio item.
Field | Type | Description
type | conversation.item.input_audio_transcription.completed | Event type.
event_id | string | Server-generated event ID.
item_id | string | The conversation item that was transcribed.
content_index | integer | Index of the content part that was transcribed.
transcript | string | Complete transcription text.
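
The delta and completed events can be combined into a live transcript per content part. The sketch below assumes that each delta is an incremental fragment to be concatenated (rather than a cumulative string) and that the completed transcript supersedes whatever was accumulated; the TranscriptBuffer class is hypothetical.

```python
from collections import defaultdict

DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"


class TranscriptBuffer:
    """Accumulate partial user transcripts keyed by (item_id, content_index)."""

    def __init__(self):
        self._partial = defaultdict(str)
        self.final = {}

    def handle(self, event: dict) -> None:
        key = (event.get("item_id"), event.get("content_index"))
        if event.get("type") == DELTA:
            # Assumption: deltas are fragments, appended in arrival order.
            self._partial[key] += event["delta"]
        elif event.get("type") == COMPLETED:
            # The final transcript replaces any accumulated partial text.
            self.final[key] = event["transcript"]
            self._partial.pop(key, None)

    def partial(self, item_id: str, content_index: int = 0) -> str:
        """Return the text accumulated so far for an unfinished part."""
        return self._partial.get((item_id, content_index), "")
```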

response.created

A new response was created. Contains the full response object in its initial state.
Field | Type | Description
type | response.created | Event type.
event_id | string | Server-generated event ID.
response.id | string | Response identifier.
response.object | realtime.response | Object type.
response.status | string | "in_progress".
response.status_details | object or null | Status details, if any.
response.output | array | Output items (empty at creation).
response.conversation_id | string | Conversation this response belongs to.
response.output_modalities | string[] | "text", "audio", or both.
response.max_output_tokens | integer or "inf" | Token limit for this response.
response.audio | object | Audio output config echoed from session.
response.usage | object or null | Token usage (populated in response.done).
response.metadata | object or null | Response metadata.

response.done

The response finished. Contains the completed response object with final status and output.
Field | Type | Description
type | response.done | Event type.
event_id | string | Server-generated event ID.
response.id | string | Response identifier.
response.object | realtime.response | Object type.
response.status | string | "completed", "cancelled", or "failed".
response.status_details | object | Status details. type matches status. For cancelled: includes reason (e.g., "client_cancelled").
response.output | ConversationItem[] | Completed output items with content.
response.conversation_id | string | Conversation this response belongs to.
response.output_modalities | string[] | "text", "audio", or both.
response.max_output_tokens | integer or "inf" | Token limit for this response.
response.audio | object | Audio output config.
response.usage | object or null | Token usage statistics.
response.metadata | object or null | Response metadata.
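
A client usually branches on response.status when response.done arrives. The following sketch turns the event into a one-line summary using only the fields in the table above; summarize_response_done is a hypothetical helper, and the fallback branch is there in case a status outside the three documented values ever appears.

```python
def summarize_response_done(event: dict) -> str:
    """Return a short human-readable summary of a response.done event."""
    response = event["response"]
    rid = response["id"]
    status = response["status"]
    details = response.get("status_details") or {}
    if status == "completed":
        count = len(response.get("output") or [])
        return f"{rid}: completed with {count} output item(s)"
    if status == "cancelled":
        reason = details.get("reason", "no reason given")
        return f"{rid}: cancelled ({reason})"
    if status == "failed":
        return f"{rid}: failed"
    return f"{rid}: unexpected status {status!r}"
```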

response.output_item.added

An output item was added to the response.
Field | Type | Description
type | response.output_item.added | Event type.
event_id | string | Server-generated event ID.
response_id | string | The response this item belongs to.
output_index | integer | Index of the output item in the response.
item | ConversationItem | The output item (initially empty content).

response.output_item.done

An output item finished.
Field | Type | Description
type | response.output_item.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | The response this item belongs to.
output_index | integer | Index of the output item in the response.
item | ConversationItem | The completed output item with content.

response.content_part.added

A content part was added to an output item.
Field | Type | Description
type | response.content_part.added | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
part | object | The content part. type: "audio" or "text". transcript: initially empty string.

response.content_part.done

A content part finished.
Field | Type | Description
type | response.content_part.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
part | object | The completed content part with final transcript.

response.output_text.delta

Streaming text chunk from the model.
Field | Type | Description
type | response.output_text.delta | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
delta | string | Text chunk.

response.output_text.done

Text output finished.
Field | Type | Description
type | response.output_text.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
text | string | Complete text output.

response.output_audio_transcript.delta

Streaming transcript for generated audio.
Field | Type | Description
type | response.output_audio_transcript.delta | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
delta | string | Transcript chunk.

response.output_audio_transcript.done

Final transcript for generated audio.
Field | Type | Description
type | response.output_audio_transcript.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
transcript | string | Complete transcript.

response.output_audio.done

Audio output for a content part finished.
Field | Type | Description
type | response.output_audio.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.

response.function_call_arguments.delta

Streaming function call arguments.
Field | Type | Description
type | response.function_call_arguments.delta | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
delta | string | Arguments chunk (JSON string fragment).

response.function_call_arguments.done

Function call arguments finished.
Field | Type | Description
type | response.function_call_arguments.done | Event type.
event_id | string | Server-generated event ID.
response_id | string | Response identifier.
item_id | string | Item identifier.
output_index | integer | Index of the output item.
content_index | integer | Index of the content part.
arguments | string | Complete function call arguments (JSON string).
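
Since the delta events carry JSON string fragments, the arguments can only be parsed once the done event arrives. The sketch below buffers fragments per item for display or logging and parses the complete arguments string from the done event. FunctionCallArgs is a hypothetical class, and the sketch assumes the arguments string decodes to a JSON object.

```python
import json

ARGS_DELTA = "response.function_call_arguments.delta"
ARGS_DONE = "response.function_call_arguments.done"


class FunctionCallArgs:
    """Buffer streamed function call arguments, keyed by item_id."""

    def __init__(self):
        self._chunks = {}

    def handle(self, event: dict):
        """Return parsed arguments on the done event, otherwise None."""
        item_id = event.get("item_id")
        if event.get("type") == ARGS_DELTA:
            self._chunks.setdefault(item_id, []).append(event["delta"])
        elif event.get("type") == ARGS_DONE:
            self._chunks.pop(item_id, None)
            # Parse the complete string from the done event, not the fragments.
            return json.loads(event["arguments"])
        return None

    def pending(self, item_id: str) -> str:
        """Return the raw, possibly incomplete JSON received so far."""
        return "".join(self._chunks.get(item_id, []))
```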

input_audio_buffer.speech_started

Voice activity detected — user started speaking.
Field | Type | Description
type | input_audio_buffer.speech_started | Event type.
event_id | string | Server-generated event ID.
audio_start_ms | integer | Millisecond offset in the audio stream where speech was detected.
item_id | string | The conversation item ID associated with this speech segment.

input_audio_buffer.speech_stopped

Voice activity ended — user stopped speaking.
Field | Type | Description
type | input_audio_buffer.speech_stopped | Event type.
event_id | string | Server-generated event ID.
audio_end_ms | integer | Millisecond offset in the audio stream where speech ended.
item_id | string | The conversation item ID associated with this speech segment.
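
Pairing speech_started with speech_stopped gives the length of each speech segment. The sketch below assumes that both events for a segment carry the same item_id and that audio_start_ms and audio_end_ms are offsets into the same audio stream; SpeechTimer is a hypothetical class.

```python
class SpeechTimer:
    """Compute speech segment durations from VAD start/stop events."""

    def __init__(self):
        self._starts = {}

    def handle(self, event: dict):
        """Return the segment duration in ms on speech_stopped, else None."""
        item_id = event.get("item_id")
        if event.get("type") == "input_audio_buffer.speech_started":
            self._starts[item_id] = event["audio_start_ms"]
        elif event.get("type") == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(item_id, None)
            if start is not None:
                return event["audio_end_ms"] - start
        return None
```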

input_audio_buffer.committed

Buffered audio was committed as a conversation item.
Field | Type | Description
type | input_audio_buffer.committed | Event type.
event_id | string | Server-generated event ID.
previous_item_id | string | The ID of the preceding conversation item, or null.
item_id | string | The new conversation item ID for the committed audio.

Other audio buffer events

Event type | Description
input_audio_buffer.cleared | Input audio buffer was cleared.
input_audio_buffer.timeout_triggered | An idle timeout was triggered on the input buffer.
output_audio_buffer.started | Server started sending output audio.
output_audio_buffer.stopped | Server stopped sending output audio.
output_audio_buffer.cleared | Output audio buffer was cleared.

rate_limits.updated

Reports current rate limit state.
Field | Type | Description
type | rate_limits.updated | Event type.
event_id | string | Server-generated event ID.

Schemas

Session object

The session object configures model behavior, audio settings, tools, and more. It appears in signaling requests, session.update, session.created, and session.updated events.
Field | Type | Description
object | realtime.session | Object type identifier (read-only).
type | realtime | Fixed value.
id | string | Server-assigned session ID (read-only).
model | string | Model identifier.
instructions | string | System instructions for the model.
output_modalities | string[] | Output types: "text", "audio", or both.
temperature | number | The sampling temperature used for response generation.
max_output_tokens | integer or "inf" | Maximum tokens per response (1–4096 or "inf").
audio | AudioConfig | Audio input/output settings.
tools | Tool[] | Function tools available to the model.
tool_choice | string or ToolChoiceTarget | "none", "auto", "required", or a specific tool target.
truncation | string or object | "auto", "disabled", or a retention_ratio config.
tracing | string or object | "auto" or a tracing config with workflow_name, group_id, metadata.
include | string[] | Optional data to include, e.g. "item.input_audio_transcription.logprobs".
expires_at | integer | Unix timestamp for session expiration (read-only).

AudioConfig

Field | Type | Description
input.format | AudioFormat | Input audio format.
input.noise_reduction | object | Noise reduction config. type: "near_field" or "far_field".
input.transcription | object | Transcription config. model: transcription model identifier (e.g., assemblyai/universal-streaming-multilingual). language: optional language code. prompt: optional transcription prompt.
input.turn_detection | TurnDetection | Turn detection config.
output.format | AudioFormat | Output audio format.
output.voice | string | Voice preset for audio output (e.g., Dennis). See the List Voices API or the Voice library page in the Inworld Portal for the full list of supported voices.
output.model | string | The TTS model used for audio output.
output.speed | number | Playback speed (0.25–1.5).

AudioFormat

Field | Type | Description
type | string | MIME type: "audio/pcm", "audio/pcmu", or "audio/pcma".
rate | integer | Sample rate in Hz. Currently 24000.

TurnDetection

Turn detection has two modes, selected by the type field.

Server VAD (type: "server_vad"):
Field | Type | Description
type | server_vad | Mode selector.
threshold | number | VAD sensitivity (0–1).
prefix_padding_ms | integer | Milliseconds of audio to include before speech onset.
silence_duration_ms | integer | Silence duration (ms) before speech is considered ended.
create_response | boolean | Auto-trigger response.create after speech ends.
interrupt_response | boolean | Allow new speech to interrupt active responses.
idle_timeout_ms | integer | Idle timeout in milliseconds.

Semantic VAD (type: "semantic_vad"):
Field | Type | Description
type | semantic_vad | Mode selector.
eagerness | string | "low", "medium", "high", or "auto".
create_response | boolean | Auto-trigger response.create after speech ends.
interrupt_response | boolean | Allow new speech to interrupt active responses.
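
Because the two modes accept different fields, a client-side check can catch mixed-up configurations before they are sent. The sketch below checks a turn_detection dict against the two tables above. It is only an illustration: the function name is hypothetical, and how the server itself treats unlisted fields or out-of-range values is not specified here.

```python
ALLOWED_FIELDS = {
    "server_vad": {
        "type", "threshold", "prefix_padding_ms", "silence_duration_ms",
        "create_response", "interrupt_response", "idle_timeout_ms",
    },
    "semantic_vad": {
        "type", "eagerness", "create_response", "interrupt_response",
    },
}
EAGERNESS_VALUES = {"low", "medium", "high", "auto"}


def validate_turn_detection(config: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    mode = config.get("type")
    if mode not in ALLOWED_FIELDS:
        return [f"unknown type: {mode!r}"]
    problems = []
    for key in sorted(set(config) - ALLOWED_FIELDS[mode]):
        problems.append(f"field {key!r} is not listed for {mode}")
    if mode == "server_vad" and "threshold" in config:
        if not 0 <= config["threshold"] <= 1:
            problems.append("threshold must be between 0 and 1")
    if mode == "semantic_vad" and "eagerness" in config:
        if config["eagerness"] not in EAGERNESS_VALUES:
            problems.append(f"unknown eagerness: {config['eagerness']!r}")
    return problems
```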

ConversationItem

Field | Type | Required | Description
object | realtime.item | No | Object type identifier (read-only, present in server responses).
id | string | No | Item ID.
type | string | Yes | Item type (e.g., "message", "function_call_result").
status | string | No | Item status: "completed" or "in_progress" (read-only, present in server responses).
role | string | No | "system", "user", "assistant", or "tool".
content | ContentPart[] | No | Array of content parts.

ContentPart

Field | Type | Required | Description
type | string | Yes | Content type (e.g., "input_text", "input_audio", "text", "audio").
text | string | No | Text content.
audio | string | No | Base64-encoded audio.
transcript | string | No | Human-readable transcript accompanying audio.

ResponseConfig

Per-response overrides for session defaults.
Field | Type | Description
conversation | string | "auto" or a conversation ID.
output_modalities | string[] | "text", "audio", or both.
instructions | string | Override instructions for this response.
voice | string | Override voice for this response.
max_output_tokens | integer or "inf" | Override max tokens.
tool_choice | string or ToolChoiceTarget | Override tool choice.
tools | Tool[] | Override available tools.

Tool

Field | Type | Required | Description
type | function | Yes | Tool type.
name | string | Yes | Function name.
description | string | No | What the function does.
parameters | object | No | JSON Schema for function parameters.
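
As an illustration of the Tool schema, the sketch below builds a function tool and places it in a session.update event. The get_weather tool, its parameters, and the function_tool helper are hypothetical examples, not tools provided by the API.

```python
def function_tool(name: str, description=None, parameters=None) -> dict:
    """Build a Tool object; description and parameters are optional."""
    tool = {"type": "function", "name": name}
    if description is not None:
        tool["description"] = description
    if parameters is not None:
        tool["parameters"] = parameters  # JSON Schema for the arguments
    return tool


# Hypothetical tool definition, for illustration only.
get_weather = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

session_update = {
    "type": "session.update",
    "session": {"tools": [get_weather], "tool_choice": "auto"},
}
```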

ToolChoiceTarget

Specifies tool choice behavior. The server always returns this as an object.
Field | Type | Required | Description
type | string | Yes | "auto", "none", "required", "function", or "mcp".
name | string | No | Function name (when type is "function").
server_label | string | No | MCP server label (when type is "mcp").
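
Since the server always returns tool choice as an object while clients may send a plain string, it can help to normalize both forms before comparing them. The sketch below assumes that a string such as "auto" corresponds to the object {"type": "auto"}; that mapping is inferred from the table above rather than stated in this reference, and normalize_tool_choice is a hypothetical helper.

```python
SIMPLE_MODES = {"auto", "none", "required"}
TARGETED_MODES = {"function": "name", "mcp": "server_label"}


def normalize_tool_choice(choice) -> dict:
    """Return the object form of a tool choice given as a string or dict."""
    if isinstance(choice, str):
        if choice not in SIMPLE_MODES:
            raise ValueError(f"unknown tool choice: {choice!r}")
        # Assumption: the string form maps to an object with only a type.
        return {"type": choice}
    kind = choice.get("type")
    if kind in SIMPLE_MODES:
        return dict(choice)
    if kind in TARGETED_MODES:
        required_key = TARGETED_MODES[kind]
        if required_key not in choice:
            raise ValueError(f"{kind} tool choice needs {required_key!r}")
        return dict(choice)
    raise ValueError(f"unknown tool choice type: {kind!r}")
```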