Realtime API (WebRTC) - Inworld AI Documentation

Overview

The Realtime WebRTC API has three parts:

Signaling endpoint — Used to negotiate the WebRTC connection via SDP offer/answer exchange.

POST https://api.inworld.ai/v1/realtime/calls

ICE servers — STUN/TURN server configurations for NAT traversal and reliable connectivity.

GET https://api.inworld.ai/v1/realtime/ice-servers

Data channel events — Once the WebRTC peer connection is established, all Realtime events flow over a data channel named oai-events.

Data Channel: oai-events

Examples

Get started quickly with these reference implementations: JavaScript and Python.

Authentication

Use your API key for authentication. See Authentication for details.

Authorization: Bearer <API_KEY>
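
As a minimal sketch of how the header might be attached, the snippet below builds (without sending) an authenticated request for the ICE servers endpoint using only Python's standard library. The helper name `build_ice_servers_request` and the placeholder key are illustrative assumptions, not part of the API.

```python
import urllib.request

ICE_SERVERS_URL = "https://api.inworld.ai/v1/realtime/ice-servers"


def build_ice_servers_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Bearer token."""
    return urllib.request.Request(
        ICE_SERVERS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Hypothetical placeholder key; substitute a real API key before sending.
request = build_ice_servers_request("YOUR_API_KEY")
```

Passing `request` to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access.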

Signaling Endpoints

Create Call

POST https://api.inworld.ai/v1/realtime/calls

Creates a WebRTC call by posting an SDP offer and optional session configuration. Returns the server’s SDP answer.

Request body:

Field	Type	Required	Description
`sdp`	string	Yes	SDP offer generated by the client `RTCPeerConnection`.
`session`	Session	No	Initial session configuration.

Response body:

Field	Type	Description
`id`	string	Server-assigned call identifier.
`sdp`	string	SDP answer returned by the server.
`ice_servers`	object[]	Array of ICE server configurations (same schema as the Get ICE Servers response).

Example

Request

{
  "sdp": "v=0\r\no=- 4611731400430051336 2 IN IP4 127.0.0.1\r\n...",
  "session": {
    "model": "llama-3.3-70b-versatile",
    "instructions": "You are a helpful assistant.",
    "output_modalities": ["audio", "text"],
    "audio": {
      "input": {
        "transcription": {
          "model": "inworld/inworld-stt-1"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium"
        }
      },
      "output": {
        "model": "inworld-tts-2",
        "voice": "Dennis",
        "speed": 1.0
      }
    }
  }
}

Response

{
  "id": "call_abc123",
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 0.0.0.0\r\n...",
  "ice_servers": [
    {
      "urls": ["stun:stun.l.google.com:19302"]
    },
    {
      "urls": ["turn:turn.example.com:3478"],
      "username": "<TURN_USERNAME>",
      "credential": "<TURN_CREDENTIAL>"
    }
  ]
}
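
The request and response shapes above could be handled with a pair of small helpers. This is a sketch only: `build_create_call_body` and `parse_create_call_response` are hypothetical names, and the SDP offer is assumed to come from whatever WebRTC library the client uses.

```python
import json
from typing import Any, Optional


def build_create_call_body(sdp_offer: str, session: Optional[dict] = None) -> str:
    """Serialize the Create Call request body; `session` is optional."""
    body: dict[str, Any] = {"sdp": sdp_offer}
    if session is not None:
        body["session"] = session
    return json.dumps(body)


def parse_create_call_response(raw: str) -> tuple[str, str, list]:
    """Return (call id, SDP answer, ICE server list) from the response JSON."""
    data = json.loads(raw)
    return data["id"], data["sdp"], data.get("ice_servers", [])
```

The SDP answer returned by `parse_create_call_response` is what the client would set as the remote description on its peer connection.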

Get ICE Servers

GET https://api.inworld.ai/v1/realtime/ice-servers

Returns STUN and TURN server configurations for WebRTC connectivity. Use these ICE servers when creating the RTCPeerConnection to ensure reliable connections across NATs and firewalls.

Response body:

Field	Type	Description
`ice_servers`	object[]	Array of ICE server configurations.
`ice_servers[].urls`	string[]	STUN or TURN server URLs.
`ice_servers[].username`	string	TURN credential username (only for TURN servers).
`ice_servers[].credential`	string	TURN credential (only for TURN servers).

Example

Response

{
  "ice_servers": [
    {
      "urls": [
        "stun:stun.l.google.com:19302",
        "stun:stun1.l.google.com:19302"
      ]
    },
    {
      "urls": [
        "turn:34.41.153.85:3478",
        "turn:34.41.153.85:3479?transport=tcp"
      ],
      "username": "1772761055:6e3fa6ea-2ed0-4306-971e-aa5092cb3736",
      "credential": "6eBPxGW2nsktPFzjjbJSF5PK8ow="
    }
  ]
}
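
A client may want to check whether a TURN relay is available before connecting. The sketch below separates STUN URLs from TURN entries by URL scheme; the function name `split_ice_servers` is an assumption made for illustration.

```python
def split_ice_servers(response: dict) -> tuple[list[str], list[dict]]:
    """Split an ICE servers response into STUN URLs and TURN server entries.

    An entry counts as TURN if any of its URLs uses the turn: or turns: scheme.
    """
    stun_urls: list[str] = []
    turn_servers: list[dict] = []
    for server in response.get("ice_servers", []):
        urls = server.get("urls", [])
        if any(url.startswith(("turn:", "turns:")) for url in urls):
            turn_servers.append(server)
        else:
            stun_urls.extend(urls)
    return stun_urls, turn_servers
```

TURN entries keep their `username` and `credential` fields, so they can be passed on unchanged to the peer connection configuration.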

Data Channel Events

Once the WebRTC connection is established, events are exchanged as JSON messages over the oai-events data channel. The event protocol is the same as the Realtime WebSocket API.

Client Events

Events sent from the client to the server.

session.update

Update the session configuration. The server responds with a session.updated event.

Field	Type	Required	Description
`type`	session.update	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`session`	Session	Yes	Session configuration to apply.

Example

{
  "type": "session.update",
  "session": {
    "instructions": "You are a friendly voice assistant.",
    "audio": {
      "input": {
        "transcription": {
          "model": "inworld/inworld-stt-1"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium",
          "create_response": true,
          "interrupt_response": true
        }
      },
      "output": {
        "model": "inworld-tts-2",
        "voice": "Dennis",
        "speed": 1.0
      }
    }
  }
}
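
One way to produce this event programmatically is sketched below. The helper name and the `evt_` prefix for generated IDs are assumptions; the API only states that `event_id` is an optional client-generated string.

```python
import json
import uuid
from typing import Optional


def make_session_update(session: dict, event_id: Optional[str] = None) -> str:
    """Serialize a session.update event, generating an event ID if none is given."""
    event = {
        "type": "session.update",
        # Hypothetical ID format; any client-generated string should do.
        "event_id": event_id or f"evt_{uuid.uuid4().hex}",
        "session": session,
    }
    return json.dumps(event)
```

The returned string is what would be sent over the oai-events data channel. Keeping the `event_id` lets the client match a later error event's `error.event_id` back to this request.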

conversation.item.create

Add a conversation item (message, function call result, etc.).

Field	Type	Required	Description
`type`	conversation.item.create	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`previous_item_id`	string	No	Insert after this item ID.
`item`	ConversationItem	Yes	The item to add.

Example

{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      { "type": "input_text", "text": "Hello, how are you?" }
    ]
  }
}

conversation.item.truncate

Truncate an assistant message’s audio.

Field	Type	Required	Description
`type`	conversation.item.truncate	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`item_id`	string	Yes	The ID of the assistant message item to truncate.
`content_index`	integer	Yes	Index of the content part to truncate.
`audio_end_ms`	integer	Yes	Millisecond offset to truncate the audio at.
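
A typical use is trimming an assistant message to what the user actually heard after an interruption. The sketch below assumes the client tracks how many seconds of the item it has played; `make_truncate_event` and its parameters are hypothetical names.

```python
import json


def make_truncate_event(item_id: str, played_seconds: float, content_index: int = 0) -> str:
    """Serialize conversation.item.truncate from a playback position in seconds."""
    # Convert to whole milliseconds and clamp negative positions to zero.
    audio_end_ms = max(0, int(played_seconds * 1000))
    return json.dumps(
        {
            "type": "conversation.item.truncate",
            "item_id": item_id,
            "content_index": content_index,
            "audio_end_ms": audio_end_ms,
        }
    )
```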

conversation.item.delete

Delete a conversation item by ID.

Field	Type	Required	Description
`type`	conversation.item.delete	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`item_id`	string	Yes	The ID of the conversation item to delete.

conversation.item.retrieve

Retrieve a conversation item by ID.

Field	Type	Required	Description
`type`	conversation.item.retrieve	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`item_id`	string	Yes	The ID of the conversation item to retrieve.

response.create

Trigger a model response. The server streams back response events.

Field	Type	Required	Description
`type`	response.create	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`response`	ResponseConfig	No	Override session defaults for this response.

Example

{
  "type": "response.create",
  "response": {
    "output_modalities": ["audio", "text"],
    "instructions": "Respond in a cheerful tone."
  }
}
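
A text turn can be sketched as a conversation.item.create followed by a response.create. Whether the server also responds without an explicit response.create is not stated here, so this example sends both; `make_text_turn` is a hypothetical helper name.

```python
import json
from typing import Optional


def make_text_turn(text: str, instructions: Optional[str] = None) -> list[str]:
    """Return the two serialized events for a user text turn, in send order."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    response_event: dict = {"type": "response.create"}
    if instructions is not None:
        # Per-response override of the session instructions.
        response_event["response"] = {"instructions": instructions}
    return [json.dumps(item_event), json.dumps(response_event)]
```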

response.cancel

Cancel an in-progress response.

Field	Type	Required	Description
`type`	response.cancel	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`response_id`	string	No	Cancel a specific response by ID. If omitted, cancels the active response.

input_audio_buffer.append

Append audio bytes to the input buffer.

Field	Type	Required	Description
`type`	input_audio_buffer.append	Yes	Event type.
`event_id`	string	No	Optional client-generated event ID.
`audio`	string	Yes	Base64-encoded audio chunk (~100–200ms) matching the configured input format.
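
The chunk size can be derived from the audio format. The sketch below assumes 16-bit mono PCM at 24000 Hz, so 100 ms is 24000 × 2 × 0.1 = 4800 bytes; the sample width, channel count, and the helper name `append_events` are assumptions rather than documented values.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 24000
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def append_events(pcm: bytes, chunk_ms: int = 100) -> Iterator[str]:
    """Yield serialized input_audio_buffer.append events of roughly chunk_ms each."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps(
            {
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(chunk).decode("ascii"),
            }
        )
```

The final chunk may be shorter than the others when the buffer length is not a multiple of the chunk size.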

input_audio_buffer.commit

Commit the buffered audio as a user message.

Field	Type	Required
`type`	input_audio_buffer.commit	Yes
`event_id`	string	No

input_audio_buffer.clear

Discard all audio in the input buffer.

Field	Type	Required
`type`	input_audio_buffer.clear	Yes
`event_id`	string	No

output_audio_buffer.clear

Clear the server’s output audio buffer, stopping playback.

Field	Type	Required
`type`	output_audio_buffer.clear	Yes
`event_id`	string	No

Server Events

Events emitted by the server to the client.

session.created

Not currently supported. The session starts immediately with default configuration. Send a session.update to configure the session.

Field	Type	Description
`type`	session.created	Event type.
`event_id`	string	Server-generated event ID.
`session`	Session	Current session configuration.

session.updated

Confirms a session.update was applied.

Field	Type	Description
`type`	session.updated	Event type.
`event_id`	string	Server-generated event ID.
`session`	Session	Updated session configuration.

error

Indicates an error occurred.

Field	Type	Description
`type`	error	Event type.
`event_id`	string	Server-generated event ID.
`error.type`	string	Error category.
`error.code`	string	Error code.
`error.message`	string	Human-readable error description.
`error.param`	string	Related parameter, if applicable.
`error.event_id`	string	The client event ID that caused the error, if applicable.
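
For logging, the error fields can be flattened into a single line. This sketch assumes the `error.*` fields arrive nested under an `error` object; `describe_error` is a hypothetical helper.

```python
def describe_error(event: dict) -> str:
    """Format an error event as a one-line, human-readable message."""
    err = event.get("error") or {}
    parts = [
        f"[{err.get('type') or 'unknown'}/{err.get('code') or 'unknown'}]",
        err.get("message") or "",
    ]
    if err.get("param"):
        parts.append(f"(param: {err['param']})")
    if err.get("event_id"):
        # ID of the client event that triggered the error, when provided.
        parts.append(f"(caused by: {err['event_id']})")
    return " ".join(part for part in parts if part)
```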

conversation.item.added

A new item was added to the conversation.

Field	Type	Description
`type`	conversation.item.added	Event type.
`event_id`	string	Server-generated event ID.
`previous_item_id`	string	The ID of the preceding conversation item, or `null`.
`item`	ConversationItem	The item that was added.

conversation.item.done

An item finished being populated.

Field	Type	Description
`type`	conversation.item.done	Event type.
`event_id`	string	Server-generated event ID.
`previous_item_id`	string	The ID of the preceding conversation item, or `null`.
`item`	ConversationItem	The completed item.

Other conversation item events

Event type	Description
`conversation.item.retrieved`	Response to `conversation.item.retrieve`.
`conversation.item.deleted`	An item was deleted from the conversation.
`conversation.item.truncated`	An assistant audio item was truncated.

conversation.item.input_audio_transcription.delta

Streaming partial transcription for user audio.

Field	Type	Description
`type`	conversation.item.input_audio_transcription.delta	Event type.
`event_id`	string	Server-generated event ID.
`item_id`	string	The conversation item being transcribed.
`content_index`	integer	Index of the content part being transcribed.
`delta`	string	Partial transcription text.

conversation.item.input_audio_transcription.completed

Final transcription for a user audio item.

Field	Type	Description
`type`	conversation.item.input_audio_transcription.completed	Event type.
`event_id`	string	Server-generated event ID.
`item_id`	string	The conversation item that was transcribed.
`content_index`	integer	Index of the content part that was transcribed.
`transcript`	string	Complete transcription text.

response.created

A new response was created. Contains the full response object in its initial state.

Field	Type	Description
`type`	response.created	Event type.
`event_id`	string	Server-generated event ID.
`response.id`	string	Response identifier.
`response.object`	realtime.response	Object type.
`response.status`	string	`"in_progress"`.
`response.status_details`	object \| null	Status details, if any.
`response.output`	array	Output items (empty at creation).
`response.conversation_id`	string	Conversation this response belongs to.
`response.output_modalities`	string[]	`"text"`, `"audio"`, or both.
`response.max_output_tokens`	integer \| `"inf"`	Token limit for this response.
`response.audio`	object	Audio output config echoed from session.
`response.usage`	object \| null	Token usage (populated in `response.done`).
`response.metadata`	object \| null	Response metadata.

response.done

The response finished. Contains the completed response object with final status and output.

Field	Type	Description
`type`	response.done	Event type.
`event_id`	string	Server-generated event ID.
`response.id`	string	Response identifier.
`response.object`	realtime.response	Object type.
`response.status`	string	`"completed"`, `"cancelled"`, or `"failed"`.
`response.status_details`	object	Status details. `type` matches `status`. For cancelled: includes `reason` (e.g., `"client_cancelled"`).
`response.output`	ConversationItem[]	Completed output items with content.
`response.conversation_id`	string	Conversation this response belongs to.
`response.output_modalities`	string[]	`"text"`, `"audio"`, or both.
`response.max_output_tokens`	integer \| `"inf"`	Token limit for this response.
`response.audio`	object	Audio output config.
`response.usage`	object \| null	Token usage statistics.
`response.metadata`	object \| null	Response metadata.
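
A client will usually branch on the final status. The sketch below assumes the `response.*` fields are nested under a `response` object and pulls out the status and any cancellation reason; `summarize_response_done` is an illustrative name.

```python
def summarize_response_done(event: dict) -> dict:
    """Extract id, status, and cancellation reason from a response.done event."""
    response = event["response"]
    status = response["status"]
    details = response.get("status_details") or {}
    return {
        "id": response["id"],
        "status": status,
        "reason": details.get("reason"),  # e.g. "client_cancelled"; None otherwise
        "ok": status == "completed",
    }
```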

response.output_item.added

An output item was added to the response.

Field	Type	Description
`type`	response.output_item.added	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	The response this item belongs to.
`output_index`	integer	Index of the output item in the response.
`item`	ConversationItem	The output item (initially empty content).

response.output_item.done

An output item finished.

Field	Type	Description
`type`	response.output_item.done	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	The response this item belongs to.
`output_index`	integer	Index of the output item in the response.
`item`	ConversationItem	The completed output item with content.

response.content_part.added

A content part was added to an output item.

Field	Type	Description
`type`	response.content_part.added	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`part`	object	The content part. `type`: `"audio"` or `"text"`. `transcript`: initially empty string.

response.content_part.done

A content part finished.

Field	Type	Description
`type`	response.content_part.done	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`part`	object	The completed content part with final `transcript`.

response.output_text.delta

Streaming text chunk from the model.

Field	Type	Description
`type`	response.output_text.delta	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`delta`	string	Text chunk.

response.output_text.done

Text output finished.

Field	Type	Description
`type`	response.output_text.done	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`text`	string	Complete text output.
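
The delta and done events for text can be combined with a small accumulator keyed by item and content part. This is a sketch; the class name `TextAccumulator` and its methods are assumptions.

```python
from collections import defaultdict


class TextAccumulator:
    """Collect response.output_text.delta chunks until the matching done event."""

    def __init__(self) -> None:
        self._parts: dict = defaultdict(list)
        self.completed: dict = {}

    def handle(self, event: dict) -> None:
        key = (event.get("item_id"), event.get("content_index"))
        if event.get("type") == "response.output_text.delta":
            self._parts[key].append(event["delta"])
        elif event.get("type") == "response.output_text.done":
            # The done event carries the complete text, so drop the partial chunks.
            self._parts.pop(key, None)
            self.completed[key] = event["text"]

    def partial(self, item_id: str, content_index: int) -> str:
        """Return the text streamed so far for a part that is still in progress."""
        return "".join(self._parts.get((item_id, content_index), []))
```

The same pattern would apply to the response.output_audio_transcript.delta and done pair, with `transcript` in place of `text`.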

response.output_audio_transcript.delta

Streaming transcript for generated audio.

Field	Type	Description
`type`	response.output_audio_transcript.delta	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`delta`	string	Transcript chunk.

response.output_audio_transcript.done

Final transcript for generated audio.

Field	Type	Description
`type`	response.output_audio_transcript.done	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`transcript`	string	Complete transcript.

response.output_audio.delta

Streaming audio alignment data for generated speech. Over WebRTC, audio travels on the RTP media track; this data-channel event carries only timestamp_info (the delta field is empty). Present only when providerData.tts.timestamp_type is set.

Field	Type	Description
`type`	response.output_audio.delta	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`delta`	string	Always empty over WebRTC (audio is on the media track).
`timestamp_info`	object	TTS alignment data. Contains `word_alignment` or `character_alignment`. See TTS timestamps and alignment.

response.output_audio.done

Audio output for a content part finished.

Field	Type	Description
`type`	response.output_audio.done	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.

response.function_call_arguments.delta

Streaming function call arguments.

Field	Type	Description
`type`	response.function_call_arguments.delta	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`delta`	string	Arguments chunk (JSON string fragment).

response.function_call_arguments.done

Function call arguments finished.

Field	Type	Description
`type`	response.function_call_arguments.done	Event type.
`event_id`	string	Server-generated event ID.
`response_id`	string	Response identifier.
`item_id`	string	Item identifier.
`output_index`	integer	Index of the output item.
`content_index`	integer	Index of the content part.
`arguments`	string	Complete function call arguments (JSON string).
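
Because `arguments` is a JSON string, it needs a second decode before use. A minimal sketch, with `parse_function_call_arguments` as a hypothetical helper name:

```python
import json


def parse_function_call_arguments(event: dict) -> tuple[str, dict]:
    """Return (item_id, decoded arguments) from a function_call_arguments.done event."""
    if event.get("type") != "response.function_call_arguments.done":
        raise ValueError("expected response.function_call_arguments.done")
    return event["item_id"], json.loads(event["arguments"])
```

The event does not carry the function name in the fields listed above, so the client would look it up from the corresponding output item using `item_id`.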

input_audio_buffer.speech_started

Voice activity detected — user started speaking.

Field	Type	Description
`type`	input_audio_buffer.speech_started	Event type.
`event_id`	string	Server-generated event ID.
`audio_start_ms`	integer	Millisecond offset in the audio stream where speech was detected.
`item_id`	string	The conversation item ID associated with this speech segment.

input_audio_buffer.speech_stopped

Voice activity ended — user stopped speaking.

Field	Type	Description
`type`	input_audio_buffer.speech_stopped	Event type.
`event_id`	string	Server-generated event ID.
`audio_end_ms`	integer	Millisecond offset in the audio stream where speech ended.
`item_id`	string	The conversation item ID associated with this speech segment.

input_audio_buffer.committed

Buffered audio was committed as a conversation item.

Field	Type	Description
`type`	input_audio_buffer.committed	Event type.
`event_id`	string	Server-generated event ID.
`previous_item_id`	string	The ID of the preceding conversation item, or `null`.
`item_id`	string	The new conversation item ID for the committed audio.

input_audio_buffer.timeout_triggered

Emitted when the idle timeout fires on the input buffer. Server VAD only; gated by turn_detection.idle_timeout_ms.

Field	Type	Description
`type`	input_audio_buffer.timeout_triggered	Event type.
`event_id`	string	Server-generated event ID.
`audio_start_ms`	integer	Audio buffer start offset (ms) at the time the idle timeout fired.
`audio_end_ms`	integer	Audio buffer end offset (ms) at the time the idle timeout fired.
`item_id`	string	Conversation item ID associated with the idle audio buffer.

input_audio_buffer.turn_suggestion

Emitted when the server VAD smart-turn detector predicts an end-of-turn boundary. Use this signal to drive low-latency UI cues or to pre-warm a response without waiting for the final speech_stopped commit. It may be followed by input_audio_buffer.turn_suggestion_revoked if the user resumes speaking.

Field	Type	Description
`type`	input_audio_buffer.turn_suggestion	Event type.
`event_id`	string	Server-generated event ID.
`item_id`	string	The conversation item ID associated with this utterance.
`utterance_index`	integer	Monotonic index of the utterance within the session. Pairs with the matching `turn_suggestion_revoked`.
`probability`	number	Smart-turn model end-of-turn probability (0.0–1.0).
`trailing_silence_ms`	number	Trailing silence at the time of inference, in milliseconds.
`audio_duration_ms`	number	Audio duration of the utterance at the time of inference, in milliseconds.
`inference_ms`	number	Smart-turn model inference latency, in milliseconds.

input_audio_buffer.turn_suggestion_revoked

Emitted when the user resumes speaking after a previous turn_suggestion. Pairs with the most recent turn_suggestion sharing the same utterance_index.

Field	Type	Description
`type`	input_audio_buffer.turn_suggestion_revoked	Event type.
`event_id`	string	Server-generated event ID.
`item_id`	string	The conversation item ID associated with this utterance.
`utterance_index`	integer	Index of the utterance whose previous `turn_suggestion` is being revoked.
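
Pairing a suggestion with its revocation by `utterance_index` might look like the sketch below. The 0.8 probability threshold, the class name, and the `"prewarm"` / `"cancel_prewarm"` return values are all assumptions chosen for illustration.

```python
from typing import Optional


class TurnSuggestionTracker:
    """Track turn suggestions so a later revocation can undo client-side pre-warming."""

    def __init__(self, min_probability: float = 0.8) -> None:
        self.min_probability = min_probability  # hypothetical client-side threshold
        self.active: dict = {}  # utterance_index -> probability

    def handle(self, event: dict) -> Optional[str]:
        kind = event.get("type")
        index = event.get("utterance_index")
        if kind == "input_audio_buffer.turn_suggestion":
            probability = event.get("probability", 0.0)
            if probability >= self.min_probability:
                self.active[index] = probability
                return "prewarm"
        elif kind == "input_audio_buffer.turn_suggestion_revoked":
            # Only react if this utterance actually triggered pre-warming.
            if self.active.pop(index, None) is not None:
                return "cancel_prewarm"
        return None
```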

Other audio buffer events

Event type	Description
`input_audio_buffer.cleared`	Input audio buffer was cleared.
`output_audio_buffer.started`	Server started sending output audio.
`output_audio_buffer.stopped`	Server stopped sending output audio.
`output_audio_buffer.cleared`	Output audio buffer was cleared.

response.backchannel.audio.delta

Streaming PCM audio chunk for a low-latency back-channel interjection (e.g. “uh-huh”, “right”) emitted while the user is mid-utterance. Out-of-band from the main response stream — use backchannel_id to group chunks belonging to the same interjection.

Field	Type	Description
`type`	response.backchannel.audio.delta	Event type.
`event_id`	string	Server-generated event ID.
`backchannel_id`	string	Synthetic ID grouping the delta events and the final done event of a single back-channel interjection. Use it as the playback bucket key so chunks of one interjection don’t collide with the active response item.
`delta`	string	Base64-encoded audio chunk in the session’s configured `audio.output.format` (PCM16, `audio/pcmu`, or `audio/pcma`).

response.backchannel.audio.done

All audio for a back-channel interjection has been streamed. No teardown required — playback queues until exhausted.

Field	Type	Description
`type`	response.backchannel.audio.done	Event type.
`event_id`	string	Server-generated event ID.
`backchannel_id`	string	Identifies which back-channel interjection finished streaming.
`phrase`	string	The chosen back-channel utterance (e.g. `"uh-huh"`). Optional — omitted when the decider doesn’t surface the phrase to clients.
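
Grouping chunks by `backchannel_id` can be sketched as a dictionary of buckets that is flushed on the done event. The class name `BackchannelBuffer` is an assumption, and the decoded bytes are returned as-is without interpreting the audio format.

```python
import base64
from typing import Optional


class BackchannelBuffer:
    """Bucket back-channel audio chunks by backchannel_id until done arrives."""

    def __init__(self) -> None:
        self._chunks: dict = {}

    def handle(self, event: dict) -> Optional[bytes]:
        kind = event.get("type")
        backchannel_id = event.get("backchannel_id")
        if kind == "response.backchannel.audio.delta":
            decoded = base64.b64decode(event["delta"])
            self._chunks.setdefault(backchannel_id, []).append(decoded)
        elif kind == "response.backchannel.audio.done":
            # Return the full interjection and release its bucket.
            return b"".join(self._chunks.pop(backchannel_id, []))
        return None
```

A streaming player would more likely enqueue each chunk as it arrives; buffering until done is used here only to keep the example short.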

response.backchannel.skipped

An evaluation tick chose not to fire a back-channel. Useful for client-side telemetry; clients that don’t care can ignore this event.

Field	Type	Description
`type`	response.backchannel.skipped	Event type.
`event_id`	string	Server-generated event ID.
`reason`	string	Short machine-readable string describing why no back-channel was emitted on this evaluation tick (e.g. `min_gap_not_elapsed`, `deadline_missed`, `no_phrase`). Stable enough for telemetry.

rate_limits.updated

Reports current rate limit state.

Field	Type	Description
`type`	rate_limits.updated	Event type.
`event_id`	string	Server-generated event ID.

Schemas

Session object

The session object configures model behavior, audio settings, tools, and more. It appears in signaling requests, session.update, session.created, and session.updated events.

Field	Type	Description
`object`	realtime.session	Object type identifier (read-only).
`type`	realtime	Fixed value.
`id`	string	Server-assigned session ID (read-only).
`model`	string	Model identifier.
`instructions`	string	System instructions for the model.
`output_modalities`	string[]	Output types: `"text"`, `"audio"`, or both.
`temperature`	number	The sampling temperature used for response generation.
`max_output_tokens`	integer \| `"inf"`	Maximum tokens per response (1–4096 or `"inf"`).
`audio`	AudioConfig	Audio input/output settings.
`tools`	Tool[]	Function tools available to the model.
`tool_choice`	string \| ToolChoiceTarget	`"none"`, `"auto"`, `"required"`, or a specific tool target.
`truncation`	string \| object	`"auto"`, `"disabled"`, or a `retention_ratio` config.
`tracing`	string \| object	`"auto"` or a tracing config with `workflow_name`, `group_id`, `metadata`.
`include`	string[]	Optional data to include, e.g. `"item.input_audio_transcription.logprobs"`.
`text_generation_config`	object	Fine-grained LLM generation parameters including `reasoning` (`effort`, `maxTokens`, `exclude`). See text_generation_config.
`providerData`	object	Inworld extensions: `stt`, `tts`, `memory`, `backchannel`, `responsiveness`. See API Extensions.
`expires_at`	integer	Unix timestamp for session expiration (read-only).

AudioConfig

Field	Type	Description
`input.format`	AudioFormat	Input audio format.
`input.noise_reduction`	object	Noise reduction config. `type`: `"near_field"` or `"far_field"`.
`input.transcription`	object	Transcription config. `model`: transcription model identifier (e.g., `inworld/inworld-stt-1`, `assemblyai/u3-rt-pro`, `soniox/stt-rt-v4`). `language`: optional language code. `prompt`: optional transcription prompt.
`input.turn_detection`	TurnDetection	Turn detection config.
`output.format`	AudioFormat	Output audio format.
`output.voice`	string	Voice preset for audio output (e.g., `Dennis`). See the List Voices API or the Voice library page in the Inworld Portal for the full list of supported voices.
`output.model`	string	The TTS model used for audio output.
`output.speed`	number	Playback speed (0.25–1.5).

AudioFormat

Field	Type	Description
`type`	string	MIME type: `"audio/pcm"`, `"audio/pcmu"`, or `"audio/pcma"`.
`rate`	integer	Sample rate in Hz. Currently `24000`.

TurnDetection

Turn detection has two modes, selected by the type field.

Server VAD (type: "server_vad"):

Field	Type	Description
`type`	server_vad	Mode selector.
`threshold`	number	VAD sensitivity (0–1).
`prefix_padding_ms`	integer	Milliseconds of audio to include before speech onset.
`silence_duration_ms`	integer	Silence duration (ms) before speech is considered ended.
`create_response`	boolean	Auto-trigger `response.create` after speech ends.
`interrupt_response`	boolean	Allow new speech to interrupt active responses.
`idle_timeout_ms`	integer	Idle timeout in milliseconds.

Semantic VAD (type: "semantic_vad"):

Field	Type	Description
`type`	semantic_vad	Mode selector.
`eagerness`	string	`"low"`, `"medium"`, `"high"`, or `"auto"`.
`create_response`	boolean	Auto-trigger `response.create` after speech ends.
`interrupt_response`	boolean	Allow new speech to interrupt active responses.
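
A client-side check of a turn detection config before sending it could look like this sketch. The ranges for `threshold` and `eagerness` come from the tables above; treating the millisecond fields as non-negative integers is an assumption, as is the function name `validate_turn_detection`.

```python
EAGERNESS_VALUES = {"low", "medium", "high", "auto"}
MS_FIELDS = ("prefix_padding_ms", "silence_duration_ms", "idle_timeout_ms")


def validate_turn_detection(config: dict) -> list[str]:
    """Return a list of problems found in a TurnDetection config (empty if none)."""
    problems: list[str] = []
    mode = config.get("type")
    if mode == "server_vad":
        threshold = config.get("threshold")
        if threshold is not None and not 0 <= threshold <= 1:
            problems.append("threshold must be between 0 and 1")
        for field in MS_FIELDS:
            value = config.get(field)
            # Assumption: millisecond fields are non-negative integers.
            if value is not None and (not isinstance(value, int) or value < 0):
                problems.append(f"{field} must be a non-negative integer")
    elif mode == "semantic_vad":
        eagerness = config.get("eagerness")
        if eagerness is not None and eagerness not in EAGERNESS_VALUES:
            problems.append("eagerness must be low, medium, high, or auto")
    else:
        problems.append('type must be "server_vad" or "semantic_vad"')
    return problems
```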

ConversationItem

Field	Type	Required	Description
`object`	realtime.item	No	Object type identifier (read-only, present in server responses).
`id`	string	No	Item ID.
`type`	string	Yes	Item type (e.g., `"message"`, `"function_call_result"`).
`status`	string	No	Item status: `"completed"` or `"in_progress"` (read-only, present in server responses).
`role`	string	No	`"system"`, `"user"`, `"assistant"`, or `"tool"`.
`content`	ContentPart[]	No	Array of content parts.

ContentPart

Field	Type	Required	Description
`type`	string	Yes	Content type (e.g., `"input_text"`, `"input_audio"`, `"text"`, `"audio"`).
`text`	string	No	Text content.
`audio`	string	No	Base64-encoded audio.
`transcript`	string	No	Human-readable transcript accompanying audio.

ResponseConfig

Per-response overrides for session defaults.

Field	Type	Description
`conversation`	string	`"auto"` or a conversation ID.
`output_modalities`	string[]	`"text"`, `"audio"`, or both.
`instructions`	string	Override instructions for this response.
`voice`	string	Override voice for this response.
`max_output_tokens`	integer \| `"inf"`	Override max tokens.
`tool_choice`	string \| ToolChoiceTarget	Override tool choice.
`tools`	Tool[]	Override available tools.

Tool

Field	Type	Required	Description
`type`	function	Yes	Tool type.
`name`	string	Yes	Function name.
`description`	string	No	What the function does.
`parameters`	object	No	JSON Schema for function parameters.

ToolChoiceTarget

Specifies tool choice behavior. The server always returns this as an object.

Field	Type	Required	Description
`type`	string	Yes	`"auto"`, `"none"`, `"required"`, `"function"`, or `"mcp"`.
`name`	string	No	Function name (when `type` is `"function"`).
`server_label`	string	No	MCP server label (when `type` is `"mcp"`).
