Conversation Items
Conversation items represent messages and interactions in your conversation. Each item has:
- ID: Unique identifier
- Type: message, function_call, or function_call_output
- Role: user, assistant, or tool
- Content: The actual content of the item (an array of content parts)
Content Types
Conversation items support different content types depending on direction:
Input Content Types (for user messages):
- input_text: Plain text input from the user
- input_audio: Base64-encoded audio input from the user
Output Content Types (for assistant responses):
- text: Text output from the assistant
- audio: Audio output from the assistant
You can mix multiple content parts in a single conversation item. For example, you can combine text and audio in the same message.
Creating Conversation Items
Text Messages
```javascript
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [
      {
        type: 'input_text',
        text: 'Hello, how are you?'
      }
    ]
  }
}));
```
Audio Messages
There are two ways to send audio input:
Method 1: Streaming Audio (Real-time)
Use input_audio_buffer.append for streaming real-time audio from a microphone:
```javascript
// Stream audio chunks in real-time
ws.send(JSON.stringify({
  type: 'input_audio_buffer.append',
  audio: base64AudioData
}));
// VAD automatically detects speech boundaries and commits the buffer
```
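Raw microphone audio typically arrives as PCM16 samples, and each chunk must be base64-encoded before it is appended. A minimal Node.js sketch of that step (the helper name, chunk size, and `ws.send` wiring are illustrative assumptions, not part of the API):

```javascript
// Split a PCM16 audio buffer into fixed-size chunks and emit one
// base64-encoded input_audio_buffer.append event per chunk.
function streamPcm16(ws, pcmBuffer, chunkBytes = 4096) {
  for (let offset = 0; offset < pcmBuffer.length; offset += chunkBytes) {
    const chunk = pcmBuffer.subarray(offset, offset + chunkBytes);
    ws.send(JSON.stringify({
      type: 'input_audio_buffer.append',
      audio: chunk.toString('base64')
    }));
  }
}
```

In a real capture loop you would call this per microphone callback rather than over a whole buffer; the chunked form keeps individual messages small.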
Method 2: Pre-recorded Audio Chunks
Use conversation.item.create with input_audio for pre-recorded audio chunks:
```javascript
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{
      type: 'input_audio',
      audio: base64AudioData // Base64-encoded PCM16 or OPUS audio
    }]
  }
}));
```
When to use each method:
- Streaming (input_audio_buffer.append): Use for real-time microphone input, voice conversations, and live audio streaming
- Pre-recorded (conversation.item.create with input_audio): Use for pre-recorded audio files, batch processing, or when you have complete audio chunks ready
Mixed Content
You can combine multiple content types in a single conversation item:
```javascript
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [
      {
        type: 'input_text',
        text: 'Here is some context about the audio:'
      },
      {
        type: 'input_audio',
        audio: base64AudioData
      },
      {
        type: 'input_text',
        text: 'And here is additional context.'
      }
    ]
  }
}));
```
Receiving Conversation Items
When items are added to the conversation, you’ll receive events:
```javascript
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'conversation.item.added') {
    console.log('Item added:', event.item.id);
    console.log('Content:', event.item.content);
  }
  if (event.type === 'conversation.item.done') {
    console.log('Item processing complete:', event.item.id);
  }
});
```
Retrieving Conversation Items
Retrieve specific conversation items:
```javascript
ws.send(JSON.stringify({
  type: 'conversation.item.retrieve',
  item_id: 'item-id-here'
}));
```
The server will respond with:
```javascript
{
  type: 'conversation.item.retrieved',
  item: {
    id: 'item-id-here',
    type: 'message',
    role: 'user',
    content: [...]
  }
}
```
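Because the retrieval round-trip is asynchronous over the socket, it can be convenient to wrap the request/response pair in a Promise. A sketch assuming the ws object exposes EventEmitter-style on/off, as in the other examples (the helper name is illustrative):

```javascript
// Send a retrieve request and resolve once the matching
// conversation.item.retrieved event arrives.
function retrieveItem(ws, itemId) {
  return new Promise((resolve) => {
    const onMessage = (data) => {
      const event = JSON.parse(data);
      if (event.type === 'conversation.item.retrieved' && event.item.id === itemId) {
        ws.off('message', onMessage);
        resolve(event.item);
      }
    };
    ws.on('message', onMessage);
    ws.send(JSON.stringify({ type: 'conversation.item.retrieve', item_id: itemId }));
  });
}
```

A production version would also remove the listener on error events and reject after a timeout.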
Deleting Conversation Items
Remove items from the conversation:
```javascript
ws.send(JSON.stringify({
  type: 'conversation.item.delete',
  item_id: 'item-id-here'
}));
```
You’ll receive a confirmation:
```javascript
{
  type: 'conversation.item.deleted',
  item_id: 'item-id-here'
}
```
Function Calling
The Realtime API supports function calling, allowing the assistant to invoke tools you define. Configure functions in session.update and handle function call events.
Defining Functions
```javascript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    tools: [{
      type: 'function',
      name: 'get_weather',
      description: 'Get the weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'The city and state, e.g. San Francisco, CA'
          }
        },
        required: ['location']
      }
    }],
    tool_choice: 'auto'
  }
}));
```
Handling Function Calls
```javascript
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'response.function_call_arguments.done') {
    const result = executeFunction(event.name, JSON.parse(event.arguments));
    ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: event.call_id,
        output: JSON.stringify(result)
      }
    }));
    ws.send(JSON.stringify({
      type: 'response.create'
    }));
  }
});
```
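The executeFunction helper in the handler above is left undefined. A common pattern is a name-to-handler registry, so adding a tool only requires registering one function (the registry and the weather handler here are illustrative, not part of the API):

```javascript
// Map tool names to local implementations.
const functionRegistry = {
  get_weather: ({ location }) => ({ location, forecast: 'sunny', temperature_c: 21 })
};

// Look up and run the named tool; return an error object for unknown
// names so the model can see (and recover from) the failure.
function executeFunction(name, args) {
  const handler = functionRegistry[name];
  if (!handler) {
    return { error: `Unknown function: ${name}` };
  }
  return handler(args);
}
```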
Voice Activity Detection
Voice Activity Detection (VAD) automatically detects when speech starts and stops, enabling natural turn-taking in conversations. Configure VAD through session.update.
Configuring VAD
```javascript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    audio: {
      input: {
        turn_detection: {
          type: 'semantic_vad',
          eagerness: 'medium',
          create_response: true,
          interrupt_response: true
        }
      }
    }
  }
}));
```
VAD Types
semantic_vad: Uses conversational awareness to detect natural speech boundaries. Adjust eagerness (low, medium, high) to control responsiveness.
VAD Events
```javascript
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'input_audio_buffer.speech_started') {
    console.log('Speech detected');
    // Update UI to show user is speaking
  }
  if (event.type === 'input_audio_buffer.speech_stopped') {
    console.log('Speech ended');
    // Update UI, prepare for response
  }
});
```
Error Handling
The Realtime API emits error events for various failure scenarios. Handle these events to provide robust error recovery and user feedback.
Error Event Structure
```javascript
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'error') {
    const error = event.error;
    switch (error.type) {
      case 'invalid_request_error':
        console.error('Invalid request:', error.message);
        if (error.param) {
          console.error('Parameter:', error.param);
        }
        break;
      case 'server_error':
        console.error('Server error:', error.message);
        // Implement retry logic
        break;
      case 'rate_limit_error':
        console.error('Rate limit exceeded');
        // Pause requests, implement backoff
        break;
    }
  }
});
```
Error Types
invalid_request_error: Invalid parameters or malformed requests. Check error.param for the specific field.
server_error: Transient server-side failures. Implement retry logic with exponential backoff.
rate_limit_error: Rate limit exceeded. Throttle requests and retry with exponential backoff.
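The exponential backoff mentioned for server_error and rate_limit_error can be sketched as follows; the base delay, cap, and jitter fraction are illustrative choices, not values mandated by the API:

```javascript
// Delay for attempt n: base * 2^n, capped at maxMs, plus up to 25% jitter
// so concurrent clients don't retry in lockstep.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  const exp = Math.min(baseMs * 2 ** attempt, maxMs);
  return exp + Math.random() * exp * 0.25;
}

// Retry an async operation with exponential backoff between attempts.
async function withRetry(fn, maxAttempts = 5, baseMs = 500) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, baseMs)));
    }
  }
}
```

For invalid_request_error, retrying the same payload will fail again; fix the offending parameter instead.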
Interruption Handling
Interrupt active responses when new user input arrives.
Interrupting Responses
Cancel an in-progress response when the user starts speaking again:
```javascript
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'input_audio_buffer.speech_started') {
    // User started speaking, cancel current response
    ws.send(JSON.stringify({
      type: 'response.cancel'
    }));
  }
});
```
When interrupt_response: true is set in VAD configuration, the server automatically cancels responses when new speech is detected.
Managing Context
Session Instructions
Update session instructions to guide the conversation:
```javascript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    instructions: 'You are a helpful assistant. Be concise and friendly.'
  }
}));
```
Conversation History
The API automatically maintains conversation history. You can:
- Keep full history: Let the conversation grow naturally
- Selective deletion: Remove specific items that aren’t needed
- Session resets: Start a new session when you need a clean context window
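The selective-deletion strategy can be sketched as a simple trimmer that drops the oldest items once the tracked history exceeds a budget (the budget of 50 items and the helper name are arbitrary assumptions; pick a budget based on your model's context limits):

```javascript
// Delete the oldest conversation items once the tracked history
// exceeds maxItems. itemIds is an array of IDs in insertion order;
// returns the IDs that remain after trimming.
function trimConversation(ws, itemIds, maxItems = 50) {
  const excess = itemIds.length - maxItems;
  for (let i = 0; i < excess; i++) {
    ws.send(JSON.stringify({
      type: 'conversation.item.delete',
      item_id: itemIds[i]
    }));
  }
  return itemIds.slice(Math.max(excess, 0));
}
```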
Example: Conversation Manager
Here’s a complete example of managing conversations:
```javascript
class ConversationManager {
  constructor(ws) {
    this.ws = ws;
    this.items = new Map();
    this.setupListeners();
  }

  setupListeners() {
    this.ws.on('message', (data) => {
      const event = JSON.parse(data);
      switch (event.type) {
        case 'conversation.item.added':
          this.items.set(event.item.id, event.item);
          break;
        case 'conversation.item.deleted':
          this.items.delete(event.item_id);
          break;
      }
    });
  }

  sendMessage(text) {
    this.ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{
          type: 'input_text',
          text: text
        }]
      }
    }));
  }

  deleteItem(itemId) {
    this.ws.send(JSON.stringify({
      type: 'conversation.item.delete',
      item_id: itemId
    }));
  }

  getConversationHistory() {
    return Array.from(this.items.values());
  }
}
```
Best Practices
- Monitor Context Length: Keep track of conversation length to avoid exceeding limits
- Strategic Deletion: Remove old context that’s no longer relevant
- Item Tracking: Maintain a local map of conversation items for quick access
- Error Handling: Handle cases where items might not exist when deleting/retrieving
- Context Management: Use session instructions to guide conversation behavior
Use Cases
- Long Conversations: Delete old context to maintain performance
- Error Recovery: Delete incorrect items and resend
- Context Switching: Clear conversation context when changing topics
- Memory Management: Remove items that are no longer needed