Bidirectional streaming API for real-time speech-to-text transcription over WebSocket.
This method listens for streaming audio input and returns recognized text chunks one by one as soon as they are ready. Audio chunks are expected to be part of a single voice input. Suitable for streaming live conversations, microphone input, or other streaming audio sources.
To use the API:
1. Send a transcribe_config message first to configure the session (model, language, audio encoding, etc.).
2. Send audio_chunk messages containing raw audio bytes.
3. Receive transcription results as they become available, including both interim (partial) and final results.
4. Send end_turn to signal the end of a speaker's turn.
5. Send close_stream when done.
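The five steps above can be sketched as a small helper that produces the client-to-server frames in order. This is an illustrative Python sketch, not official client code; it assumes messages travel as JSON text frames over the WebSocket and omits the transport and authentication layers:

```python
import json

def session_messages(audio_chunks,
                     model_id="assemblyai/universal-streaming-multilingual",
                     encoding="LINEAR16", sample_rate=16000, language="en-US"):
    """Yield the ordered JSON frames for one transcription session."""
    # 1. Configure the session first.
    yield json.dumps({"transcribe_config": {
        "modelId": model_id,
        "audioEncoding": encoding,
        "sampleRateHertz": sample_rate,
        "language": language,
    }})
    # 2. Stream the audio payloads.
    for chunk in audio_chunks:
        yield json.dumps({"audio_chunk": {"content": chunk}})
    # 3. Signal the end of the speaker's turn (a no-op on some providers).
    yield json.dumps({"end_turn": {}})
    # 4. Tell the server no more audio is coming.
    yield json.dumps({"close_stream": {}})

msgs = list(session_messages(["<YOUR_AUDIO>"]))
```

A real client would interleave sending these frames with receiving transcription results, rather than precomputing the whole list.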
Example messages:

transcribe_config (client → server):
{
  "transcribe_config": {
    "modelId": "assemblyai/universal-streaming-multilingual",
    "audioEncoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "language": "en-US"
  }
}

audio_chunk (client → server):
{
  "audio_chunk": {
    "content": "<YOUR_AUDIO>"
  }
}

end_turn (client → server):
{
  "end_turn": {}
}

close_stream (client → server):
{
  "close_stream": {}
}

transcription result (server → client):
{
  "result": {
    "transcription": {
      "transcript": "Hello, this is a test transcription.",
      "isFinal": true,
      "wordTimestamps": []
    }
  }
}

speechStarted (server → client):
{
  "result": {
    "speechStarted": {
      "startTimeMs": 1250,
      "confidence": 0.95
    }
  }
}
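As a client-side illustration of the audio_chunk message, the sketch below slices a raw PCM buffer into fixed-duration chunks. The base64 encoding of the content field is an assumption for the JSON transport (the example above shows only an opaque <YOUR_AUDIO> placeholder); check the official client SDKs for the exact byte encoding:

```python
import base64
import json

def audio_chunk_frames(pcm_bytes, chunk_ms=100, sample_rate=16000):
    """Split raw 16-bit mono LINEAR16 PCM into ~chunk_ms slices and wrap
    each slice in an audio_chunk message.

    Assumption: audio bytes are base64-encoded into "content" for the
    JSON transport; this is not confirmed by the examples above."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 16-bit mono = 2 bytes/sample
    for start in range(0, len(pcm_bytes), bytes_per_chunk):
        payload = base64.b64encode(pcm_bytes[start:start + bytes_per_chunk]).decode("ascii")
        yield json.dumps({"audio_chunk": {"content": payload}})

frames = list(audio_chunk_frames(b"\x00" * 32000))  # 1 s of silence at 16 kHz
```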
Your authentication credentials. For Basic authentication, populate this with Basic $INWORLD_API_KEY.
transcribe_config: Configure the transcription session. Must be the first message sent. Contains model selection, audio format settings, and optional feature configurations.
audio_chunk: Send a chunk of audio data for transcription. Must be sent after the initial transcribe_config message.
end_turn: Signal the end of a speaker's turn. Some providers do not support manual turn-taking; for those providers, sending this message has no effect.
close_stream: Signal that the client is done sending audio data. Required for HTTP/WebSocket clients since there is no equivalent to gRPC stream close.
transcription: Transcription result streamed back as audio is processed. May be an interim (partial) result or a final result, depending on the isFinal field.
Usage metrics for billing and monitoring purposes. Coming soon; this field is not yet populated.
speechStarted: Signal indicating the start of a speaker's speech. Sent when voice activity is detected in the audio stream.
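A receiver can distinguish the result variants described above by which field is present inside result. A minimal sketch, assuming result frames arrive as JSON text and each carries exactly one variant:

```python
import json

def route_result(frame):
    """Classify one server frame by its result variant, per the fields above."""
    result = json.loads(frame).get("result", {})
    if "transcription" in result:
        t = result["transcription"]
        kind = "final" if t.get("isFinal") else "interim"
        return (kind, t["transcript"])
    if "speechStarted" in result:
        return ("speech_started", result["speechStarted"]["startTimeMs"])
    return ("other", None)  # e.g. future usage-metrics frames

events = [route_result(f) for f in (
    '{"result": {"speechStarted": {"startTimeMs": 1250, "confidence": 0.95}}}',
    '{"result": {"transcription": {"transcript": "Hello, this is a test transcription.", '
    '"isFinal": true, "wordTimestamps": []}}}',
)]
```

In practice, interim results are typically rendered live and replaced when the final result for the same turn arrives.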