You open a persistent WebSocket connection and send text messages. The server streams audio chunks back over the same connection, with no per-request overhead and no repeated handshakes, giving you the lowest possible latency. This is best for voice agents and interactive applications that send multiple synthesis requests in a session, where avoiding connection setup on every call makes a measurable difference.
If you only need a single request-response with chunked audio, the Streaming API is simpler to integrate.
For tips on optimizing latency, see the latency best practices guide.
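The session flow described above can be sketched in Python. Everything below is illustrative: the endpoint URL, the `voiceId` field, and the end-of-stream convention are placeholders, not the documented API surface; consult the Synthesize Speech WebSocket reference for the real schema and authentication.

```python
import asyncio
import json

def make_request(text: str) -> str:
    # Build one synthesis request. "text" and "voiceId" are
    # illustrative field names, not the documented schema.
    return json.dumps({"text": text, "voiceId": "example-voice"})

async def synthesize_session(texts: list[str]) -> list[bytes]:
    # One persistent connection serves every request in the session,
    # so only the first call pays the connection-setup cost.
    import websockets  # third-party client: pip install websockets

    audio_chunks: list[bytes] = []
    # Placeholder endpoint -- substitute the real URL and auth header
    # from the API reference.
    async with websockets.connect("wss://example.invalid/tts") as ws:
        for text in texts:
            await ws.send(make_request(text))
            async for message in ws:
                if isinstance(message, bytes):
                    audio_chunks.append(message)  # raw audio chunk
                else:
                    break  # assume a text frame signals end-of-stream
    return audio_chunks
```

The point of the sketch is the loop structure: several `send` calls reuse one `connect`, which is exactly the per-request handshake cost the WebSocket API avoids.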
Timestamp Transport Strategy
When using timestamp alignment, you can choose how timestamps are delivered alongside audio using `timestampTransportStrategy`:
- SYNC (default): Each chunk contains both audio and timestamps together.
- ASYNC: Audio chunks arrive first, with timestamps following in separate trailing messages. This reduces time-to-first-audio with TTS 1.5 models.
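As a sketch of what the two strategies mean for client code, the handler below accumulates audio and timestamps whichever way they arrive. The message field names (`audioContent`, `timestamps`) and the request shape are assumptions for illustration; only `timestampTransportStrategy` and its `SYNC`/`ASYNC` values come from this page.

```python
import json

def handle_server_message(raw: str, state: dict) -> dict:
    # With SYNC, one message carries both audio and timestamps;
    # with ASYNC, audio-only messages arrive first and timestamp-only
    # messages trail behind. This handler accepts either shape.
    msg = json.loads(raw)
    if "audioContent" in msg:          # assumed field name
        state["audio"].append(msg["audioContent"])
    if "timestamps" in msg:            # assumed field name
        state["timestamps"].extend(msg["timestamps"])
    return state

# Requesting ASYNC delivery (the "text" field is illustrative):
request = {"text": "Hello there", "timestampTransportStrategy": "ASYNC"}
```

With ASYNC you can start playback as soon as the first audio-only message lands, then attach the trailing timestamps as they arrive, which is where the time-to-first-audio win comes from.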
Code Examples
JavaScript
View our JavaScript implementation example
Python
View our Python WebSocket implementation example
API Reference
Synthesize Speech WebSocket
View the complete API specification
Next Steps
Voice Cloning Best Practices
Learn best practices for producing high-quality voice clones.
Speech Generation Best Practices
Learn best practices for synthesizing high-quality speech.
API Examples
Explore Python and JavaScript code examples for TTS integration.