# Turn Detection

> Detect end-of-turn automatically or control turn boundaries manually with the Inworld STT streaming API.

Turn detection identifies when a speaker has finished talking — the core signal a voice agent needs to know when to respond. The STT streaming API supports turn detection out of the box: the server detects end-of-turn automatically, and you can tune its sensitivity or take full manual control.

Turn detection is available on the [WebSocket streaming endpoint](/api-reference/sttAPI/speechtotext/transcribe-stream-websocket). Sync (file upload) transcription processes complete audio files, so turn detection does not apply.

## How it works

With `inworld/inworld-stt-1` streaming, turn detection runs by default — no configuration required:

1. As you stream audio, the server returns interim (partial) transcription results.
2. When the server detects end-of-turn (for example, a sustained pause), it finalizes the transcript for that turn (`isFinal: true`).
3. Speech after the turn boundary starts a new transcript.

With default settings, a sustained mid-utterance pause (on the order of a couple of seconds) is enough to split the transcript into two final results. The exact pause duration is not fixed — it depends on the end-of-turn model's confidence and can be tuned with the thresholds below.
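A client typically displays the latest interim result and commits text only once `isFinal` is `true`. The sketch below shows that bookkeeping under two assumptions. First, each result arrives already parsed into a dict with a `transcript` string. That key name is hypothetical, since only `isFinal` is documented here. Second, each interim result replaces the previous one for the same turn.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnAssembler:
    """Collects streaming STT results into completed turns.

    Assumes each result is a dict with a "transcript" string (hypothetical
    key name) and an "isFinal" boolean.
    """

    interim: str = ""  # latest partial text for the in-progress turn
    turns: List[str] = field(default_factory=list)  # finalized turns, in order

    def on_result(self, result: dict) -> Optional[str]:
        """Returns the finalized text when a turn ends, otherwise None."""
        text = result.get("transcript", "")
        if result.get("isFinal", False):
            # Server detected end-of-turn: commit the text and start a fresh turn.
            self.turns.append(text)
            self.interim = ""
            return text
        # Assumed: each interim result supersedes the previous one for this turn.
        self.interim = text
        return None


if __name__ == "__main__":
    assembler = TurnAssembler()
    for msg in [
        {"transcript": "what's the", "isFinal": False},
        {"transcript": "What's the weather today?", "isFinal": True},
        {"transcript": "and tomorrow", "isFinal": False},
    ]:
        finished = assembler.on_result(msg)
        if finished is not None:
            print("turn complete:", finished)
```

In a voice agent, the non-`None` return value is the natural point to hand the user's turn to the response logic.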

The server also emits voice-activity events you can use to drive application behavior (e.g., interrupt playback when the user starts speaking):

| **Event**       | **Meaning**                                 |
| :-------------- | :------------------------------------------ |
| `speechStarted` | Voice activity detected in the audio stream |
| `speechStopped` | Silence detected after a period of speech   |
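A common use of these events is barge-in, where the agent's audio stops as soon as the user starts talking. The sketch takes the event name as a plain string because this page does not describe the message envelope that carries it. The `Playback` interface and `BargeInController` class are hypothetical.

```python
from typing import Protocol


class Playback(Protocol):
    """Hypothetical interface for the agent's audio output."""

    def stop(self) -> None: ...


class BargeInController:
    """Stops agent playback when the user starts talking.

    Receives the event name ("speechStarted" or "speechStopped") as a string;
    extracting it from the server message is left to the caller.
    """

    def __init__(self, playback: Playback) -> None:
        self.playback = playback
        self.user_speaking = False

    def on_event(self, name: str) -> None:
        if name == "speechStarted":
            self.user_speaking = True
            # Barge-in: cut off the agent's reply so the user is not talked over.
            self.playback.stop()
        elif name == "speechStopped":
            # Silence alone is not end-of-turn; the reply still waits for isFinal.
            self.user_speaking = False
        # Any other event name is ignored in this sketch.
```

Here `speechStopped` only clears a flag. A silence does not mean the server has declared end-of-turn, so the sketch leaves the decision to respond to the final transcript.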

## Tuning automatic turn detection

Adjust sensitivity via `transcribeConfig` in the first WebSocket message:

| **Field**                                             | **Type**     | **Default** | **Description**                                                                                                                                                     |
| :---------------------------------------------------- | :----------- | :---------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `endOfTurnConfidenceThreshold`                        | float        | `0.5`       | Confidence required to declare end-of-turn. Higher values reduce false positives (fewer premature turn splits) at the cost of slower turn detection. Range: 0.0–1.0 |
| `inworldSttV1Config.minEndOfTurnSilenceWhenConfident` | integer (ms) | —           | Minimum silence duration before finalizing a turn when confidence is high                                                                                           |
| `inworldSttV1Config.vadThreshold`                     | float        | `0.5`       | Voice activity detection threshold. Range: 0.0–1.0                                                                                                                  |
| `inactivityTimeoutSeconds`                            | integer      | —           | Stops transcription if the client is silent for this duration                                                                                                       |

```json theme={"system"}
{
  "transcribeConfig": {
    "modelId": "inworld/inworld-stt-1",
    "audioEncoding": "LINEAR16",
    "endOfTurnConfidenceThreshold": 0.7,
    "inworldSttV1Config": {
      "minEndOfTurnSilenceWhenConfident": 800
    }
  }
}
```
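If you build the first message programmatically, a small helper can omit unset fields so the server defaults apply. It can also reject out-of-range thresholds before the stream opens. This is a sketch only. The function name and parameters are illustrative, and the checked ranges are the ones listed in the table above.

```python
import json
from typing import Optional


def build_transcribe_config(
    end_of_turn_confidence: Optional[float] = None,
    min_silence_when_confident_ms: Optional[int] = None,
    vad_threshold: Optional[float] = None,
    inactivity_timeout_s: Optional[int] = None,
) -> str:
    """Builds the first WebSocket message for inworld/inworld-stt-1.

    Parameters left as None are omitted so the server defaults apply.
    """
    # Both thresholds are documented with a 0.0-1.0 range.
    for name, value in (("endOfTurnConfidenceThreshold", end_of_turn_confidence),
                        ("vadThreshold", vad_threshold)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")

    config: dict = {"modelId": "inworld/inworld-stt-1", "audioEncoding": "LINEAR16"}
    if end_of_turn_confidence is not None:
        config["endOfTurnConfidenceThreshold"] = end_of_turn_confidence
    if inactivity_timeout_s is not None:
        config["inactivityTimeoutSeconds"] = inactivity_timeout_s

    # Model-specific fields are nested under inworldSttV1Config.
    v1: dict = {}
    if min_silence_when_confident_ms is not None:
        v1["minEndOfTurnSilenceWhenConfident"] = min_silence_when_confident_ms
    if vad_threshold is not None:
        v1["vadThreshold"] = vad_threshold
    if v1:
        config["inworldSttV1Config"] = v1

    return json.dumps({"transcribeConfig": config})
```

Calling `build_transcribe_config(end_of_turn_confidence=0.7, min_silence_when_confident_ms=800)` produces the same message as the JSON example above.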

<Note>
  Turn-detection tuning fields are also available for AssemblyAI models via `assemblyaiConfig` (`minEndOfTurnSilenceWhenConfident`, `maxTurnSilence`, `vadThreshold`). Turn-detection behavior for third-party models follows the capabilities of each provider.
</Note>

## Manual turn control

To hand turn control fully to the client, disable server-side voice activity detection by setting `inworldSttV1Config.vadThreshold` to `0`:

```json theme={"system"}
{
  "transcribeConfig": {
    "modelId": "inworld/inworld-stt-1",
    "audioEncoding": "LINEAR16",
    "inworldSttV1Config": {
      "vadThreshold": 0
    }
  }
}
```

With VAD disabled, the server no longer splits turns automatically. Signal turn boundaries yourself:

* Send an `endTurn` message at the end of each speaker turn to finalize the transcript.
* Send `closeStream` when you are done sending audio.
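In a push-to-talk UI, the client streams audio while the button is held and sends `endTurn` when it is released. It sends `closeStream` when the session ends. The sketch below assumes two hypothetical transport hooks, `send_json` and `send_audio`, such as thin wrappers around your WebSocket client. The `{"endTurn": {}}` and `{"closeStream": {}}` payloads are placeholders, so check the WebSocket API reference for the exact message schema.

```python
from typing import Callable, Iterable


class PushToTalkSession:
    """Client-driven turn boundaries for a stream opened with vadThreshold = 0.

    send_json / send_audio are hypothetical transport hooks. The endTurn and
    closeStream payload shapes used here are placeholders, not the documented schema.
    """

    def __init__(self, send_json: Callable[[dict], None],
                 send_audio: Callable[[bytes], None]) -> None:
        self.send_json = send_json
        self.send_audio = send_audio
        self.closed = False

    def speak(self, chunks: Iterable[bytes]) -> None:
        """Streams one turn (e.g. while the talk button is held), then ends it."""
        if self.closed:
            raise RuntimeError("stream already closed")
        for chunk in chunks:
            self.send_audio(chunk)
        # Button released: tell the server the turn is over so it finalizes the transcript.
        self.send_json({"endTurn": {}})

    def close(self) -> None:
        """Signals that no more audio will be sent; safe to call twice."""
        if not self.closed:
            self.send_json({"closeStream": {}})
            self.closed = True
```

Keeping the transport behind callables lets the same turn logic sit on top of whichever WebSocket client your application already uses.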

<Note>
  With manual turn control, a single turn has a maximum length (currently around 30 seconds; subject to change). Send `endTurn` regularly at natural turn boundaries rather than relying on the limit.
</Note>
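One way to stay under the limit without relying on it is to track how much audio the current turn contains and force an `endTurn` before the cap. The sketch assumes LINEAR16 mono at 16 kHz, which is 2 bytes per sample and 32,000 bytes per second. The sample rate and the 25-second budget are illustrative choices, not documented values.

```python
class TurnLengthGuard:
    """Tracks how much audio has been sent in the current manual turn.

    Assumes LINEAR16 (16-bit PCM); the default 16 kHz mono format and the
    25 s budget are hypothetical values chosen to stay below the ~30 s limit.
    """

    def __init__(self, sample_rate_hz: int = 16_000, channels: int = 1,
                 budget_s: float = 25.0) -> None:
        self.bytes_per_second = sample_rate_hz * channels * 2  # 2 bytes per LINEAR16 sample
        self.budget_s = budget_s
        self.sent_bytes = 0

    def add(self, chunk: bytes) -> bool:
        """Records a chunk; returns True once the turn reaches the budget."""
        self.sent_bytes += len(chunk)
        return self.elapsed_s() >= self.budget_s

    def elapsed_s(self) -> float:
        """Seconds of audio sent since the last reset."""
        return self.sent_bytes / self.bytes_per_second

    def reset(self) -> None:
        """Call after sending endTurn to start timing the next turn."""
        self.sent_bytes = 0
```

When `add()` returns `True`, send `endTurn` and call `reset()`. A forced boundary mid-speech can split a sentence across two transcripts, so send `endTurn` at natural pauses where possible and treat this guard as a backstop.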

## Choosing a mode

| **Mode**                   | **When to use**                                                                                                                             |
| :------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------ |
| Automatic (default)        | Voice agents and live transcription where the server should decide when the speaker is done                                                 |
| Automatic, tuned           | Environments with background noise, slow speakers, or domain-specific pacing — adjust thresholds to reduce premature or delayed turn splits |
| Manual (`vadThreshold: 0`) | Push-to-talk UIs, client-side VAD, or applications with their own turn-taking logic                                                         |

## Next steps

<CardGroup cols={2}>
  <Card title="WebSocket API Reference" icon="code" href="/api-reference/sttAPI/speechtotext/transcribe-stream-websocket">
    Full message and configuration schema for the streaming endpoint.
  </Card>

  <Card title="Developer Quickstart" icon="bolt" href="/stt/quickstart">
    Make your first STT API call and get a transcript.
  </Card>
</CardGroup>
