SSML break tags
Use when you need precise control over silence duration and position. You can insert silences at specific points in the generated speech. The TTS API and Inworld Portal support SSML <break time="1s" /> tags in text input for streaming, non-streaming, and WebSocket requests, in all languages. You can specify silences in milliseconds or seconds. For example, <break time="1000ms" /> and <break time="1s" /> produce the same result.
Constraints:
- Use well-formed SSML: include the angle brackets and the closing slash, for example, <break time="1s" />.
- Tag names and attributes are case insensitive; for example, <BREAK time="2s" /> works.
- Up to 20 break tags are supported per request. Any break tags after the first 20 are ignored.
- Each break can last at most 10 seconds, for example, time="10s" or time="10000ms".
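The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the TTS API: the helper names break_durations_ms and check_breaks are hypothetical, and the pattern assumes whole-number durations only, since fractional values such as 1.5s are not covered by this section.

```python
import re

# Limits taken from the constraints listed above.
MAX_BREAKS_PER_REQUEST = 20
MAX_BREAK_MS = 10_000

# Matches well-formed, self-closing break tags such as <break time="1s" />.
# Case-insensitive, because tag names and attributes are case insensitive.
# Assumption: durations are whole numbers followed by "ms" or "s".
BREAK_TAG = re.compile(r'<break\s+time="(\d+)(ms|s)"\s*/>', re.IGNORECASE)


def break_durations_ms(text: str) -> list[int]:
    """Return the duration of each well-formed break tag, in milliseconds."""
    durations = []
    for value, unit in BREAK_TAG.findall(text):
        factor = 1 if unit.lower() == "ms" else 1000
        durations.append(int(value) * factor)
    return durations


def check_breaks(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    durations = break_durations_ms(text)
    if len(durations) > MAX_BREAKS_PER_REQUEST:
        extra = len(durations) - MAX_BREAKS_PER_REQUEST
        problems.append(f"{extra} break tag(s) beyond the first 20 will be ignored")
    for index, ms in enumerate(durations, start=1):
        if ms > MAX_BREAK_MS:
            problems.append(f"break #{index} is {ms} ms, above the 10-second limit")
    return problems


if __name__ == "__main__":
    sample = 'Hello. <break time="1000ms" /> Still there? <break time="1s" />'
    print(break_durations_ms(sample))
    print(check_breaks(sample))
```

Malformed tags, such as one missing the closing slash, do not match the pattern and are therefore not counted; a stricter check could flag them separately.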
Emotion, delivery, and non-verbal markups
Use when you want to control emotion or delivery style, or add sounds like sighs and laughs. The markups below are experimental and supported for English only. They give you finer control over how the model speaks: emotional expression, delivery style such as whispering, and non-verbal vocalizations such as sighs and coughs.
Emotion and Delivery Style
Emotion and delivery style markups control how the text is spoken. They work best when placed at the beginning of the text and apply to the text that follows.
- Emotion: [happy], [sad], [angry], [surprised], [fearful], [disgusted]
- Delivery Style: [laughing], [whispering]
Non-verbal Vocalization
Non-verbal vocalization markups insert non-verbal sounds at the point where they are placed in the text.
- [breathe], [clear_throat], [cough], [laugh], [sigh], [yawn]
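Because these markups are plain bracketed words inside the text input, a typo such as [sight] is easy to miss. The sketch below shows one possible way to build and check marked-up text on the client. The helper names with_style and unknown_markups are hypothetical, and two details are assumptions not stated in this section: that markups are written in lowercase exactly as listed, and that a single space separates a leading markup from the text.

```python
import re

# Markup names as listed in this section.
EMOTIONS = {"happy", "sad", "angry", "surprised", "fearful", "disgusted"}
DELIVERY_STYLES = {"laughing", "whispering"}
NON_VERBAL = {"breathe", "clear_throat", "cough", "laugh", "sigh", "yawn"}
KNOWN_MARKUPS = EMOTIONS | DELIVERY_STYLES | NON_VERBAL

# Matches any bracketed word, for example [sigh] or [clear_throat].
MARKUP = re.compile(r"\[([A-Za-z_]+)\]")


def with_style(text: str, style: str) -> str:
    """Prefix text with an emotion or delivery style markup.

    The markup goes at the beginning because these markups work best
    there and apply to the text that follows.
    """
    if style not in EMOTIONS | DELIVERY_STYLES:
        raise ValueError(f"not an emotion or delivery style markup: {style!r}")
    return f"[{style}] {text}"


def unknown_markups(text: str) -> list[str]:
    """Return bracketed names in text that are not listed in this section."""
    return [name for name in MARKUP.findall(text) if name not in KNOWN_MARKUPS]


if __name__ == "__main__":
    line = with_style("I can't believe it worked. [sigh] Finally.", "happy")
    print(line)
    print(unknown_markups("[sad] Oh no. [giggle] Never mind."))
```

Non-verbal markups are left inline rather than moved to the front, since their position determines where the sound occurs.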