Audio Markups - Inworld AI Documentation

This feature is currently experimental and supports only English.

Audio markups let you control how the model speaks, not just what it says. Use them to set emotional expression and delivery style and to add non-verbal vocalizations.

Emotion and Delivery Style

Emotion and delivery style markups control the way a given text is spoken. These work best when used at the beginning of a text and apply to the text that follows.

Emotion: [happy], [sad], [angry], [surprised], [fearful], [disgusted]
Delivery Style: [laughing], [whispering]

For example:

[happy] I can't believe this is happening.

For best results, use only one emotion or delivery style markup at the beginning of your text. Using multiple emotion and delivery style markups or placing them mid-text may produce mixed results. Instead, we recommend splitting up the text into separate requests, with all markups placed at the start of the text. See our Best Practices guide for more details.
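The splitting recommendation can be sketched in Python. This is a minimal illustration, not part of the Inworld API: `split_by_style_markup` is a hypothetical helper name, and the sketch assumes markups are written in lowercase exactly as listed above. It cuts the text before every emotion or delivery style markup that appears mid-text, so each resulting segment begins with its own markup and can be sent as a separate request.

```python
import re

# Emotion and delivery style names copied from the lists above.
EMOTION_AND_STYLE = {
    "happy", "sad", "angry", "surprised", "fearful", "disgusted",
    "laughing", "whispering",
}

# Assumption: a markup is a lowercase name in square brackets.
TAG = re.compile(r"\[([a-z_]+)\]")


def split_by_style_markup(text):
    """Split text so that every mid-text emotion/style markup starts a new segment.

    Hypothetical helper for illustration only. Non-verbal markups such as
    [sigh] are left where they are.
    """
    segments = []
    start = 0
    for match in TAG.finditer(text):
        # Cut only at emotion/style markups that are not already at the segment start.
        if match.group(1) in EMOTION_AND_STYLE and match.start() > start:
            chunk = text[start:match.start()].strip()
            if chunk:
                segments.append(chunk)
            start = match.start()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

For instance, calling `split_by_style_markup("[happy] I can't believe this is happening. [sad] But it won't last.")` yields two segments, one beginning with `[happy]` and one beginning with `[sad]`. The sketch does not handle several markups stacked back to back; such input would still need manual cleanup.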

Non-verbal Vocalization

Non-verbal vocalization markups insert a non-verbal sound at the point where the markup appears in the text.

[breathe], [clear_throat], [cough], [laugh], [sigh], [yawn]

For example:

[clear_throat] Did you hear what I said? [sigh] You never listen to me!

A single piece of text can contain multiple non-verbal vocalizations; each one produces its vocal effect at its own position in the speech.
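Because the two markup families follow different placement guidance, it can help to check a text before sending it. The following is a rough sketch under stated assumptions rather than an official validator: `check_markups` is a hypothetical name, the markup names are copied from the lists on this page, and matching is assumed to be lowercase and exact. It lets non-verbal vocalizations appear anywhere, flags emotion or delivery style markups that are repeated or not at the start, and flags bracketed names that are not listed here.

```python
import re

# Markup names copied from the lists on this page.
EMOTION_AND_STYLE = {
    "happy", "sad", "angry", "surprised", "fearful", "disgusted",
    "laughing", "whispering",
}
NON_VERBAL = {"breathe", "clear_throat", "cough", "laugh", "sigh", "yawn"}

# Assumption: a markup is a lowercase name in square brackets.
TAG = re.compile(r"\[([a-z_]+)\]")


def check_markups(text):
    """Return a list of warnings; an empty list means nothing was flagged.

    Hypothetical helper for illustration only.
    """
    warnings = []
    # Index of the first non-whitespace character in the text.
    first_char = len(text) - len(text.lstrip())
    style_count = 0
    for match in TAG.finditer(text):
        name = match.group(1)
        if name in NON_VERBAL:
            continue  # non-verbal vocalizations may appear anywhere
        if name in EMOTION_AND_STYLE:
            style_count += 1
            if match.start() != first_char:
                warnings.append(f"[{name}] is not at the start of the text")
        else:
            warnings.append(f"[{name}] is not a documented markup")
    if style_count > 1:
        warnings.append("more than one emotion or delivery style markup")
    return warnings
```

The non-verbal example above passes with no warnings, while a text such as `[happy] Hi [sad] there` is flagged both for the mid-text `[sad]` and for using more than one emotion markup.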
