When an LLM generates text that is fed into TTS, the default output often sounds flat and unnatural. With inworld-tts-2, you can go beyond formatting the text well: instruct the LLM to embed steering tags directly in its output. The result is speech that is actively directed, with emotion, pacing, volume, and vocal style shaped by the LLM itself.

This page covers what is new for inworld-tts-2. The guidance in Prompting for TTS still applies as a general best practice, especially when no steering instructions are used.
Steering is available exclusively on inworld-tts-2 and does not apply to prior models.

Instructing the LLM to use steering

The Steering page documents all supported instruction tags across emotion, speed, volume, vocal style, tone, non-verbals, and free-form directions. To make your LLM use them, include a section in your system prompt that explains the tag format and lists the tags relevant to your use case.

Prompt snippet:
Your responses will be spoken aloud using inworld-tts-2, which supports
steering tags — natural language directions in square brackets placed before
the text they apply to.

Use steering tags to match your delivery to the content. The tags below are
suggestions; you may also describe the appropriate delivery in your own
words:
- Emotion: [say excitedly], [sound sad], [sound concerned], [sound terrified]
- Speed: [speak quickly], [extremely slowly]
- Volume: [quietly], [in a loud voice]
- Tone: [speak conversationally], [in an anxious manner]
- Non-verbals: [laugh], [sigh], [clear throat], [breathe]
- For complex moments, describe the full delivery in natural language:
  [warm and reassuring, speaking slowly and gently]

Place the tag at the start of the text it applies to. A single tag can apply
across multiple sentences; repeat or change tags only when the delivery should
change. Non-verbal tags can also be used inline where they occur. Do not
apply a tag that contradicts the content of the text.
Before (no steering):
I have great news. Your package has arrived.
After (with steering):
[say excitedly] I have great news. Your package has arrived!
For the full list of supported tags and examples, see the Steering page.
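
When reviewing LLM output during development, it can help to see which tag governs which stretch of text. The sketch below assumes only what this page states: tags are square-bracketed directions placed before the text they apply to. The names `split_steered_segments` and `TAG_PATTERN` are hypothetical helpers for illustration, not part of any Inworld SDK, and the parsing is a rough approximation rather than the model's own tag handling.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a steering tag is any non-nested [ ... ] span, e.g. [say excitedly].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_steered_segments(llm_output: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (tag, text) pairs. The tag is None for untagged leading text."""
    segments: List[Tuple[Optional[str], str]] = []
    current_tag: Optional[str] = None
    position = 0
    for match in TAG_PATTERN.finditer(llm_output):
        text = llm_output[position:match.start()].strip()
        # Keep the segment if it has text, or if a tag (e.g. [sigh]) stands alone.
        if text or current_tag is not None:
            segments.append((current_tag, text))
        current_tag = match.group(1).strip()
        position = match.end()
    tail = llm_output[position:].strip()
    if tail or current_tag is not None:
        segments.append((current_tag, tail))
    return segments


if __name__ == "__main__":
    sample = "[say excitedly] I have great news. Your package has arrived!"
    for tag, text in split_steered_segments(sample):
        print(f"{tag!r}: {text!r}")
```

Printing the pairs side by side makes it easier to spot a tag that covers more text than intended, or text that has no tag where one was expected.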

Example Prompt Templates

Below is a complete, copyable system prompt block that combines steering with the text formatting guidance from Prompting for TTS.
Use this template for chatbots, AI companions, virtual friends, and other informal conversational applications.
## Speech Output Rules

Your responses will be converted to speech using inworld-tts-2. Follow these
rules to produce natural, expressive, directed spoken output:

### Steering
- Open with an emotion tag when your response has a clear emotional quality:
  [say excitedly], [sound sad], [sound concerned], [sound terrified]
- Use [quietly] or [softly] for intimate or private moments
- For complex emotional moments, write a short natural language direction:
  [warm and gentle, speaking slowly] rather than just [sound calm]
- Insert non-verbal tags where organic: [laugh], [sigh], [breathe]
- Place tags at the start of the sentence they apply to
- Use at most one tag per sentence

### Emphasis
- Capitalize full words for stress: "I told you NOT to do that"
- Capitalize syllables for nuance: "AbsoLUTEly"
- Use sparingly for maximum effect

### Naturalness
- Include filler words (uh, um, well, like, you know) where a human would naturally pause
- Vary sentence length for natural rhythm
- Use contractions (don't, can't, I'm, we're) instead of formal forms

### Text Formatting
- Write numbers in spoken form: "twenty-three" not "23"
- Write dates in spoken form: "March fifteenth" not "3/15"
- Never use markdown formatting, bullet points, or structured text
- Never use emojis or special characters
- Write everything as natural spoken sentences
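
LLMs do not always follow the Text Formatting rules above, so a lightweight check before synthesis can catch obvious slips. The following is a minimal sketch under stated assumptions: `find_formatting_issues` is a hypothetical helper, the markdown pattern covers only a few common markers, and the Unicode category "So" is used as a rough stand-in for emojis and similar symbols. Steering tags are stripped first so that square brackets are not reported.

```python
import re
import unicodedata
from typing import List

# Assumption: steering tags are non-nested [ ... ] spans and are allowed in output.
STEERING_TAG = re.compile(r"\[[^\[\]]+\]")
# Rough markdown heuristic: headings, list markers at line start, bold, backticks.
MARKDOWN_MARKERS = re.compile(r"(^\s*(#{1,6}\s|[-*]\s|\d+\.\s))|\*\*|`", re.MULTILINE)


def find_formatting_issues(llm_output: str) -> List[str]:
    """Return labels for rule violations found in the text that will be spoken."""
    issues: List[str] = []
    spoken_text = STEERING_TAG.sub("", llm_output)
    if MARKDOWN_MARKERS.search(spoken_text):
        issues.append("markdown")
    if re.search(r"\d", spoken_text):
        issues.append("digits")  # numbers and dates should be written in spoken form
    if any(unicodedata.category(ch) == "So" for ch in spoken_text):
        issues.append("emoji_or_symbol")
    return issues


if __name__ == "__main__":
    print(find_formatting_issues("[say excitedly] You have twenty-three new messages!"))
    print(find_formatting_issues("- You have 23 messages \U0001F389"))
```

A check like this is best treated as a testing aid. When it flags output, tightening the system prompt is usually preferable to rewriting the text automatically.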

Tips for Iterating

  • Test with the TTS Playground: Use the TTS Playground to hear how your LLM output sounds when synthesized. Paste in sample outputs with steering tags and iterate until the speech quality meets your needs.
  • Start with metadata tags: Begin with simple tags like [say excitedly] or [quietly] before introducing free-form directions. They are easier for the LLM to apply consistently.
  • Check for tag/content mismatches: The LLM should not apply a steering tag that contradicts the content. A [sound sad] tag on celebratory text will produce degraded output. Review LLM outputs for mismatches during testing.
  • Keep steering instructions concise: Instruct the LLM to write short, specific tags. Long or compound directions can dilute the effect.
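
The advice to keep tags short can be checked mechanically while iterating. The sketch below flags tags that exceed a word limit. The function `find_long_tags` and the default of 8 words are assumptions chosen for illustration (the free-form example on this page, `[warm and reassuring, speaking slowly and gently]`, has 7 words); this page does not specify a limit.

```python
import re
from typing import List

# Assumption: steering tags are non-nested [ ... ] spans.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_long_tags(llm_output: str, max_words: int = 8) -> List[str]:
    """Return the steering tags whose word count exceeds max_words."""
    return [
        tag.strip()
        for tag in TAG_PATTERN.findall(llm_output)
        if len(tag.split()) > max_words
    ]


if __name__ == "__main__":
    sample = (
        "[quietly] Hi there. "
        "[speaking very slowly and very gently while also sounding warm, calm, "
        "and reassuring] It is going to be okay."
    )
    for tag in find_long_tags(sample):
        print("Consider shortening:", tag)
```

Tune `max_words` by listening to results in the TTS Playground rather than treating any particular number as a rule.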

Next Steps

  • Steering: Full reference for all steering tags, free-form instructions, non-verbals, and best practices.
  • Pause Controls: Add precise pauses to your speech with SSML break tags.
  • Prompting for TTS: Prompt engineering techniques that apply to all Inworld TTS models.