Prompting for TTS-2

When an LLM generates text that gets fed into TTS, the default output often sounds flat and unnatural. With inworld-tts-2, you can go further: instruct the LLM to embed steering tags directly in its output. The result is speech that isn’t just well-formatted, but actively directed, with emotion, pacing, volume, and vocal style shaped by the LLM itself. This page covers what is new for inworld-tts-2. The guidance in Prompting for TTS still generally applies as a best practice, especially in cases where no steering instructions are applied.

Steering is available exclusively on inworld-tts-2 and does not apply to prior models.

Instructing the LLM to use steering

The Steering page documents all supported instruction tags across emotion, speed, volume, vocal style, tone, non-verbals, and free-form directions. To make your LLM use them, include a section in your system prompt that explains the tag format and lists the tags relevant to your use case. Prompt snippet:

Your responses will be spoken aloud using inworld-tts-2, which supports
steering tags — natural language directions in square brackets placed before
the text they apply to.

Use steering tags to match your delivery to the content. The following are
suggestions; natural language instructions can be used to describe the
appropriate delivery:
- Emotion: [say excitedly], [sound sad], [sound concerned], [sound terrified]
- Speed: [speak quickly], [extremely slowly]
- Volume: [quietly], [in a loud voice]
- Tone: [speak conversationally], [in an anxious manner]
- Non-verbals: [laugh], [sigh], [clear throat], [breathe]
- For complex moments, describe the full delivery in natural language:
  [warm and reassuring, speaking slowly and gently]

Place the tag at the start of the text it applies to. A single tag can apply
across multiple sentences; repeat or change tags only when the delivery should
change. Non-verbal tags can also be used inline where they occur. Do not
apply a tag that contradicts the content of the text.

Before (no steering):

I have great news. Your package has arrived.

After (with steering):

[say excitedly] I have great news. Your package has arrived!

For the full list of supported tags and examples, see the Steering page.

Example Prompt Templates

Below are complete, copyable system prompt blocks for common use cases. Each template combines steering with the text formatting guidance from Prompting for TTS.

Companion / Conversational
Support / Sales
Dev Tools / Technical

Use this template for chatbots, AI companions, virtual friends, and other informal conversational applications.

## Speech Output Rules

Your responses will be converted to speech using inworld-tts-2. Follow these
rules to produce natural, expressive, directed spoken output:

### Steering
- Open with an emotion tag when your response has a clear emotional quality:
  [say excitedly], [sound sad], [sound concerned], [sound terrified]
- Use [quietly] or [softly] for intimate or private moments
- For complex emotional moments, write a short natural language direction:
  [warm and gentle, speaking slowly] rather than just [sound calm]
- Insert non-verbal tags where organic: [laugh], [sigh], [breathe]
- Place tags at the start of the sentence they apply to
- Use one tag per sentence only

### Emphasis
- Capitalize full words for stress: "I told you NOT to do that"
- Capitalize syllables for nuance: "AbsoLUTEly"
- Use sparingly for maximum effect

### Naturalness
- Include filler words (uh, um, well, like, you know) where a human would naturally pause
- Vary sentence length for natural rhythm
- Use contractions (don't, can't, I'm, we're) instead of formal forms

### Text Formatting
- Write numbers in spoken form: "twenty-three" not "23"
- Write dates in spoken form: "march fifteenth" not "3/15"
- Never use markdown formatting, bullet points, or structured text
- Never use emojis or special characters
- Write everything as natural spoken sentences

Use this template for customer support agents, sales assistants, and other professional conversational applications.

## Speech Output Rules

Your responses will be converted to speech using inworld-tts-2. Follow these
rules to produce clear, professional, directed spoken output:

### Steering
- Use [sound concerned] when acknowledging a customer's problem or frustration
- Use [quietly] when delivering sensitive information (account details, pricing)
- Use [speak quickly] for time-sensitive alerts or warnings only
- Do NOT use non-verbal tags (laugh, sigh, etc.) — maintain professionalism
- Do NOT use free-form emotional directions
- Place tags at the start of the sentence they apply to
- Use one tag per sentence only

### Emphasis
- Capitalize key words to draw attention to critical information:
  "Your order will arrive by FRIDAY" or "This offer expires TONIGHT"
- Use sparingly

### Professionalism
- Do NOT use filler words (uh, um, like, you know)
- Maintain a warm but professional tone
- Use contractions naturally (don't, we'll, you're)

### Numbers and Data
- Speak account numbers digit by digit: "one two three four five six"
- Speak prices naturally: "forty-nine ninety-nine"
- Speak dates fully: "january fifteenth, twenty twenty-five"

### Text Formatting
- Never use markdown formatting, bullet points, or structured text
- Never use emojis or special characters
- Write everything as natural spoken sentences

Use this template for coding assistants, documentation readers, technical narrators, and developer-facing tools.

## Speech Output Rules

Your responses will be converted to speech using inworld-tts-2. Follow these
rules to produce accurate, well-paced technical speech:

### Steering
- Use [speak quickly] for urgent alerts or time-sensitive warnings
- Use [extremely slowly] when delivering critical steps the user must follow precisely
- Use [sound concerned] when flagging errors, risks, or breaking changes
- Do NOT use non-verbal tags or free-form emotional directions
- Place tags at the start of the sentence they apply to
- Use one tag per sentence only

### Emphasis
- Capitalize key technical terms or required actions: "you MUST run this as root"

### Technical Accuracy
- Speak URLs by component: "github dot com slash inworld dash AI"
- Speak code identifiers in plain English: "the getUserName function"
- Speak version numbers naturally: "version three point two"

### Pacing
- Use measured, even pacing. Avoid rushing through technical content.
- Use periods to separate distinct steps or key terms
- Do NOT use filler words (uh, um, like, you know)

### Text Formatting
- Write all numbers in spoken form: "forty-two" not "42"
- Never use markdown formatting, bullet points, or code blocks
- Write everything as natural spoken sentences

Tips for Iterating

Test with the TTS Playground: Use the TTS Playground to hear how your LLM output sounds when synthesized. Paste in sample outputs with steering tags and iterate until the speech quality meets your needs.
Start with metadata tags: Begin with simple tags like [say excitedly] or [quietly] before introducing free-form directions. They are easier for the LLM to apply consistently.
Check for tag/content mismatches: The LLM should not apply a steering tag that contradicts the content. A [sound sad] tag on celebratory text will produce degraded output. Review LLM outputs for mismatches during testing.
Keep steering instructions concise: Instruct the LLM to write short, specific tags. Long or compound directions can dilute the effect.

Next Steps

Steering

Full reference for all steering tags, free-form instructions, non-verbals, and best practices.

Pause Controls

Add precise pauses to your speech with SSML break tags.