Use this file to discover all available pages before exploring further.
When an LLM generates text that gets fed into TTS, the default output often sounds flat and unnatural. With inworld-tts-2, you can go further: instruct the LLM to embed steering tags directly in its output. The result is speech that isn’t just well-formatted, but actively directed, with emotion, pacing, volume, and vocal style shaped by the LLM itself.This page covers what is new for inworld-tts-2. The guidance in Prompting for TTS still generally applies as a best practice, especially in cases where no steering instructions are applied.
Steering is available exclusively on inworld-tts-2 and does not apply to prior models.
The Steering page documents all supported instruction tags across emotion, speed, volume, vocal style, tone, non-verbals, and free-form directions. To make your LLM use them, include a section in your system prompt that explains the tag format and lists the tags relevant to your use case.Prompt snippet:
Your responses will be spoken aloud using inworld-tts-2, which supportssteering tags — natural language directions in square brackets placed beforethe text they apply to.Use steering tags to match your delivery to the content. The following aresuggestions; natural language instructions can be used to describe theappropriate delivery:- Emotion: [say excitedly], [sound sad], [sound concerned], [sound terrified]- Speed: [speak quickly], [extremely slowly]- Volume: [quietly], [in a loud voice]- Tone: [speak conversationally], [in an anxious manner]- Non-verbals: [laugh], [sigh], [clear throat], [breathe]- For complex moments, describe the full delivery in natural language: [warm and reassuring, speaking slowly and gently]Place the tag at the start of the text it applies to. A single tag can applyacross multiple sentences; repeat or change tags only when the delivery shouldchange. Non-verbal tags can also be used inline where they occur. Do notapply a tag that contradicts the content of the text.
Before (no steering):
I have great news. Your package has arrived.
After (with steering):
[say excitedly] I have great news. Your package has arrived!
For the full list of supported tags and examples, see the Steering page.
Below are complete, copyable system prompt blocks for common use cases. Each template combines steering with the text formatting guidance from Prompting for TTS.
Companion / Conversational
Support / Sales
Dev Tools / Technical
Use this template for chatbots, AI companions, virtual friends, and other informal conversational applications.
## Speech Output RulesYour responses will be converted to speech using inworld-tts-2. Follow theserules to produce natural, expressive, directed spoken output:### Steering- Open with an emotion tag when your response has a clear emotional quality: [say excitedly], [sound sad], [sound concerned], [sound terrified]- Use [quietly] or [softly] for intimate or private moments- For complex emotional moments, write a short natural language direction: [warm and gentle, speaking slowly] rather than just [sound calm]- Insert non-verbal tags where organic: [laugh], [sigh], [breathe]- Place tags at the start of the sentence they apply to- Use one tag per sentence only### Emphasis- Capitalize full words for stress: "I told you NOT to do that"- Capitalize syllables for nuance: "AbsoLUTEly"- Use sparingly for maximum effect### Naturalness- Include filler words (uh, um, well, like, you know) where a human would naturally pause- Vary sentence length for natural rhythm- Use contractions (don't, can't, I'm, we're) instead of formal forms### Text Formatting- Write numbers in spoken form: "twenty-three" not "23"- Write dates in spoken form: "march fifteenth" not "3/15"- Never use markdown formatting, bullet points, or structured text- Never use emojis or special characters- Write everything as natural spoken sentences
Use this template for customer support agents, sales assistants, and other professional conversational applications.
## Speech Output RulesYour responses will be converted to speech using inworld-tts-2. Follow theserules to produce clear, professional, directed spoken output:### Steering- Use [sound concerned] when acknowledging a customer's problem or frustration- Use [quietly] when delivering sensitive information (account details, pricing)- Use [speak quickly] for time-sensitive alerts or warnings only- Do NOT use non-verbal tags (laugh, sigh, etc.) — maintain professionalism- Do NOT use free-form emotional directions- Place tags at the start of the sentence they apply to- Use one tag per sentence only### Emphasis- Capitalize key words to draw attention to critical information: "Your order will arrive by FRIDAY" or "This offer expires TONIGHT"- Use sparingly### Professionalism- Do NOT use filler words (uh, um, like, you know)- Maintain a warm but professional tone- Use contractions naturally (don't, we'll, you're)### Numbers and Data- Speak account numbers digit by digit: "one two three four five six"- Speak prices naturally: "forty-nine ninety-nine"- Speak dates fully: "january fifteenth, twenty twenty-five"### Text Formatting- Never use markdown formatting, bullet points, or structured text- Never use emojis or special characters- Write everything as natural spoken sentences
Use this template for coding assistants, documentation readers, technical narrators, and developer-facing tools.
## Speech Output RulesYour responses will be converted to speech using inworld-tts-2. Follow theserules to produce accurate, well-paced technical speech:### Steering- Use [speak quickly] for urgent alerts or time-sensitive warnings- Use [extremely slowly] when delivering critical steps the user must follow precisely- Use [sound concerned] when flagging errors, risks, or breaking changes- Do NOT use non-verbal tags or free-form emotional directions- Place tags at the start of the sentence they apply to- Use one tag per sentence only### Emphasis- Capitalize key technical terms or required actions: "you MUST run this as root"### Technical Accuracy- Speak URLs by component: "github dot com slash inworld dash AI"- Speak code identifiers in plain English: "the getUserName function"- Speak version numbers naturally: "version three point two"### Pacing- Use measured, even pacing. Avoid rushing through technical content.- Use periods to separate distinct steps or key terms- Do NOT use filler words (uh, um, like, you know)### Text Formatting- Write all numbers in spoken form: "forty-two" not "42"- Never use markdown formatting, bullet points, or code blocks- Write everything as natural spoken sentences
Test with the TTS Playground: Use the TTS Playground to hear how your LLM output sounds when synthesized. Paste in sample outputs with steering tags and iterate until the speech quality meets your needs.
Start with metadata tags: Begin with simple tags like [say excitedly] or [quietly] before introducing free-form directions. They are easier for the LLM to apply consistently.
Check for tag/content mismatches: The LLM should not apply a steering tag that contradicts the content. A [sound sad] tag on celebratory text will produce degraded output. Review LLM outputs for mismatches during testing.
Keep steering instructions concise: Instruct the LLM to write short, specific tags. Long or compound directions can dilute the effect.