Inworld’s Voice Design lets you create a completely new voice from a text description. It’s ideal when you need a unique voice, can’t find the right one in our Voice Library, and don’t have existing audio recordings for voice cloning. Voice Design uses a model to generate a voice from two inputs:
  1. Voice description - A text description of the voice you have in mind (e.g., age, gender, accent, tone, pitch).
  2. Script - The text the voice will speak. This shapes the generated voice, so using a script that matches the intended voice produces the best results.
Each time you generate, we’ll return three voice previews so you can listen, compare, and select the ones that work best for your project.
Voice Design is currently in research preview. Please share any feedback with us via the feedback form in Portal or in Discord.

Design a Voice in Portal

1

Go to Inworld Portal

In Portal, select TTS Playground from the left-hand panel. Click Create Voice and select Design.
2

Write a voice description

Describe the voice you want to create. The description must be in English and between 30 and 250 characters long. Keep your description concise but specific so the model can produce what you have in mind as accurately as possible. A good voice description should include:
  • Gender and age range (e.g., “a mid-20s to early 30s female voice”, “a middle-aged male voice”)
  • Accent (e.g., “British accent”, “Southern American accent”)
  • Pitch and pace (e.g., “low-pitched”, “fast-paced”, “steady pace”)
  • Tone and emotion (e.g., “warm and friendly”, “authoritative and composed”)
  • Timbre (e.g., “rich and smooth”, “slightly raspy”, “clear and bright”)
Example: “A middle-aged male voice with a clear British accent speaking at a steady pace and with a neutral tone.”
Use the Improve Description button to automatically enhance your description based on best practices. This adds missing attributes like pitch, pace, tone, and timbre to help the model produce a more accurate voice.
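If you draft several descriptions outside Portal (for example, to compare variants), a small helper can assemble one from the attributes listed above and check the documented 30 to 250 character limit before you paste it in. This is a minimal sketch: `compose_description` and `description_problems` are hypothetical names, not part of any Inworld SDK, and the English-only requirement is not checked.

```python
# Hypothetical helpers (not an Inworld API): build a Voice Design
# description from attribute phrases and check the documented length limit.

MIN_DESCRIPTION_CHARS = 30   # documented minimum length
MAX_DESCRIPTION_CHARS = 250  # documented maximum length


def compose_description(*phrases: str) -> str:
    """Join attribute phrases (gender/age, accent, pitch, pace, tone, timbre)
    into one sentence, capitalizing the first letter and adding a period."""
    text = ", ".join(p.strip() for p in phrases if p.strip())
    return text[0].upper() + text[1:] + "." if text else ""


def description_problems(description: str) -> list[str]:
    """Return length problems; an empty list means the length is acceptable.
    Only the character limit is checked, not the language."""
    problems = []
    length = len(description.strip())
    if length < MIN_DESCRIPTION_CHARS:
        problems.append(f"too short: {length} < {MIN_DESCRIPTION_CHARS} characters")
    if length > MAX_DESCRIPTION_CHARS:
        problems.append(f"too long: {length} > {MAX_DESCRIPTION_CHARS} characters")
    return problems


if __name__ == "__main__":
    text = compose_description(
        "a middle-aged male voice",
        "clear British accent",
        "low-pitched",
        "steady pace",
        "neutral and composed tone",
        "rich and smooth timbre",
    )
    print(text)
    print(description_problems(text) or "OK")
```

Passing each attribute as a separate phrase makes it easy to see which of the recommended attributes a draft is still missing.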
3

Select a language

Choose the language for your generated voice. If you’re using the auto-generated script, the script will be written in your selected language.
4

Choose a voice script

Select how you want to provide the script that the voice will speak:
  • Auto-generate script - The system automatically generates a script that matches your voice description in the selected language. This is the easiest option and works well for most use cases.
  • Write my own - Write a custom script for the voice to speak. For best results, the script should produce 5 to 15 seconds of audio, which is roughly 50 to 200 characters in English.
The script shapes the voice that gets generated. Use a script that matches your imagined voice, and the model will tailor the voice to suit the content it’s speaking.
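When writing your own script, a quick length check against the documented guideline (roughly 50 to 200 English characters for 5 to 15 seconds of audio) can save a generation round. The sketch below assumes that character count is a good enough proxy for duration; `script_length_hint` is a hypothetical helper, and the range is a recommendation rather than an enforced limit.

```python
# Hypothetical helper (not an Inworld API): compares a custom English script
# with the recommended 50-200 character range (about 5-15 seconds of audio).

MIN_SCRIPT_CHARS = 50
MAX_SCRIPT_CHARS = 200


def script_length_hint(script: str) -> str:
    """Return 'ok', 'too short', or 'too long' for an English script."""
    length = len(script.strip())
    if length < MIN_SCRIPT_CHARS:
        return "too short"
    if length > MAX_SCRIPT_CHARS:
        return "too long"
    return "ok"


if __name__ == "__main__":
    # A script in the style of the intended voice (here, a calm narrator).
    script = ("Good evening. Tonight we look back at a remarkable season, "
              "one that surprised even the most seasoned observers.")
    print(len(script.strip()), script_length_hint(script))
```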
5

Generate and preview voices

Click Generate voice to create three voice previews. Listen to each preview by clicking its play button, then select the voice(s) you want to keep. Each generation produces slightly different results. If the first set of voices doesn’t sound right, click Generate voice again, or adjust your description and voice script to better match what you have in mind before regenerating.
Check out our Voice Cloning Best Practices guide for helpful tips and tricks to improve your designed voices.
6

Save your voice

After selecting one or more voices, give each voice a name, add optional tags, and save them to your voice library. Your designed voices will appear alongside your other voices in the TTS Playground.
7

Use your voice via API

To use your designed voice via API, copy the voice ID from the TTS Playground and pass it as the voiceId when making an API call. See our Quickstart to learn how to make your first API call.
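A request that uses a designed voice might look like the Python sketch below, which relies only on the standard library. The endpoint URL, the `modelId` value, the Basic authorization scheme, the `audioContent` response field, and the `INWORLD_API_KEY` environment variable name are all assumptions for illustration; confirm each against the Quickstart before relying on them. The voice ID is a placeholder for the one you copied from the TTS Playground.

```python
# Sketch of a TTS call using a designed voice.
# ASSUMPTIONS (verify against the Quickstart): the endpoint URL, the
# "modelId" value, Basic auth with the API key from Portal, and a JSON
# response whose "audioContent" field holds base64-encoded audio.
import base64
import json
import os
import urllib.request

TTS_URL = "https://api.inworld.ai/tts/v1/voice"  # assumed endpoint


def build_tts_request(voice_id: str, text: str, api_key: str,
                      model_id: str = "inworld-tts-1") -> urllib.request.Request:
    """Build (but do not send) a POST request that speaks `text` with `voice_id`."""
    payload = {"text": text, "voiceId": voice_id, "modelId": model_id}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(voice_id: str, text: str, out_path: str) -> None:
    """Send the request and write the decoded audio bytes to out_path."""
    request = build_tts_request(voice_id, text, os.environ["INWORLD_API_KEY"])
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(body["audioContent"]))  # assumed field name
```

For example, `synthesize("YOUR_DESIGNED_VOICE_ID", "Hello from my designed voice!", "output.audio")` would save the result; choose a file extension that matches the audio encoding the API returns. Building the request separately from sending it makes the payload easy to inspect before any network call is made.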

Next Steps