Voice Cloning - Inworld AI Documentation

Inworld’s text-to-speech models offer best-in-class voice cloning capabilities, enabling developers to create distinct, personalized voices for their experiences. There are three ways to clone a voice:

Instant Voice Cloning - Clone a voice in minutes, with only 5-15 seconds of audio. Also known as zero-shot cloning. Available to all users through Portal.
Voice Cloning via API - Instant voice cloning via API. Useful for workflow automation or enabling your users to clone their own voices.
Professional Voice Cloning - For the highest quality, fine-tune a model with 30+ minutes of audio.

Professional voice cloning is currently not publicly available. To get access, please reach out to our sales team.

Don’t have audio samples? Use Voice Design to create a voice from a text description instead.

Instant Voice Cloning

Go to Inworld Portal

In Portal, select TTS Playground from the left-hand side panel. In the TTS Playground, click Create Voice and select Clone.

Upload or record audio samples

Name your voice and select the language, which should match the audio samples. Voices will work best when synthesizing text that matches the language of the original audio samples. You can either upload or record audio:

Upload: Drag and drop or browse to upload up to 3 audio files. Accepted formats: wav, mp3, webm. Maximum total size is 16MB. Audio samples longer than 15 seconds will be automatically trimmed to 15 seconds.
Record: Click “Record audio” and record your audio. You can use the suggested scripts to help guide your recording, or use your own script. For best results, record in a quiet place to minimize background noise, avoid mic noise, and speak with a variety of emotions to capture the full range of the voice.

Enable “Remove background noise” if you wish to remove background noise from your audio. Confirm you have the rights to clone the voice, then click “Continue”.
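If you prepare samples in a script before uploading them, the upload limits listed above (up to 3 files, wav/mp3/webm, 16MB total) can be checked locally first. The following is a minimal sketch, not part of Inworld's tooling: the function name `check_samples` is hypothetical, and it assumes "16MB" means 16 × 1024 × 1024 bytes. It does not check audio duration, since samples longer than 15 seconds are trimmed automatically.

```python
from pathlib import Path

# Limits taken from the upload rules on this page.
MAX_FILES = 3
MAX_TOTAL_BYTES = 16 * 1024 * 1024  # assumption: "16MB" is read as 16 MiB
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".webm"}


def check_samples(paths):
    """Return a list of problems; an empty list means the files look uploadable."""
    problems = []
    paths = [Path(p) for p in paths]
    if not paths:
        problems.append("no audio files given")
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files given, at most {MAX_FILES} allowed")
    total = 0
    for p in paths:
        # The check is by file extension only; it does not inspect the audio data.
        if p.suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{p.name}: unsupported format")
        if p.is_file():
            total += p.stat().st_size
        else:
            problems.append(f"{p.name}: file not found")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems
```

Calling `check_samples(["take1.wav", "take2.mp3"])` returns an empty list when both files exist and fit within the limits, and a list of human-readable problems otherwise.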

Check out our Voice Cloning Best Practices for helpful tips and tricks to improve the quality of your voice clones.

Test your cloned voice

Once voice cloning completes, you’ll see the “Try your cloned voice” interface. Enter text in the input field and press play to hear your cloned voice. You can test different phrases to ensure the voice sounds as expected. If the voice doesn’t sound quite right, you can delete the voice and start over, create another voice, or test it in the TTS Playground for more advanced testing options.

Use your cloned voice via API

To use the cloned voice via API, copy the voice ID for your cloned voice in TTS Playground. Use that value for the voiceId when making an API call. See our Quickstart to learn how to make your first API call.
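As a rough illustration of where the copied voice ID goes, the sketch below builds (but does not send) a JSON POST request using only the Python standard library. Only the `voiceId` field name comes from this page. The helper name `build_tts_request`, the `text` field name, and the use of an `Authorization` header are assumptions, and the endpoint and credentials are left as arguments so that the real values can be taken from the Quickstart.

```python
import json
import urllib.request


def build_tts_request(endpoint, auth_header, voice_id, text):
    """Build (but do not send) a POST request that synthesizes `text` with a cloned voice.

    `endpoint` and `auth_header` are placeholders: take the real values from the Quickstart.
    Only the `voiceId` field name comes from this page; `text` is an assumed field name.
    """
    body = json.dumps({"voiceId": voice_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
        method="POST",
    )
```

Once the endpoint and credentials are filled in, the request can be sent with `urllib.request.urlopen(req)`. The response format is not described on this page, so consult the API reference before parsing it.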

Instant voice cloning may not perform well for less common voices, such as children’s voices or unique accents. For those use cases, we recommend professional voice cloning.

Voice Cloning API Reference And Examples

If you want to automate voice cloning (for example, to support creator onboarding at scale), use the Voice Cloning API.

API reference: Clone a voice
Python example: example_voice_clone.py
JavaScript example: example_voice_clone.js

Voice cloning has lower rate limits than regular speech synthesis. For details, see Rate limits.
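Because of the lower rate limits, automated cloning workflows usually benefit from retrying rejected calls after a pause. The sketch below shows generic exponential backoff with jitter. It is not taken from Inworld's examples: `RateLimited` is a hypothetical exception that your own client code would raise when a request is rejected for exceeding the limit, and the retry counts and delays are illustrative defaults rather than recommended values.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception: raise it when a call is rejected for exceeding the rate limit."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying with exponential backoff and jitter while it raises RateLimited."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 25% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
```

Wrap the cloning call in a zero-argument function, for example `with_backoff(lambda: clone_voice(sample))`, where `clone_voice` stands in for your own client function. The `sleep` parameter exists so the waiting behavior can be replaced in tests.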

Next Steps

Looking for more tips and tricks? Check out the resources below to get started!

Voice Cloning Best Practices

Learn best practices for producing high-quality voice clones.

Speech Generation Best Practices

Learn best practices for synthesizing high-quality speech.

API Examples

Explore Python and JavaScript code examples for TTS integration.
