Developer Quickstart - Inworld AI Documentation

The TTS Playground is the easiest way to experiment with Inworld’s Text-to-Speech models: try out different voices, adjust parameters, and preview instant voice clones. Once you’re ready to go beyond testing and build a real-time application, the API gives you full access to advanced features and integration options. This quickstart focuses on the Text-to-Speech API and guides you through your first request to generate high-quality, ultra-realistic speech from text.

Make your first TTS API request

Create an API key

Create an Inworld account. In Inworld Portal, generate a Runtime API key by clicking the Get API Key shortcut in the Overview tab or by going to Settings > API Keys. Copy the Base64 credentials.

Set your API key as an environment variable.

export INWORLD_API_KEY='your-base64-api-key-here'
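Before sending a request, it can help to confirm that the variable is visible to your program. The following is a minimal sketch, not part of the Inworld API: the helper name looks_like_base64_key is hypothetical, and the check only confirms that the value decodes as Base64. It cannot tell whether the key is actually valid.

```python
import base64
import binascii
import os


def looks_like_base64_key(value):
    """Return True if value is a non-empty string that decodes as Base64.

    Hypothetical helper for illustration only. It does not contact
    Inworld, so it cannot confirm that the key is accepted by the API.
    """
    if not value:
        return False
    try:
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


if __name__ == "__main__":
    key = os.getenv("INWORLD_API_KEY")
    if looks_like_base64_key(key):
        print("INWORLD_API_KEY is set and decodes as Base64.")
    else:
        print("INWORLD_API_KEY is missing or is not valid Base64.")
```

If the placeholder text from the export command was left in place, this check reports it as invalid, because hyphens are not part of the standard Base64 alphabet.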

Prepare your first request

For Python or JavaScript, create a new file called inworld_quickstart.py or inworld_quickstart.js. Copy the corresponding code into the file. For a curl request, copy the request.

import requests
import base64
import os

url = "https://api.inworld.ai/tts/v1/voice"

headers = {
    "Authorization": f"Basic {os.getenv('INWORLD_API_KEY')}",
    "Content-Type": "application/json"
}

payload = {
    "text": "What a wonderful day to be a text-to-speech model!",
    "voiceId": "Ashley",
    "modelId": "inworld-tts-1.5-max"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
result = response.json()
audio_content = base64.b64decode(result['audioContent'])

with open("output.mp3", "wb") as f:
    f.write(audio_content)

For Python, you may also need to install requests if it is not already installed.

pip install requests

Run the code

Run the code for Python or JavaScript, or enter the curl command into your terminal.

python inworld_quickstart.py

You should see a saved file called output.mp3. You can play this file with any audio player.
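If no file appears, the response may not have the shape the script expects. As a rough sketch, assuming the response layout used in the request code above (a top-level audioContent field holding Base64-encoded audio), the decoding step can be isolated in a small function. The name decode_audio_content is hypothetical and chosen only for this example.

```python
import base64


def decode_audio_content(result):
    """Return the raw audio bytes from a parsed JSON response.

    Assumes the response shape used in this quickstart: a top-level
    'audioContent' field holding Base64-encoded audio.
    """
    encoded = result.get("audioContent")
    if not encoded:
        # Fail with a clear message instead of writing an empty file.
        raise KeyError("response has no 'audioContent' field")
    return base64.b64decode(encoded)
```

In the quickstart script, this would stand in for the base64.b64decode(result['audioContent']) line, so a malformed response fails with a clear message before anything is written to disk.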

Stream your audio output

Now that you’ve made your first TTS API request, you can try streaming responses as well. Assuming you’ve already followed the instructions above to set up your API key:

Prepare your streaming request

First, create a new file called inworld_stream_quickstart.py for Python or inworld_stream_quickstart.js for JavaScript. Next, set your INWORLD_API_KEY as an environment variable. Finally, copy the following code into the file. For this streaming example, we’ll use Linear PCM format (instead of MP3), which we specify in the audio_config.

import requests
import base64
import os
import json
import wave
import io
import time

url = "https://api.inworld.ai/tts/v1/voice:stream"

headers = {
    "Authorization": f"Basic {os.getenv('INWORLD_API_KEY')}",
    "Content-Type": "application/json"
}

payload = {
    "text": "What a wonderful day to be a text-to-speech model! I'm super excited to show you how streaming works! It makes it so much easier to generate audio for realtime applications.",
    "voiceId": "Ashley",
    "modelId": "inworld-tts-1.5-max",
    "audio_config": {
        "audio_encoding": "LINEAR16",
        "sample_rate_hertz": 48000,
    },
}

start_time = time.time()
response = requests.post(url, json=payload, headers=headers, stream=True)
response.raise_for_status()

raw_audio_data = io.BytesIO()
first_chunk_time = None

for line in response.iter_lines():
    if not line:  # skip blank keep-alive lines
        continue
    if first_chunk_time is None:
        first_chunk_time = time.time()

    chunk = json.loads(line)
    audio_chunk = base64.b64decode(chunk["result"]["audioContent"])
    if len(audio_chunk) > 44:  # skip the 44-byte WAV header on each chunk
        audio_data = audio_chunk[44:]
        raw_audio_data.write(audio_data)
        print(f"Appended {len(audio_data)} bytes to output_stream.wav")

end_time = time.time()

with wave.open("output_stream.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(payload["audio_config"]["sample_rate_hertz"])
    wf.writeframes(raw_audio_data.getvalue())

# Calculate and display latency statistics
time_to_first_chunk = first_chunk_time - start_time if first_chunk_time else None
overall_latency = end_time - start_time

print("Audio file completed!")
print(f"Time to first chunk: {time_to_first_chunk:.3f}s" if time_to_first_chunk else "No chunks received")
print(f"Overall request latency: {overall_latency:.3f}s")
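The chunk-handling loop above can also be written as a standalone function, which makes it possible to test the parsing logic without calling the API. This is a sketch under the same assumptions as the quickstart code: each non-empty line is a JSON object with a result.audioContent field, and each decoded chunk starts with a 44-byte WAV header. The function name pcm_from_stream_lines and the constant WAV_HEADER_BYTES are hypothetical.

```python
import base64
import json

WAV_HEADER_BYTES = 44  # header size assumed by the quickstart code


def pcm_from_stream_lines(lines):
    """Concatenate the PCM payloads from an iterable of JSON lines.

    Each non-empty line is expected to look like
    {"result": {"audioContent": "<Base64 WAV chunk>"}}.
    """
    pcm = bytearray()
    for line in lines:
        if not line:  # skip blank keep-alive lines
            continue
        chunk = json.loads(line)
        audio = base64.b64decode(chunk["result"]["audioContent"])
        if len(audio) > WAV_HEADER_BYTES:
            # Drop the per-chunk header and keep only the samples.
            pcm.extend(audio[WAV_HEADER_BYTES:])
    return bytes(pcm)
```

With a live request, the function could be called as pcm_from_stream_lines(response.iter_lines()) and the returned bytes passed to wf.writeframes().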

Run the code

Run the code for Python or JavaScript. The console prints a line for each audio chunk as it is received, and the audio file is written once the stream ends.

python inworld_stream_quickstart.py

You should see a saved file called output_stream.wav. You can play this file with any audio player.
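One quick sanity check on the result is to compare the amount of PCM data with the expected playback time. The sketch below assumes the settings used in the streaming example (48000 Hz, one channel, 2 bytes per sample); the function name pcm_duration_seconds is hypothetical.

```python
def pcm_duration_seconds(num_bytes, sample_rate_hz=48000, channels=1,
                         sample_width_bytes=2):
    """Return the playback duration of raw PCM data in seconds.

    Defaults mirror the streaming example: 48 kHz, mono, 16-bit samples.
    """
    bytes_per_second = sample_rate_hz * channels * sample_width_bytes
    return num_bytes / bytes_per_second
```

In the streaming script, pcm_duration_seconds(len(raw_audio_data.getvalue())) would give the length of the generated audio, which can be compared with the overall request latency printed at the end.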

Next Steps

Now that you’ve tried out Inworld’s TTS API, you can explore more of Inworld’s TTS capabilities.

TTS

Understand the capabilities of Inworld’s TTS models.

Voice Cloning

Create a personalized voice clone with just 5 seconds of audio.

Best Practices

Learn tips and tricks for synthesizing high-quality speech.