Inworld TTS On-Premises lets organizations run high-quality text-to-speech models locally — without sending text or audio data to the cloud. It’s built for enterprises that require strict data control, low latency, and compliance with internal or regulatory standards. Inworld TTS On-Premises is available for both the Inworld TTS-1.5 Mini and Inworld TTS-1.5 Max models.
To get started with TTS On-Premises, contact sales@inworld.ai for pricing and access to the container registry.

Why TTS On-Premises

Data stays in your environment

No outbound data transfer. Full ownership of text and audio.

Low-latency, real-time speech

Optimized for production workloads and interactive applications.

Designed for regulated industries

Suitable for air-gapped, private, and compliance-sensitive deployments.

Enterprise-ready deployment

Containerized architecture designed for operational stability.

How it works

Inworld TTS On-Premises is delivered as a GPU-accelerated, Docker-containerized version of the Inworld TTS API. It exposes both REST and gRPC APIs for easy integration.

[Figure: TTS On-Premises architecture]

| Port | Protocol | Description |
| ---- | -------- | ----------- |
| 8081 | HTTP | REST API (recommended) |
| 9030 | gRPC | For gRPC clients |
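With a deployment running, a quick way to confirm both listeners are up is a small port check. This is a sketch that assumes the container runs on localhost; `/dev/tcp` is a bash feature, so no extra tooling is needed.

```shell
# Check that each API port accepts a TCP connection (bash /dev/tcp redirection).
check_port() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo "$1:$2 open" || echo "$1:$2 closed"
}
check_port localhost 8081   # REST API
check_port localhost 9030   # gRPC API
```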

Performance

  • Latency: Real-time streaming on supported NVIDIA GPUs
  • Throughput: Supports multiple concurrent sessions, depending on the GPU used
Contact sales@inworld.ai to get a detailed performance report for your specific hardware.

System requirements

Inworld TTS supports all modern cloud NVIDIA GPUs: A100, H100, H200, B200, and B300. If your target hardware platform is not on this list, please reach out for custom support. The minimum inference machine requirements are as follows:

| Component | Requirement |
| --------- | ----------- |
| GPU | NVIDIA H100 SXM5 (80GB) |
| RAM | 64GB+ system memory |
| CPU | 8+ cores |
| Disk | 50GB free space |
| OS | Ubuntu 22.04 LTS |
| Software | Docker + NVIDIA Container Toolkit |
| Software | Google Cloud SDK (gcloud CLI) |
| CUDA | 13.0+ |
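A hypothetical preflight script can check a Linux host against this table before deployment. The checks below use stock tools (`nvidia-smi` ships with the NVIDIA driver, `nvidia-ctk` with the NVIDIA Container Toolkit); thresholds mirror the requirements above.

```shell
#!/usr/bin/env bash
# Illustrative preflight check against the minimum requirements table.
preflight() {
  echo "== Inworld TTS On-Prem preflight =="
  # GPU driver (nvidia-smi is installed alongside the NVIDIA driver)
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  else
    echo "WARN: nvidia-smi not found -- install NVIDIA drivers"
  fi
  # 64GB+ system RAM
  awk '/MemTotal/ {printf "RAM: %dGB (need 64+)\n", $2 / 1024 / 1024}' /proc/meminfo
  # 50GB+ free disk on /
  echo "Disk free: $(df -BG --output=avail / | tail -1 | tr -dc '0-9')GB (need 50+)"
  # 8+ CPU cores
  echo "CPU cores: $(nproc) (need 8+)"
  # Required tooling: Docker, NVIDIA Container Toolkit, gcloud CLI
  for cmd in docker nvidia-ctk gcloud; do
    command -v "$cmd" >/dev/null 2>&1 && echo "$cmd: OK" || echo "WARN: $cmd not found"
  done
}
preflight
```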

Prerequisites

Before deploying TTS On-Premises, ensure the following software is installed on your Ubuntu 22.04 LTS machine.

NVIDIA drivers

Install the latest NVIDIA drivers for your GPU. Follow the official guide at nvidia.com/drivers, or use the following commands on Ubuntu:
# Update packages
sudo apt-get update

# Install basic toolchain and kernel headers
sudo apt-get install -y gcc make wget linux-headers-$(uname -r)

# Install NVIDIA driver (check https://www.nvidia.com/en-us/drivers for the latest version)
sudo apt-get install -y nvidia-driver-580

Docker

Install Docker Engine by following the official guide: Install Docker Engine on Ubuntu. Optionally, add the current user to the docker group so you can run Docker without sudo: Linux post-installation steps.

NVIDIA Container Toolkit

Install the NVIDIA Container Toolkit to enable GPU access from Docker containers. Follow both the Installation and Configuration sections of the official guide: NVIDIA Container Toolkit install guide.

Google Cloud SDK

Install the gcloud CLI by following the official guide: Install the gcloud CLI.

Verify prerequisites

Run the following command to verify that Docker, NVIDIA drivers, and the NVIDIA Container Toolkit are all correctly installed:
docker run --rm --gpus all nvidia/cuda:13.0.0-base-ubuntu22.04 nvidia-smi
You should see your GPU listed in the output alongside the driver version and CUDA version. If this command succeeds, your environment is ready for TTS On-Premises deployment.

Firewall requirements

The TTS On-Premises container listens on the following ports for inbound traffic:

| Port | Protocol | Description |
| ---- | -------- | ----------- |
| 8081 | HTTP | REST API |
| 9030 | gRPC | gRPC API |
You will also need to allow the following outbound traffic:
  • us-central1-docker.pkg.dev on port 443 — GCP Artifact Registry for pulling container images
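Outbound connectivity can be confirmed from the host before pulling images. A sketch assuming `curl` is installed: `/v2/` is the standard Docker Registry API root, which answers 401 without credentials but still proves the host is reachable on port 443.

```shell
# check_registry: confirm outbound HTTPS to the Artifact Registry host.
check_registry() {
  if curl -s -o /dev/null --max-time 10 https://us-central1-docker.pkg.dev/v2/; then
    echo "registry reachable on port 443"
  else
    echo "registry NOT reachable: check outbound firewall rules"
  fi
}
check_registry
```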

Quick start

1. Create a GCP service account

Create a service account in your GCP project and generate a key file:
# Create the service account
gcloud iam service-accounts create inworld-tts-onprem \
  --project=<YOUR_GCP_PROJECT> \
  --display-name="Inworld TTS On-Prem" \
  --description="Service account for Inworld TTS on-prem container"

# Create a key file
gcloud iam service-accounts keys create service-account-key.json \
  --iam-account=inworld-tts-onprem@<YOUR_GCP_PROJECT>.iam.gserviceaccount.com \
  --project=<YOUR_GCP_PROJECT>

2. Share the service account email with Inworld

Send the service account email (e.g., inworld-tts-onprem@<YOUR_GCP_PROJECT>.iam.gserviceaccount.com) to your Inworld contact. Inworld will provide your Customer ID.

3. Authenticate to the container registry

gcloud auth activate-service-account \
  --key-file=service-account-key.json

gcloud auth configure-docker us-central1-docker.pkg.dev
For more authentication options, see Configure authentication to Artifact Registry for Docker.

4. Configure

cp onprem.env.example onprem.env
Edit onprem.env with your values:
INWORLD_CUSTOMER_ID=<your-customer-id>
TTS_IMAGE=us-central1-docker.pkg.dev/inworld-ai-registry/tts-onprem/tts-1.5-mini-h100-onprem:<version>
KEY_FILE=./service-account-key.json

5. Start

./run.sh
The script will:
  1. Check prerequisites (Docker, GPU, NVIDIA Container Toolkit)
  2. Validate your configuration
  3. Fix key file permissions if needed
  4. Pull the Docker image
  5. Start the container
  6. Wait for services to be ready (~3 minutes)
The ML model takes approximately 3 minutes to load on first startup. This is normal.
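The warm-up can be scripted for automated deployments. Below is an illustrative polling helper (the `wait_for_tts` name and defaults are not part of the product) that blocks until the REST voices endpoint responds:

```shell
# wait_for_tts: poll the voices endpoint until the model has finished loading.
wait_for_tts() {
  attempts=${1:-60}   # number of polls
  delay=${2:-5}       # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf http://localhost:8081/tts/v1/voices >/dev/null 2>&1; then
      echo "TTS service ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for TTS service" >&2
  return 1
}

# usage: wait_for_tts           # 60 polls x 5s = 5 minute budget
#        wait_for_tts 120 2     # custom budget
```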

6. Verify the deployment

Check that the container is running and services are healthy:
./run.sh status

7. Send a test request

curl -X POST http://localhost:8081/tts/v1/voice \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the on-premises TTS system.",
    "voice_id": "Craig",
    "model_id": "inworld-tts-1.5-mini",
    "audio_config": {
      "audio_encoding": "LINEAR16",
      "sample_rate_hertz": 48000
    }
  }'
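The curl above prints the raw response to stdout. Assuming the response is JSON with a base64-encoded `audioContent` field (an assumption based on the cloud-style Synthesize Speech API; verify the field name against the API reference), a small helper can turn it into a playable file. Requires `jq`.

```shell
# decode_tts_response: read the synthesis response on stdin and write the
# decoded audio to a file. Assumes a base64 "audioContent" JSON field.
decode_tts_response() {
  out=${1:-output.wav}
  jq -r '.audioContent' | base64 -d > "$out"
}

# Example: pipe the test request from step 7 through the decoder:
#   curl -s -X POST http://localhost:8081/tts/v1/voice ... | decode_tts_response hello.wav
```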

List available voices

curl http://localhost:8081/tts/v1/voices
For the full API specification, see the Synthesize Speech API reference.

Lifecycle commands

./run.sh              # Start the container
./run.sh stop         # Stop and remove the container
./run.sh status       # Check container and service health
./run.sh logs         # Show recent logs from all services
./run.sh logs -f      # Tail all service logs live
./run.sh logs export  # Export all logs to a timestamped folder
./run.sh restart      # Restart the container

Available images

| Image | Model | GPU |
| ----- | ----- | --- |
| tts-1.5-mini-h100-onprem | 1B (mini) | H100 |
| tts-1.5-max-h100-onprem | 8B (max) | H100 |

Registry: us-central1-docker.pkg.dev/inworld-ai-registry/tts-onprem/

Configuration

onprem.env

| Variable | Required | Description |
| -------- | -------- | ----------- |
| INWORLD_CUSTOMER_ID | Yes | Your customer ID |
| TTS_IMAGE | Yes | Docker image URL (see Available Images) |
| KEY_FILE | Yes | Path to your GCP service account key file |

Logs

# Show recent logs from all services (last 20 lines each)
./run.sh logs

# Tail all service logs live
./run.sh logs -f

# Export all logs to a timestamped folder
./run.sh logs export
Individual service logs:
docker exec inworld-tts-onprem tail -f /var/log/tts-v3-trtllm.log        # ML server
docker exec inworld-tts-onprem tail -f /var/log/tts-normalization.log     # Text normalization
docker exec inworld-tts-onprem tail -f /var/log/public-tts-service.log    # TTS service
docker exec inworld-tts-onprem tail -f /var/log/grpc-gateway.log          # HTTP gateway
docker exec inworld-tts-onprem tail -f /var/log/w-proxy.log               # gRPC proxy
docker exec inworld-tts-onprem tail -f /var/log/supervisord.log           # Supervisor

Troubleshooting

| Issue | Solution |
| ----- | -------- |
| "INWORLD_CUSTOMER_ID is required" | Set INWORLD_CUSTOMER_ID in onprem.env |
| "GCP credentials file not found" | Check that KEY_FILE in onprem.env points to a valid file |
| "Credentials file is not readable" | Fix permissions on the host: chmod 644 <your-key-file>.json |
| "Topic not found" | Verify your INWORLD_CUSTOMER_ID matches the Pub/Sub topic name |
| "Permission denied for topic" | Ensure Inworld has granted your service account publish access |
| Slow startup (~3 min) | Normal: text processing grammars take time to initialize |
# Check service status
docker exec inworld-tts-onprem supervisorctl -s unix:///tmp/supervisor.sock status

# Export logs for support
./run.sh logs export
Share the exported logs folder with Inworld support when reporting issues.

Advanced: manual Docker run

For users who prefer to run Docker directly without run.sh:
docker run -d \
  --gpus all \
  --name inworld-tts-onprem \
  -p 8081:8081 \
  -p 9030:9030 \
  -e INWORLD_CUSTOMER_ID=<your-customer-id> \
  -v $(pwd)/service-account-key.json:/app/gcp-credentials/service-account.json:ro \
  us-central1-docker.pkg.dev/inworld-ai-registry/tts-onprem/tts-1.5-mini-h100-onprem:<version>
  • Ensure your key file has 644 permissions: chmod 644 service-account-key.json
  • The container exposes port 8081 (HTTP) and 9030 (gRPC)
  • Use docker ps to check container health — STATUS will show healthy when ready
# Stop and remove
docker stop inworld-tts-onprem && docker rm inworld-tts-onprem

# View logs
docker logs inworld-tts-onprem

# Check service status
docker exec inworld-tts-onprem supervisorctl -s unix:///tmp/supervisor.sock status
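For teams that prefer declarative configuration, the manual `docker run` above translates to a Docker Compose sketch. The file layout and service name below are illustrative, not part of the product:

```yaml
# Hypothetical compose.yaml equivalent of the manual "docker run" command.
services:
  inworld-tts-onprem:
    image: us-central1-docker.pkg.dev/inworld-ai-registry/tts-onprem/tts-1.5-mini-h100-onprem:<version>
    restart: unless-stopped
    ports:
      - "8081:8081"   # REST API
      - "9030:9030"   # gRPC API
    environment:
      INWORLD_CUSTOMER_ID: <your-customer-id>
    volumes:
      - ./service-account-key.json:/app/gcp-credentials/service-account.json:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d` and tear it down with `docker compose down`.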

Benchmarking

For performance testing, see the Benchmarking guide.

FAQs

Can I use TTS On-Premises in production?

Yes. The on-premises container is designed for production workloads. To get started, contact sales@inworld.ai for access to the repository.

Why choose on-premises over the cloud API?

For complete data control, low latency, and compliance with strict security or regulatory requirements.

Does any data leave my environment?

No. All text and audio processing occurs entirely within your environment.

How long does deployment take?

Deployment takes just a few minutes, with a brief model warm-up (~200 seconds).

Who is TTS On-Premises for?

Enterprises, governments, and regulated industries that cannot use cloud-based TTS.

What is in scope for TTS On-Premises?

In-scope:
  • API compatibility with the Inworld public API
  • All built-in voices in Inworld’s Voice Library
  • The following model capabilities: text normalization, timestamps, and audio pre- and post-processing settings
  • Deployment how-tos and latency benchmark reproduction scripts
Out-of-scope:
  • Instant voice cloning features and their APIs
  • Voice design and its API