The node-tts template illustrates how to convert text to speech using the TTS node.
Architecture
  • Backend: Inworld Runtime
  • Frontend: N/A (CLI example)

Run the Template

  1. Download and extract the Inworld Templates.
  2. Install the Runtime SDK inside the cli directory.
    yarn add @inworld/runtime
    
  3. Set up your Base64 Runtime API key by copying the .env-sample file to a .env file in the cli folder and adding your API key. The template reads this value from the environment at startup (see the sketch after these steps).
    .env
    # Inworld Runtime Base64 API key
    INWORLD_API_KEY=<your_api_key_here>
    
  4. Run the template. To try a different model or voice, specify the model with the --modelId parameter and a voice with the --voiceName parameter:
    yarn node-tts "Hello, how are you?" --modelId=inworld-tts-1-max --voiceName=Ronald
    
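Inside the template, the key from step 3 is read from the environment at startup. A minimal sketch of that step, assuming the template loads .env via the dotenv package (the actual entry point may differ):

// Minimal sketch: load .env and fail fast if the key is missing.
// Assumption: dotenv is used; adjust to the template's actual setup code.
import 'dotenv/config';

const apiKey = process.env.INWORLD_API_KEY;
if (!apiKey) {
  throw new Error('INWORLD_API_KEY is not set; check the .env file in the cli folder');
}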

Understanding the Template

The main functionality of the template is contained in the run function, which demonstrates how to use the Inworld Runtime to convert text to speech using the TTS node. Let's break the template down in more detail:

1) Node initialization

We start by creating the TTS node.
const ttsNode = new RemoteTTSNode({
  id: 'tts_node',
  speakerId: voiceName,
  modelId,
  sampleRate: SAMPLE_RATE,
  temperature: 1.1,
  speakingRate: 1,
});
When creating the TTS node, you can specify:
  • id: A unique identifier for the node
  • speakerId: The voice to use for synthesis (see available voices)
  • modelId: The TTS model to use for synthesis
  • sampleRate: Audio output sample rate
  • temperature: Controls randomness in synthesis (higher values produce more varied delivery)
  • speakingRate: Controls the speed of speech (1.0 is the voice’s natural speed)
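For example, to get slower and more consistent output you could lower both temperature and speakingRate. A sketch with illustrative values (the voice and model names are taken from the run command above):

// Illustrative variant: the values here are examples, not recommendations.
const calmTtsNode = new RemoteTTSNode({
  id: 'tts_node_calm',
  speakerId: 'Ronald',          // voice from the run example above
  modelId: 'inworld-tts-1-max', // model from the run example above
  sampleRate: SAMPLE_RATE,
  temperature: 0.7,  // less variation between runs
  speakingRate: 0.9, // slightly slower than the voice's natural speed
});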

2) Graph initialization

Next, we create the graph using the GraphBuilder, adding the TTS node and setting it as both the start and end node:
const graph = new GraphBuilder({
  id: 'node_tts_graph',
  apiKey,
  enableRemoteConfig: false,
})
  .addNode(ttsNode)
  .setStartNode(ttsNode)
  .setEndNode(ttsNode)
  .build();
The GraphBuilder configuration includes:
  • id: A unique identifier for the graph
  • apiKey: Your Inworld API key for authentication
  • enableRemoteConfig: Whether to enable remote configuration (set to false for local execution)
In this example there is only a single TTS node, so we set it as both the start and end node. In more complex applications, you could connect other nodes, such as an LLM node, to the TTS node to create a processing pipeline, as sketched below.
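A minimal sketch of such a pipeline, assuming the builder exposes an addEdge method and the SDK provides an LLM chat node (RemoteLLMChatNode and its options are assumptions here; check the SDK for the exact names):

// Hypothetical two-node pipeline: the LLM node's output feeds the TTS node.
const llmNode = new RemoteLLMChatNode({ id: 'llm_node' });

const pipeline = new GraphBuilder({
  id: 'llm_to_tts_graph',
  apiKey,
  enableRemoteConfig: false,
})
  .addNode(llmNode)
  .addNode(ttsNode)
  .addEdge(llmNode, ttsNode) // route LLM output into TTS
  .setStartNode(llmNode)
  .setEndNode(ttsNode)
  .build();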

3) Graph execution

Now we execute the graph with the text input directly:
const { outputStream } = graph.start(text);
The text input is passed directly to the graph, where the TTS node processes it.
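For context, the text and the optional --modelId/--voiceName flags come from the command line. A minimal sketch of how they might be parsed, assuming plain process.argv (the real template may use an argument-parsing library, and the default constants below are hypothetical):

// Hypothetical argument parsing; the template's actual parsing may differ.
const args = process.argv.slice(2);
const text = args.find((a) => !a.startsWith('--')) ?? 'Hello, how are you?';
// DEFAULT_MODEL_ID and DEFAULT_VOICE_NAME are hypothetical fallbacks.
const modelId =
  args.find((a) => a.startsWith('--modelId='))?.split('=')[1] ?? DEFAULT_MODEL_ID;
const voiceName =
  args.find((a) => a.startsWith('--voiceName='))?.split('=')[1] ?? DEFAULT_VOICE_NAME;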

4) Response handling

The audio generation results are handled using the processResponse method, which supports streaming audio responses:
let initialText = '';
let resultCount = 0;
let allAudioData: number[] = [];

for await (const result of outputStream) {
  await result.processResponse({
    TTSOutputStream: async (ttsStream: GraphTypes.TTSOutputStream) => {
      for await (const chunk of ttsStream) {
        // A chunk may carry the text being synthesized, audio samples, or both.
        if (chunk.text) initialText += chunk.text;
        if (chunk.audio?.data) {
          // Accumulate samples so the full clip can be encoded once at the end.
          allAudioData = allAudioData.concat(Array.from(chunk.audio.data));
        }
        resultCount++;
      }
    },
  });
}

console.log(`Result count: ${resultCount}`);
console.log(`Initial text: ${initialText}`);
The response handler processes:
  • TTSOutputStream: Streaming audio responses containing both text and audio data
  • chunk.text: The text being synthesized
  • chunk.audio.data: The audio data as Float32Array samples
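The template uses these float samples directly in the next step. If a downstream consumer needed 16-bit PCM instead, a small helper could convert them (illustrative; not part of the template):

// Illustrative helper: convert [-1, 1] float samples to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to the valid range
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}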

5) Audio file creation

Finally, we encode the accumulated audio samples and save them as a WAV file:
const audio = {
  sampleRate: SAMPLE_RATE,
  channelData: [new Float32Array(allAudioData)], // mono: a single channel of samples
};

const buffer = await wavEncoder.encode(audio);

// Create the output directory if it does not exist yet.
if (!fs.existsSync(OUTPUT_DIRECTORY)) {
  fs.mkdirSync(OUTPUT_DIRECTORY, { recursive: true });
}

fs.writeFileSync(OUTPUT_PATH, Buffer.from(buffer));

console.log(`Audio saved to ${OUTPUT_PATH}`);
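Because channelData holds a single channel at SAMPLE_RATE, the clip's duration follows directly from the sample count, which makes for a quick sanity check:

// Duration in seconds = total samples / samples per second (mono audio).
const durationSec = allAudioData.length / SAMPLE_RATE;
console.log(`Wrote ${durationSec.toFixed(2)}s of audio at ${SAMPLE_RATE} Hz`);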