TTS (Text-to-Speech) Node Demo - Inworld AI Documentation

This demo showcases how to use the TTSNode.

Run the Template

Go to Assets/InworldRuntime/Scenes/Nodes and play the TTSNode scene.
Once the graph is compiled, enter text or send a preview message to generate speech.

Understanding the Graph

You can find the graph on the InworldGraphExecutor of TTSCanvas. It contains a single node, TTSNode, and no edges, so TTSNode is both the StartNode and the EndNode.

InworldController

The InworldController is also simple; it contains only one primitive module: TTS.

For details about the primitive module, see the TTS Primitive Demo.

Workflow

When the game starts, InworldController initializes its only module, TTSModule, which creates the TTSInterface using the voice ID selected in the dropdown.
Next, InworldGraphExecutor initializes its graph asset by calling each component’s CreateRuntime(). In this case, only TTSNode.CreateRuntime() is called, using the created TTSInterface as input.
After initialization, the graph calls Compile() and returns the executor handle.
After compilation, the OnGraphCompiled event is invoked. In this demo, TTSNodeTemplate subscribes to it and enables the UI components. Users can then interact with the graph system.

TTSNodeTemplate.cs

protected override void OnGraphCompiled(InworldGraphAsset obj)
{
    // The graph is ready: allow the user to interact with the UI.
    foreach (InworldUIElement element in m_UIElements)
        element.Interactable = true;
}

After the UI is initialized, pressing the Preview button sends “Hello, I’m ” as InworldText to the graph.
When you enter a sentence and press Enter or the Send button, your message is also sent as InworldText.

TTSNodeTemplate.cs

protected override void OnEnable()
{
    base.OnEnable();
    if (!m_Audio)
        return;
    m_Audio.Event.onStartCalibrating.AddListener(()=>Title("Calibrating"));
    m_Audio.Event.onStopCalibrating.AddListener(Calibrated);
    m_Audio.Event.onPlayerStartSpeaking.AddListener(()=>Title("PlayerSpeaking"));
    m_Audio.Event.onPlayerStopSpeaking.AddListener(()=>
    {
        Title("");
        if (m_STTResult)
            m_STTResult.text = "";
    });
    m_Audio.Event.onAudioSent.AddListener(SendAudio);
}

void SendAudio(List<float> audioData)
{
    if (!m_ModuleInitialized)
        return;
    InworldVector<float> wave = new InworldVector<float>();
    wave.AddRange(audioData);
    
    _ = m_InworldGraphExecutor.ExecuteGraphAsync("STT", new InworldAudio(wave, wave.Size));
}
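The listing above forwards microphone audio as InworldAudio. For the text path (Preview button, Enter, or Send), a handler would presumably follow the same shape with a text payload. The sketch below only illustrates that shape under stated assumptions: `TextPayload`, `TextSender`, and the `"TTS"` graph name used in the usage note are hypothetical stand-ins, not Inworld SDK types, and the delegate stands in for the executor's ExecuteGraphAsync call.

```csharp
using System;

// Hypothetical stand-in for a text payload; the real InworldText type lives in the SDK.
public sealed class TextPayload
{
    public string Text { get; }
    public TextPayload(string text) { Text = text; }
}

public static class TextSender
{
    // Trims the input and forwards it to the supplied execute delegate.
    // Returns false when there is nothing to send, so empty input never reaches the graph.
    public static bool TrySend(Action<string, TextPayload> execute, string graphName, string rawInput)
    {
        if (execute == null)
            throw new ArgumentNullException(nameof(execute));
        string text = rawInput?.Trim();
        if (string.IsNullOrEmpty(text))
            return false;
        execute(graphName, new TextPayload(text));
        return true;
    }
}
```

For example, `TextSender.TrySend(execute, "TTS", inputField.text)` would skip whitespace-only input and forward everything else once, trimmed.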

Calling ExecuteGraphAsync() eventually produces a result and invokes OnGraphResult(), which TTSNodeTemplate subscribes to in order to receive the data.

TTSNodeTemplate.cs

protected override async void OnGraphResult(InworldBaseData obj)
{
    // TTSNode returns a data stream; convert it to an input stream to read it.
    InworldDataStream<TTSOutput> outputStream = new InworldDataStream<TTSOutput>(obj);
    InworldInputStream<TTSOutput> stream = outputStream.ToInputStream();
    int sampleRate = 0;
    List<float> result = new List<float>();
    // Read the stream off the main thread.
    await Awaitable.BackgroundThreadAsync();
    while (stream != null && stream.HasNext)
    {
        TTSOutput ttsOutput = stream.Read();
        // Skip empty outputs and outputs that carry no audio.
        InworldAudio chunk = ttsOutput?.Audio;
        if (chunk == null)
            continue;
        sampleRate = chunk.SampleRate;
        List<float> data = chunk.Waveform?.ToList();
        if (data != null && data.Count > 0)
            result.AddRange(data);
        await Awaitable.NextFrameAsync();
    }
    // AudioClip and AudioSource must be used on the main thread.
    await Awaitable.MainThreadAsync();
    int sampleCount = result.Count;
    Debug.Log($"SampleRate: {sampleRate} Sample Count: {sampleCount}");
    if (sampleRate == 0 || sampleCount == 0)
        return;
    // Build a mono, non-streaming clip from the collected samples.
    AudioClip audioClip = AudioClip.Create("TTS", sampleCount, 1, sampleRate, false);
    audioClip.SetData(result.ToArray(), 0);
    // Use Unity's overloaded null check; ?. does not detect destroyed objects.
    if (m_AudioSource)
        m_AudioSource.PlayOneShot(audioClip);
}

TTSNode returns InworldDataStream<TTSOutput>, which does not expose read APIs. Convert it to InworldInputStream<TTSOutput> before reading.
In this demo, we read on a background thread using Unity’s Awaitable.
After all waveform data is collected and we switch back to the main thread, we play it using the attached AudioSource.
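The collection step described above can be isolated from Unity and the SDK for testing. The following is a minimal sketch, assuming each streamed chunk carries a sample rate and an optional waveform; `AudioChunk` and `ChunkCollector` are hypothetical stand-ins for illustration, not Inworld types. Unlike the demo listing, it keeps the last non-zero sample rate rather than overwriting it on every chunk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one streamed TTS chunk; not an Inworld SDK type.
public sealed class AudioChunk
{
    public int SampleRate { get; }
    public float[] Waveform { get; }
    public AudioChunk(int sampleRate, float[] waveform)
    {
        SampleRate = sampleRate;
        Waveform = waveform;
    }
}

public static class ChunkCollector
{
    // Concatenates the waveforms of all non-null chunks and reports the
    // sample rate of the last chunk that carried a positive one.
    public static (float[] Samples, int SampleRate) Collect(IEnumerable<AudioChunk> chunks)
    {
        var samples = new List<float>();
        int sampleRate = 0;
        foreach (AudioChunk chunk in chunks)
        {
            if (chunk == null)
                continue;
            if (chunk.SampleRate > 0)
                sampleRate = chunk.SampleRate;
            if (chunk.Waveform != null && chunk.Waveform.Length > 0)
                samples.AddRange(chunk.Waveform);
        }
        return (samples.ToArray(), sampleRate);
    }

    // Duration in seconds of a mono buffer; 0 when the sample rate is unknown.
    public static double DurationSeconds(int sampleCount, int sampleRate)
    {
        return sampleRate > 0 ? (double)sampleCount / sampleRate : 0.0;
    }
}
```

The returned sample array and rate map directly onto the arguments the demo passes to AudioClip.Create and SetData; a zero rate or empty array corresponds to the early return in OnGraphResult.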

Switching voiceID

The InworldGraphSystem must be compiled before it can be used, and the voice ID is set during the compilation phase. To switch the voice ID at runtime, you therefore need to terminate the current graph executor and restart the initialization process with the new ID.
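The terminate-and-rebuild pattern can be sketched independently of the SDK. In the example below, `CompiledVoiceGraph` and `VoiceSwitcher` are hypothetical names used only for illustration, and the `build` delegate stands in for whatever initialization and compilation the real executor performs; none of this is the Inworld API.

```csharp
using System;

// Hypothetical stand-in for a compiled graph whose voice is fixed at build time.
public sealed class CompiledVoiceGraph : IDisposable
{
    public string VoiceId { get; }
    public bool IsDisposed { get; private set; }
    public CompiledVoiceGraph(string voiceId) { VoiceId = voiceId; }
    public void Dispose() { IsDisposed = true; }
}

public sealed class VoiceSwitcher
{
    private readonly Func<string, CompiledVoiceGraph> m_Build;
    public CompiledVoiceGraph Current { get; private set; }

    public VoiceSwitcher(Func<string, CompiledVoiceGraph> build, string initialVoiceId)
    {
        m_Build = build ?? throw new ArgumentNullException(nameof(build));
        Current = m_Build(initialVoiceId);
    }

    // Returns true when a rebuild happened, false when the voice is unchanged.
    public bool SwitchVoice(string voiceId)
    {
        if (string.IsNullOrWhiteSpace(voiceId))
            throw new ArgumentException("Voice ID must not be empty.", nameof(voiceId));
        if (Current != null && Current.VoiceId == voiceId)
            return false;
        Current?.Dispose();          // terminate the old graph first
        Current = m_Build(voiceId);  // then rerun initialization with the new ID
        return true;
    }
}
```

Skipping the rebuild when the requested voice matches the current one avoids an unnecessary compile, which is the expensive step in this flow.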
