- Speech-to-text (STT) - for understanding speech inputs
- LLM - for generating the agent text response
- Text-to-speech (TTS) - for generating agent speech audio
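The three stages above run in sequence for each conversational turn. The following is a minimal sketch of that flow, not the template's actual implementation: `runVoiceTurn` and the three stub stages (`fakeStt`, `fakeLlm`, `fakeTts`) are hypothetical names introduced for illustration and are not Inworld Runtime APIs.

```typescript
// Hypothetical stage signatures; the real template wires these through Inworld Runtime.
type SpeechToText = (audio: Uint8Array) => Promise<string>;
type LanguageModel = (prompt: string) => Promise<string>;
type TextToSpeech = (text: string) => Promise<Uint8Array>;

interface VoiceTurn {
  transcript: string; // what the user said (STT output)
  reply: string;      // the agent's text response (LLM output)
  audio: Uint8Array;  // the agent's spoken response (TTS output)
}

// Runs one turn: speech in, text understood, text generated, speech out.
async function runVoiceTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  llm: LanguageModel,
  tts: TextToSpeech,
): Promise<VoiceTurn> {
  const transcript = await stt(audio);
  const reply = await llm(transcript);
  const speech = await tts(reply);
  return { transcript, reply, audio: speech };
}

// Stub stages so the sketch runs without any external service.
const fakeStt: SpeechToText = async (audio) => `heard ${audio.length} bytes of audio`;
const fakeLlm: LanguageModel = async (prompt) => `You said: ${prompt}`;
// Produces one silent byte per character of the reply.
const fakeTts: TextToSpeech = async (text) => new Uint8Array(text.length);
```

Passing the stages in as functions keeps the orchestration independent of any particular provider, so each stub can be swapped for a real service call without changing `runVoiceTurn`.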
Architecture
- Backend: Inworld Runtime + Express.js
- Frontend: Vite + React
- Communication: WebSocket
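Since the frontend and backend talk over a WebSocket, each side needs to agree on a message format. The sketch below shows one way to encode and validate such messages as JSON. The message shapes (`text` and `audio` variants) and the function names are assumptions made for illustration; the template's actual wire protocol may differ.

```typescript
// Hypothetical client-to-server messages: typed text or a Base64-encoded audio chunk.
type ClientMessage =
  | { type: "text"; text: string }
  | { type: "audio"; base64: string };

// Serializes a message to the string form sent over the socket.
function encodeMessage(msg: ClientMessage): string {
  return JSON.stringify(msg);
}

// Parses and validates a raw socket payload; throws on anything unrecognized.
function decodeMessage(raw: string): ClientMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("message must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  if (record.type === "text" && typeof record.text === "string") {
    return { type: "text", text: record.text };
  }
  if (record.type === "audio" && typeof record.base64 === "string") {
    return { type: "audio", base64: record.base64 };
  }
  throw new Error(`unsupported message type: ${String(record.type)}`);
}
```

Validating on receipt means a malformed payload fails at the boundary with a clear error rather than deeper in the agent logic.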
Run the Template
Start the Server
- Download and extract the Inworld Templates, then open the extracted folder in an IDE.
- Open a new terminal window.
- Navigate to the voice agent server directory.
- Make a copy of the .env-sample file.
- Paste your Base64 Runtime API key into the .env file and set GRAPH_VISUALIZATION_ENABLED=true.
- Install dependencies.
- Start the server.
The server will start on port 4000. You will see the path to your graph visualization printed in your terminal.
Note: Graph visualization is currently not supported on Windows.
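Before starting the server, it can help to confirm that the .env file contains what the steps above describe. The following is a rough sketch of such a check. `parseEnv` and `checkEnv` are hypothetical helpers, and the API key variable name is passed in as a parameter because it is not stated here; use whatever name appears in .env-sample. The parser handles only simple KEY=value lines and does not strip quotes.

```typescript
// Parses simple KEY=value lines, skipping blank lines and # comments.
function parseEnv(contents: string): Map<string, string> {
  const values = new Map<string, string>();
  for (const line of contents.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    values.set(trimmed.slice(0, eq).trim(), trimmed.slice(eq + 1).trim());
  }
  return values;
}

// Returns a list of problems; an empty list means the file looks ready.
// apiKeyName is an assumption supplied by the caller, not a documented name.
function checkEnv(contents: string, apiKeyName: string): string[] {
  const env = parseEnv(contents);
  const problems: string[] = [];
  if (!env.get(apiKeyName)) {
    problems.push(`${apiKeyName} is missing or empty`);
  }
  if (env.get("GRAPH_VISUALIZATION_ENABLED") !== "true") {
    problems.push("GRAPH_VISUALIZATION_ENABLED is not set to true");
  }
  return problems;
}
```

For example, calling `checkEnv` on a file that sets only `GRAPH_VISUALIZATION_ENABLED=false` would report both the missing key and the disabled visualization flag.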
Start the Client
- Open a new terminal window.
- Navigate to the voice agent client directory.
- Install dependencies.
- Start the client.
The client will start on port 3000 (or the next available port if 3000 is in use) and should automatically open in your default browser.
Chat with Agent
- Configure your agent in the UI:
  - Enter your name
  - Set the agent’s name
  - Provide a description for the agent
  - Define the agent’s motivation
- Click “Start” to begin the conversation.
- Chat with the agent:
  - Type text in the input field and hit Enter or click the send button
  - Click the microphone icon to use voice input
  - Click the copy icon to copy the conversation to the clipboard
Observe Telemetry
- View dashboards, traces, and logs in the Inworld Portal