Run an Experiment
1. Build your graph
Complete the quickstart guide to build your first LLM-TTS endpoint using the Inworld CLI.
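If your quickstart graph lives in code, you can also export its configuration for reuse when you create experiment variants later. Below is a minimal sketch mirroring the GraphBuilder pattern referenced in the best practices further down; the import path and the llmNode/ttsNode placeholders are assumptions standing in for whatever your quickstart actually defines.

```typescript
// Sketch only: export the quickstart graph's configuration as JSON so the
// same definition can be reused when creating experiment variants.
// The import path is an assumption; llmNode and ttsNode are placeholders
// for the nodes your quickstart code actually creates.
import { GraphBuilder } from '@inworld/runtime/graph';

declare const llmNode: any; // LLM node from your quickstart graph
declare const ttsNode: any; // TTS node from your quickstart graph

const graphConfig = GraphBuilder('my-graph-id')
  .addNode(llmNode)
  .addNode(ttsNode)
  .toJSON(); // exported config, ready to register in the Experiments tab
```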
2. (Optional) Set up metrics collection
This step is only relevant if you have a client application with real user traffic. If you’re just getting started or testing variants internally, you can skip to the next step.
- Follow the metrics guide to implement metrics collection in your application (a hypothetical sketch follows this list). Common metrics to track include:
- User engagement (session length, messages per conversation)
- User satisfaction (ratings, task completion)
- Business metrics (conversion rates, retention)
 
- Create dashboards in Inworld and verify that the recorded metrics appear there
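The exact recording calls come from the metrics guide; purely as an illustration, here is a hypothetical client-side tracker for two of the engagement metrics above. The reportMetric function is a stand-in for whatever reporting API the metrics guide provides.

```typescript
// Hypothetical sketch: track session length and messages per conversation
// on the client, then hand the values to your metrics pipeline.
// `reportMetric` is a placeholder for the reporting call from the metrics guide.
declare function reportMetric(name: string, value: number, tags?: Record<string, string>): void;

class ConversationMetrics {
  private sessionStart = Date.now();
  private messageCount = 0;

  onUserMessage(): void {
    this.messageCount += 1;
  }

  onSessionEnd(userId: string): void {
    const sessionSeconds = (Date.now() - this.sessionStart) / 1000;
    reportMetric('session_length_seconds', sessionSeconds, { user_id: userId });
    reportMetric('messages_per_conversation', this.messageCount, { user_id: userId });
  }
}
```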
3. Set up your experiment
Navigate to the Experiments tab in Portal to:
- Register your graph
- Create variants of your graph with different configurations
- Set up targeting rules to control which users see which variants
4. Monitor and analyze results
Track how your experiment impacts your metrics and make data-driven decisions:
- Check traces and logs to verify your experiment is running correctly
- Monitor results in Portal dashboards
- Export data to your own analytics platform for deeper analysis
- Deploy the winning variant when you have statistically significant results (see the significance-check sketch below)
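What counts as statistically significant depends on your metric, but for a conversion-style metric exported from Portal, a simple check could look like the sketch below: a two-proportion z-test at the same α = 0.05 used in the power analysis in the best practices. All numbers are illustrative.

```typescript
// Sketch: two-proportion z-test on exported conversion counts.
// Call the result significant only if |z| exceeds 1.96 (two-sided, α = 0.05).
interface VariantResult {
  conversions: number; // users who completed the target action
  users: number;       // users exposed to the variant
}

function isSignificant(control: VariantResult, treatment: VariantResult): boolean {
  const p1 = control.conversions / control.users;
  const p2 = treatment.conversions / treatment.users;
  // Pooled proportion under the null hypothesis of no difference.
  const pooled =
    (control.conversions + treatment.conversions) / (control.users + treatment.users);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users));
  const z = (p2 - p1) / se;
  return Math.abs(z) > 1.96;
}

// Example: 420/5000 vs 465/5000 conversions prints false, because a 0.9 pp
// lift is not significant at this sample size.
console.log(isSignificant({ conversions: 420, users: 5000 }, { conversions: 465, users: 5000 }));
```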
Best Practices
Design & Setup
- Calculate sample size upfront - Use the Power Analysis calculator with your baseline metrics, desired MDE (typically 2-5%), α = 0.05, and power = 0.80. Once the sample size is determined, you can work out how much traffic to assign each variant in Experiments (see the sketch after this list)
- Create variants with consistent naming - Use `GraphBuilder('my-graph-id').addNode(llmNode).toJSON()` to export configs, then establish a naming convention like `model-prompt-tools` (e.g., “GPT5-Creative-Memory” vs “Claude4Sonnet-Analytical-RAG”)
- Always pass UserContext with targeting keys - For example, use `new UserContext({user_id: userId, user_tier: 'premium'}, userId)` when executing graphs. Without the targeting key (the userId as the 2nd argument), all users get the same variant regardless of your traffic splits
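The Power Analysis calculator handles the sample-size math for you; as a reference for how the inputs relate, here is a sketch of the standard two-proportion approximation using the defaults above (α = 0.05, power = 0.80). Treat it as an illustration rather than a replacement for the calculator, and note that it assumes MDE is an absolute lift in the rate.

```typescript
// Sketch: approximate per-variant sample size for a conversion-style metric
// (normal approximation). MDE is treated as an absolute lift in the rate;
// adjust if your calculator defines MDE relative to the baseline.
function sampleSizePerVariant(
  baselineRate: number, // e.g. 0.10 for a 10% baseline conversion rate
  mde: number,          // e.g. 0.02 to detect a lift to 12%
  zAlpha = 1.96,        // two-sided α = 0.05
  zBeta = 0.84          // power = 0.80
): number {
  const p1 = baselineRate;
  const p2 = baselineRate + mde;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (mde ** 2));
}

// Example: a 10% baseline and a +2 pp MDE gives roughly 3,800 users per variant.
// If your app sees ~1,000 eligible users per day, a 50/50 split reaches that in
// about 8 days; use this to decide how much traffic each variant needs.
console.log(sampleSizePerVariant(0.10, 0.02));
```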
Running & Monitoring
- Only enable rules when ready to go live - Targeting rules are disabled by default when created, to prevent serving traffic accidentally. Use the 3-dot menu → Enable and then the Save button at the top right when you’re ready to launch your experiment
- Start with a small traffic allocation - Use the traffic distribution in Experiments to validate your experiment setup and catch issues early, then scale up to reach your calculated sample size
- Use rule ordering strategically - Put specific targeting rules (premium users, specific regions) at the top since rules evaluate top-to-bottom
- Monitor via Portal dashboards - Watch your custom metrics alongside default metrics (Graph Executions Total, P99 Latency) to spot performance issues
- Use traces and logs for debugging - Check the Portal Traces tab to verify the graph execution flow, and the Logs tab for detailed error information, to make sure variants are being served correctly
Analysis & Rollout
- Leverage the gradual rollout in Experiments - Deploy winners by increasing the traffic allocation in your experiment rule (50/50 → 70/30 → 90/10). For a 100% rollout, set the winning variant as the default in the “Everyone else” rule, then delete the old targeting rule
- Clean up experiments properly - Use the 3-dot menu → Disable to stop experiments, then delete old rules to keep Experiments organized
Troubleshooting
Why are all my users getting the same variant despite setting traffic splits?
This happens when the UserContext is not properly configured in your code. Here’s what you need to check:
- Specify a targeting key - This is typically the user ID and ensures that the same user gets consistent variants (see the sketch below)
- Include targeting attributes - Make sure to pass any attributes that your targeting rules use (e.g., country, user_tier, etc.)
If you don’t specify a targeting key, all users will share the same default key, causing everyone to get the same variant regardless of your traffic split settings.
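Putting both points together, a correctly configured call could look like the sketch below. It mirrors the UserContext pattern from the best practices above; the import path and the commented-out execution call are assumptions about your own graph code.

```typescript
// Sketch: pass both targeting attributes (1st argument) and a stable
// targeting key (2nd argument) so traffic splits and targeting rules apply.
// The import path is an assumption; use whatever your project already imports.
import { UserContext } from '@inworld/runtime';

const userId = 'user-1234';

const userContext = new UserContext(
  { user_id: userId, user_tier: 'premium', country: 'US' }, // attributes your rules can target
  userId // targeting key: without it, every user shares the same default key
);

// Pass the context when executing your graph (exact call depends on your setup):
// const result = await graph.execute(input, { userContext });
```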