Run an Experiment
1. Build your graph
Complete the quickstart guide to build your first LLM-TTS endpoint using the Inworld CLI.
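If your quickstart graph lives in code, you can also export its configuration for reuse when you create experiment variants later. Below is a minimal sketch mirroring the GraphBuilder pattern referenced in the best practices further down; the import path and the llmNode/ttsNode placeholders are assumptions standing in for whatever your quickstart actually defines.

```typescript
// Sketch only: export the quickstart graph's configuration as JSON so the
// same definition can be reused when creating experiment variants.
// The import path is an assumption; llmNode and ttsNode are placeholders
// for the nodes your quickstart code actually creates.
import { GraphBuilder } from '@inworld/runtime/graph';

declare const llmNode: any; // LLM node from your quickstart graph
declare const ttsNode: any; // TTS node from your quickstart graph

const graphConfig = GraphBuilder('my-graph-id')
  .addNode(llmNode)
  .addNode(ttsNode)
  .toJSON(); // exported config, ready to register in the Experiments tab
```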
2. (Optional) Set up metrics collection
This step is only relevant if you have a client application with real user traffic. If you’re just getting started or testing variants internally, you can skip to the next step.
- Follow the metrics guide to implement metrics collection in your application (a hypothetical sketch follows this list). Common metrics to track include:
- User engagement (session length, messages per conversation)
- User satisfaction (ratings, task completion)
- Business metrics (conversion rates, retention)
 
- Create dashboards in Inworld and verify that the recorded metrics appear there
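The exact recording calls come from the metrics guide; purely as an illustration, here is a hypothetical client-side tracker for two of the engagement metrics above. The reportMetric function is a stand-in for whatever reporting API the metrics guide provides.

```typescript
// Hypothetical sketch: track session length and messages per conversation
// on the client, then hand the values to your metrics pipeline.
// `reportMetric` is a placeholder for the reporting call from the metrics guide.
declare function reportMetric(name: string, value: number, tags?: Record<string, string>): void;

class ConversationMetrics {
  private sessionStart = Date.now();
  private messageCount = 0;

  onUserMessage(): void {
    this.messageCount += 1;
  }

  onSessionEnd(userId: string): void {
    const sessionSeconds = (Date.now() - this.sessionStart) / 1000;
    reportMetric('session_length_seconds', sessionSeconds, { user_id: userId });
    reportMetric('messages_per_conversation', this.messageCount, { user_id: userId });
  }
}
```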
3. Set up your experiment
Navigate to the Experiments tab in Portal to:
- Register your graph
- Create variants of your graph with different configurations
- Set up targeting rules to control which users see which variants
4. Monitor and analyze results
Track how your experiment impacts your metrics and make data-driven decisions:
- Check traces and logs to verify your experiment is running correctly
- Monitor results in Portal dashboards
- Export data to your own analytics platform for deeper analysis
- Deploy the winning variant when you have statistically significant results (see the significance-check sketch below)
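What counts as statistically significant depends on your metric, but for a conversion-style metric exported from Portal, a simple check could look like the sketch below: a two-proportion z-test at the same α = 0.05 used in the power analysis in the best practices. All numbers are illustrative.

```typescript
// Sketch: two-proportion z-test on exported conversion counts.
// Call the result significant only if |z| exceeds 1.96 (two-sided, α = 0.05).
interface VariantResult {
  conversions: number; // users who completed the target action
  users: number;       // users exposed to the variant
}

function isSignificant(control: VariantResult, treatment: VariantResult): boolean {
  const p1 = control.conversions / control.users;
  const p2 = treatment.conversions / treatment.users;
  // Pooled proportion under the null hypothesis of no difference.
  const pooled =
    (control.conversions + treatment.conversions) / (control.users + treatment.users);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users));
  const z = (p2 - p1) / se;
  return Math.abs(z) > 1.96;
}

// Example: 420/5000 vs 465/5000 conversions prints false, because a 0.9 pp
// lift is not significant at this sample size.
console.log(isSignificant({ conversions: 420, users: 5000 }, { conversions: 465, users: 5000 }));
```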
Best Practices
Design & Setup
- Calculate sample size upfront - Use the Power Analysis calculator with your baseline metrics, desired MDE (typically 2-5%), α = 0.05, and power = 0.80. Once the sample size is determined, you can work out how much traffic to assign each variant in Experiments (see the sketch after this list)
- Create variants with consistent naming - Use `GraphBuilder('my-graph-id').addNode(llmNode).toJSON()` to export configs, then establish a naming convention like `model-prompt-tools` (e.g., “GPT5-Creative-Memory” vs “Claude4Sonnet-Analytical-RAG”)
- Always pass UserContext with targeting keys - For example, use `new UserContext({user_id: userId, user_tier: 'premium'}, userId)` when executing graphs. Without the targeting key (the userId as the 2nd argument), all users get the same variant regardless of your traffic splits
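The Power Analysis calculator handles the sample-size math for you; as a reference for how the inputs relate, here is a sketch of the standard two-proportion approximation using the defaults above (α = 0.05, power = 0.80). Treat it as an illustration rather than a replacement for the calculator, and note that it assumes MDE is an absolute lift in the rate.

```typescript
// Sketch: approximate per-variant sample size for a conversion-style metric
// (normal approximation). MDE is treated as an absolute lift in the rate;
// adjust if your calculator defines MDE relative to the baseline.
function sampleSizePerVariant(
  baselineRate: number, // e.g. 0.10 for a 10% baseline conversion rate
  mde: number,          // e.g. 0.02 to detect a lift to 12%
  zAlpha = 1.96,        // two-sided α = 0.05
  zBeta = 0.84          // power = 0.80
): number {
  const p1 = baselineRate;
  const p2 = baselineRate + mde;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (mde ** 2));
}

// Example: a 10% baseline and a +2 pp MDE gives roughly 3,800 users per variant.
// If your app sees ~1,000 eligible users per day, a 50/50 split reaches that in
// about 8 days; use this to decide how much traffic each variant needs.
console.log(sampleSizePerVariant(0.10, 0.02));
```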
Running & Monitoring
- Only enable rules when ready to go live - Targeting rules are disabled by default when created, to prevent serving traffic accidentally. Use the 3-dot menu → Enable and then the Save button at the top right when you’re ready to launch your experiment
- Start with a small traffic allocation - Use the traffic distribution in Experiments to validate your experiment setup and catch issues early, then scale up to reach your calculated sample size
- Use rule ordering strategically - Put specific targeting rules (premium users, specific regions) at the top since rules evaluate top-to-bottom
- Monitor via Portal dashboards - Watch your custom metrics alongside default metrics (Graph Executions Total, P99 Latency) to spot performance issues
- Use traces and logs for debugging - Check the Portal Traces tab to verify the graph execution flow, and the Logs tab for detailed error information, to make sure variants are being served correctly
Analysis & Rollout
- Leverage the gradual rollout in Experiments - Deploy winners by increasing the traffic allocation in your experiment rule (50/50 → 70/30 → 90/10). For a 100% rollout, set the winning variant as the default in the “Everyone else” rule, then delete the old targeting rule
- Clean up experiments properly - Use the 3-dot menu → Disable to stop experiments, then delete old rules to keep Experiments organized
Troubleshooting
Why are all my users getting the same variant despite setting traffic splits?
This happens when the UserContext is not properly configured in your code. Here’s what you need to check:
- Specify a targeting key - This is typically the user ID and ensures that the same user gets consistent variants (see the sketch below)
- Include targeting attributes - Make sure to pass any attributes that your targeting rules use (e.g., country, user_tier, etc.)
If you don’t specify a targeting key, all users will share the same default key, causing everyone to get the same variant regardless of your traffic split settings.
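Putting both points together, a correctly configured call could look like the sketch below. It mirrors the UserContext pattern from the best practices above; the import path and the commented-out execution call are assumptions about your own graph code.

```typescript
// Sketch: pass both targeting attributes (1st argument) and a stable
// targeting key (2nd argument) so traffic splits and targeting rules apply.
// The import path is an assumption; use whatever your project already imports.
import { UserContext } from '@inworld/runtime';

const userId = 'user-1234';

const userContext = new UserContext(
  { user_id: userId, user_tier: 'premium', country: 'US' }, // attributes your rules can target
  userId // targeting key: without it, every user shares the same default key
);

// Pass the context when executing your graph (exact call depends on your setup):
// const result = await graph.execute(input, { userContext });
```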