Overview

The cost optimizer routes simple queries to cheap, fast models and reserves expensive flagship models for complex tasks that actually need them. Your application classifies each query’s complexity and passes it as metadata — Inworld Router handles the rest.

The Problem

Without intelligent routing, every API call — from “Hello” to “Analyze this legal contract” — goes to the same expensive model. You’re paying GPT-5 prices ($5.00/1M tokens) for queries that a $0.05/1M-token model handles just as well.

The Solution

Your application classifies query complexity (simple vs. complex) and sends it as the complexity metadata field. Inworld Router uses CEL conditions to route each query to the right model tier:
  • Simple queries (greetings, basic Q&A, summaries) → Cost-effective models
  • Complex queries (analysis, reasoning, code generation) → Premium models

Quick Start (No Router Needed)

The fastest way to optimize costs — no router creation required:
# Simple query → cheapest model
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "extra_body": {
      "sort": ["price", "latency"]
    }
  }'
# Complex query → smartest model
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Analyze this legal contract and identify liability clauses."}],
    "extra_body": {
      "sort": ["intelligence", "price"]
    }
  }'
Your application decides the sort priority per request. No router configuration needed.
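The same two requests can be issued from application code. Below is a minimal sketch of building the request bodies shown in the curl examples above; the helper name build_auto_request is illustrative, not part of the Inworld API.

```python
# Build the chat-completions payload used in the curl examples above.
# Field names mirror those requests; build_auto_request is an
# illustrative helper, not an Inworld API function.

def build_auto_request(message: str, sort: list[str]) -> dict:
    """Return a request body for model="auto" with a per-request sort priority."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": message}],
        "extra_body": {"sort": sort},
    }

# Simple query: optimize for price first, then latency.
cheap = build_auto_request("Hello, how are you?", ["price", "latency"])

# Complex query: optimize for intelligence first, then price.
smart = build_auto_request(
    "Analyze this legal contract and identify liability clauses.",
    ["intelligence", "price"],
)
```

Each payload can then be POSTed to the chat completions endpoint exactly as in the curl examples.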

Advanced: Router with Complexity-Based Routing

For production systems, create a router with conditional routes so Inworld Router handles the routing logic server-side. Your app passes complexity and the router does the rest.

Step 1: Create the Router

curl --request POST \
  --url https://api.inworld.ai/router/v1/routers \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "routers/cost-optimizer",
    "defaults": {
      "text_generation_config": {
        "max_new_tokens": 1024,
        "temperature": 0.7
      }
    },
    "routes": [
      {
        "route": {
          "route_id": "complex-queries",
          "variants": [
            {
              "variant": {
                "variant_id": "premium",
                "model_selection": {
                  "models": [
                    "openai/gpt-5",
                    "anthropic/claude-opus-4-6"
                  ],
                  "sort": [
                    {"metric": "SORT_METRIC_INTELLIGENCE"},
                    {"metric": "SORT_METRIC_PRICE"}
                  ]
                }
              },
              "weight": 100
            }
          ]
        },
        "condition": {
          "cel_expression": "complexity == \"complex\""
        }
      }
    ],
    "defaultRoute": {
      "route_id": "simple-queries",
      "variants": [
        {
          "variant": {
            "variant_id": "budget",
            "model_selection": {
              "models": [
                "groq/llama-3.1-8b-instant",
                "google-ai-studio/gemini-2.5-flash"
              ],
              "sort": [
                {"metric": "SORT_METRIC_PRICE"},
                {"metric": "SORT_METRIC_LATENCY"}
              ]
            }
          },
          "weight": 100
        }
      ]
    }
  }'
Routes are evaluated in order. Complex queries match the first route. Everything else (simple queries, or requests without complexity metadata) falls through to defaultRoute — the cheap models. This means if your app forgets to set complexity, the request defaults to the budget tier, which is the safe choice for cost optimization.
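The fall-through behavior can be sketched as a small evaluator: routes are checked in order against the request metadata, and anything that matches no condition lands on the default route. This is an illustration of the matching logic only, not the actual CEL engine; a route's condition is modeled here as a simple metadata equality check.

```python
# Illustrative sketch of the router's route-matching behavior.
# Real routers evaluate CEL expressions; here a condition is modeled
# as an equality check on one metadata field.

def select_route(metadata: dict, routes: list[dict], default_route: dict) -> str:
    """Return the route_id of the first matching route, else the default."""
    for route in routes:
        cond = route.get("condition", {})
        if metadata.get(cond.get("field")) == cond.get("equals"):
            return route["route"]["route_id"]
    return default_route["route_id"]

routes = [{
    "route": {"route_id": "complex-queries"},
    "condition": {"field": "complexity", "equals": "complex"},
}]
default_route = {"route_id": "simple-queries"}

# Complex queries match the first route; everything else, including
# requests with no complexity metadata, falls through to the default.
assert select_route({"complexity": "complex"}, routes, default_route) == "complex-queries"
assert select_route({"complexity": "simple"}, routes, default_route) == "simple-queries"
assert select_route({}, routes, default_route) == "simple-queries"
```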

Step 2: Classify and Route

Your application determines complexity and passes it in metadata:
# Simple query → routes to defaultRoute (budget models)
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inworld/cost-optimizer",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "extra_body": {
      "metadata": {
        "complexity": "simple"
      }
    }
  }'
# Complex query → routes to premium models
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inworld/cost-optimizer",
    "messages": [
      {"role": "user", "content": "Analyze this legal contract and identify all potential liability clauses, then provide a risk assessment with recommendations."}
    ],
    "extra_body": {
      "metadata": {
        "complexity": "complex"
      }
    }
  }'

Step 3: Classify in Your App

Here’s a simple classification approach for your backend:
def classify_complexity(user_message: str) -> str:
    """Simple heuristic: short messages and common patterns are 'simple'."""
    simple_patterns = ["hello", "hi", "hey", "thanks", "summarize", "what is", "define"]
    message_lower = user_message.lower().strip()
    
    if len(user_message) < 50:
        return "simple"
    if any(message_lower.startswith(p) for p in simple_patterns):
        return "simple"
    return "complex"

# Use it when making requests (assumes an OpenAI-compatible client
# pointed at the Inworld endpoint used in the curl examples)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inworld.ai/v1",
    api_key="<your-api-key>",
)

complexity = classify_complexity(user_message)
response = client.chat.completions.create(
    model="inworld/cost-optimizer",
    messages=[{"role": "user", "content": user_message}],
    extra_body={"metadata": {"complexity": complexity}},
)

Cost Savings Example

Consider a typical workload where 70% of queries are simple:
  • Simple queries (70%): 1B tokens/month
  • Complex queries (30%): 500M tokens/month
Without routing:
  • All queries use GPT-5: 1.5B tokens × $5.00/1M = $7,500/month
With cost-optimized routing:
  • Simple queries use Llama 3.1 8B: 1B tokens × $0.05/1M = $50
  • Complex queries use GPT-5: 500M tokens × $5.00/1M = $2,500
  • Total: $2,550/month
Savings: 66% reduction ($4,950/month saved)
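The arithmetic can be checked directly. This sketch assumes a workload of 1B simple and 500M complex tokens per month at the per-1M-token prices quoted earlier:

```python
# Verify the savings arithmetic, assuming a workload of 1B simple and
# 500M complex tokens per month, priced per 1M tokens.
GPT5_PER_1M = 5.00    # premium model, $/1M tokens
LLAMA_PER_1M = 0.05   # budget model, $/1M tokens

simple_tokens_m = 1_000   # 1B tokens, expressed in millions
complex_tokens_m = 500    # 500M tokens, expressed in millions

# Without routing: everything goes to the premium model.
unrouted = (simple_tokens_m + complex_tokens_m) * GPT5_PER_1M

# With routing: simple traffic moves to the budget model.
routed = simple_tokens_m * LLAMA_PER_1M + complex_tokens_m * GPT5_PER_1M

savings = unrouted - routed
print(unrouted, routed, savings, round(savings / unrouted * 100))
# → 7500.0 2550.0 4950.0 66
```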

Best Practices

  1. Start simple: Use the Quick Start approach (per-request sort) before building a router
  2. Default to cheap: Make defaultRoute the budget tier so unclassified queries don’t waste money
  3. Monitor quality: Track response quality per tier to ensure budget models meet your minimum bar
  4. Refine classification: Start with simple heuristics, then improve with actual usage data
  5. Add more tiers: For production, consider three tiers — budget, standard, premium — using multiple CEL conditions

Next Steps