Overview

The cost optimizer routes simple queries to cheap, fast models and reserves expensive flagship models for complex tasks that actually need them. Your application classifies each query’s complexity and passes it as metadata — Inworld Router handles the rest.

The Problem

Without intelligent routing, every API call — from “Hello” to “Analyze this legal contract” — goes to the same expensive model. You’re paying GPT-5 prices ($5.00/1M tokens) for queries that a $0.05/1M-token model handles just as well.

The Solution

Your application classifies query complexity (simple vs. complex) and sends it as the complexity metadata field. Inworld Router uses CEL conditions to route each query to the right model tier:
  • Simple queries (greetings, basic Q&A, summaries) → Cost-effective models
  • Complex queries (analysis, reasoning, code generation) → Premium models

Quick Start (No Router Needed)

The fastest way to optimize costs — no router creation required:
# Simple query → cheapest model
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "extra_body": {
      "sort": ["price", "latency"]
    }
  }'
# Complex query → smartest model
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Analyze this legal contract and identify liability clauses."}],
    "extra_body": {
      "sort": ["intelligence", "price"]
    }
  }'
Your application decides the sort priority per request. No router configuration needed.
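The same two requests can be issued from application code. Below is a minimal sketch of building the request bodies shown in the curl examples above; the helper name build_auto_request is illustrative, not part of the Inworld API.

```python
# Build the chat-completions payload used in the curl examples above.
# Field names mirror those requests; build_auto_request is an
# illustrative helper, not an Inworld API function.

def build_auto_request(message: str, sort: list[str]) -> dict:
    """Return a request body for model="auto" with a per-request sort priority."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": message}],
        "extra_body": {"sort": sort},
    }

# Simple query: optimize for price first, then latency.
cheap = build_auto_request("Hello, how are you?", ["price", "latency"])

# Complex query: optimize for intelligence first, then price.
smart = build_auto_request(
    "Analyze this legal contract and identify liability clauses.",
    ["intelligence", "price"],
)
```

Each payload can then be POSTed to the chat completions endpoint exactly as in the curl examples.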

Advanced: Router with Complexity-Based Routing

For production systems, create a router with conditional routes so Inworld Router handles the routing logic server-side. Your app passes complexity and the router does the rest.

Step 1: Create the Router

curl --request POST \
  --url https://api.inworld.ai/router/v1/routers \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "routers/cost-optimizer",
    "defaults": {
      "text_generation_config": {
        "max_new_tokens": 1024,
        "temperature": 0.7
      }
    },
    "routes": [
      {
        "route": {
          "route_id": "complex-queries",
          "variants": [
            {
              "variant": {
                "variant_id": "premium",
                "model_selection": {
                  "models": [
                    "openai/gpt-5",
                    "anthropic/claude-opus-4-6"
                  ],
                  "sort": [
                    {"metric": "SORT_METRIC_INTELLIGENCE"},
                    {"metric": "SORT_METRIC_PRICE"}
                  ]
                }
              },
              "weight": 100
            }
          ]
        },
        "condition": {
          "cel_expression": "complexity == \"complex\""
        }
      }
    ],
    "defaultRoute": {
      "route_id": "simple-queries",
      "variants": [
        {
          "variant": {
            "variant_id": "budget",
            "model_selection": {
              "models": [
                "groq/llama-3.1-8b-instant",
                "google-ai-studio/gemini-2.5-flash"
              ],
              "sort": [
                {"metric": "SORT_METRIC_PRICE"},
                {"metric": "SORT_METRIC_LATENCY"}
              ]
            }
          },
          "weight": 100
        }
      ]
    }
  }'
Routes are evaluated in order. Complex queries match the first route. Everything else (simple queries, or requests without complexity metadata) falls through to defaultRoute — the cheap models. This means if your app forgets to set complexity, the request defaults to the budget tier, which is the safe choice for cost optimization.
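The fall-through behavior can be sketched as a small evaluator: routes are checked in order against the request metadata, and anything that matches no condition lands on the default route. This is an illustration of the matching logic only, not the actual CEL engine; a route's condition is modeled here as a simple metadata equality check.

```python
# Illustrative sketch of the router's route-matching behavior.
# Real routers evaluate CEL expressions; here a condition is modeled
# as an equality check on one metadata field.

def select_route(metadata: dict, routes: list[dict], default_route: dict) -> str:
    """Return the route_id of the first matching route, else the default."""
    for route in routes:
        cond = route.get("condition", {})
        if metadata.get(cond.get("field")) == cond.get("equals"):
            return route["route"]["route_id"]
    return default_route["route_id"]

routes = [{
    "route": {"route_id": "complex-queries"},
    "condition": {"field": "complexity", "equals": "complex"},
}]
default_route = {"route_id": "simple-queries"}

# Complex queries match the first route; everything else, including
# requests with no complexity metadata, falls through to the default.
assert select_route({"complexity": "complex"}, routes, default_route) == "complex-queries"
assert select_route({"complexity": "simple"}, routes, default_route) == "simple-queries"
assert select_route({}, routes, default_route) == "simple-queries"
```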

Step 2: Classify and Route

Your application determines complexity and passes it in metadata:
# Simple query → routes to defaultRoute (budget models)
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inworld/cost-optimizer",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "extra_body": {
      "metadata": {
        "complexity": "simple"
      }
    }
  }'
# Complex query → routes to premium models
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inworld/cost-optimizer",
    "messages": [
      {"role": "user", "content": "Analyze this legal contract and identify all potential liability clauses, then provide a risk assessment with recommendations."}
    ],
    "extra_body": {
      "metadata": {
        "complexity": "complex"
      }
    }
  }'

Step 3: Classify in Your App

Here’s a simple classification approach for your backend:
def classify_complexity(user_message: str) -> str:
    """Simple heuristic: short messages and common patterns are 'simple'."""
    simple_patterns = ["hello", "hi", "hey", "thanks", "summarize", "what is", "define"]
    message_lower = user_message.lower().strip()
    
    if len(user_message) < 50:
        return "simple"
    if any(message_lower.startswith(p) for p in simple_patterns):
        return "simple"
    return "complex"

# Use it when making requests (assumes an OpenAI-compatible client
# pointed at the Inworld endpoint used in the curl examples)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inworld.ai/v1",
    api_key="<your-api-key>",
)

complexity = classify_complexity(user_message)
response = client.chat.completions.create(
    model="inworld/cost-optimizer",
    messages=[{"role": "user", "content": user_message}],
    extra_body={"metadata": {"complexity": complexity}},
)

Cost Savings Example

Consider a typical workload where 70% of queries are simple:
  • Simple queries (70%): 1B tokens/month
  • Complex queries (30%): 500M tokens/month
Without routing:
  • All queries use GPT-5: 1.5B tokens × $5.00/1M = $7,500/month
With cost-optimized routing:
  • Simple queries use Llama 3.1 8B: 1B tokens × $0.05/1M = $50
  • Complex queries use GPT-5: 500M tokens × $5.00/1M = $2,500
  • Total: $2,550/month
Savings: 66% reduction ($4,950/month saved)
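The arithmetic can be checked directly. This sketch assumes a workload of 1B simple and 500M complex tokens per month at the per-1M-token prices quoted earlier:

```python
# Verify the savings arithmetic, assuming a workload of 1B simple and
# 500M complex tokens per month, priced per 1M tokens.
GPT5_PER_1M = 5.00    # premium model, $/1M tokens
LLAMA_PER_1M = 0.05   # budget model, $/1M tokens

simple_tokens_m = 1_000   # 1B tokens, expressed in millions
complex_tokens_m = 500    # 500M tokens, expressed in millions

# Without routing: everything goes to the premium model.
unrouted = (simple_tokens_m + complex_tokens_m) * GPT5_PER_1M

# With routing: simple traffic moves to the budget model.
routed = simple_tokens_m * LLAMA_PER_1M + complex_tokens_m * GPT5_PER_1M

savings = unrouted - routed
print(unrouted, routed, savings, round(savings / unrouted * 100))
# → 7500.0 2550.0 4950.0 66
```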

Best Practices

  1. Start simple: Use the Quick Start approach (per-request sort) before building a router
  2. Default to cheap: Make defaultRoute the budget tier so unclassified queries don’t waste money
  3. Monitor quality: Track response quality per tier to ensure budget models meet your minimum bar
  4. Refine classification: Start with simple heuristics, then improve with actual usage data
  5. Add more tiers: For production, consider three tiers — budget, standard, premium — using multiple CEL conditions

Next Steps