The Chat Completions API supports routing directly at the request level, without needing to create a router. You can use it to call a specific model, add fallbacks, or let the engine auto-select the best model, all in a single request.

Request-level routing can be a good fit if you:
- Want to call a specific model through a unified API without setting up a router (a basic sketch follows this list)
- Are prototyping or benchmarking before committing to a router configuration
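As a baseline, the simplest request-level call is a plain chat completion pointed at the routing endpoint. The sketch below assumes the Python OpenAI SDK; the base URL, API key, and model name are placeholders rather than values from this guide:

```python
from openai import OpenAI

# Placeholder base URL and API key; point the client at your routing endpoint.
client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

# Call one specific model through the unified Chat Completions interface.
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Summarize the latest release notes in one sentence."}],
)
print(response.choices[0].message.content)
```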
For more advanced use cases like conditional routing, A/B testing with weighted variants, or shared prompt templates, we recommend setting up a router.
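To add fallbacks in a single request, you name a primary model and an ordered chain of alternatives. The sketch below assumes the chain is passed as a models list under extra_body.fallback; that field name, the base URL, and the model identifiers are illustrative assumptions, so check the API reference for the exact schema:

```python
from openai import OpenAI

client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Explain vector clocks in two sentences."}],
    # Assumed request shape: an ordered fallback chain under extra_body.fallback.
    extra_body={
        "fallback": {
            "models": ["claude-opus", "gemini-pro"],  # tried in order if gpt-5.2 fails
        }
    },
)

# The Python SDK exposes response fields it does not model (such as metadata) via model_extra.
attempts = (response.model_extra or {}).get("metadata", {}).get("attempts")
print(response.model, attempts)
```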
In this example, the router tries gpt-5.2 first, then Claude Opus, then Gemini Pro. You can inspect which models were attempted in the metadata.attempts array of the response.
You can set a time-to-first-token (TTFT) timeout to trigger fallback based on latency. If the current model does not return the first token within the specified threshold, the router cancels the request and tries the next model in the chain. This is useful when your application has strict latency requirements and you’d rather try an alternative model than wait for a slow response.

Add a fallback object with a ttft_timeout field under extra_body in your request (that is, extra_body.fallback.ttft_timeout):
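A minimal sketch, again assuming the Python OpenAI SDK; the timeout unit (milliseconds here) and the fallback model identifier are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Draft a two-line status update."}],
    extra_body={
        "fallback": {
            # If no token arrives within this window, cancel and try the next model.
            # The unit (milliseconds) is an assumption; confirm it in the API reference.
            "ttft_timeout": 1500,
            "models": ["claude-opus"],  # illustrative fallback target
        }
    },
)
print(response.choices[0].message.content)
```

You can also let the engine choose the model for you by ranking candidates against sort criteria. The sketch below sorts by price; the model="auto" sentinel and the sort field under extra_body are assumptions rather than documented names:

```python
response = client.chat.completions.create(
    model="auto",  # assumed sentinel for auto-selection
    messages=[{"role": "user", "content": "Classify this support ticket as bug or feature request."}],
    extra_body={"sort": ["price"]},  # assumed field name and placement
)
```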
This selects the cheapest available model. The available sort criteria are price, latency, throughput, intelligence, math, and coding.

You can combine multiple criteria; models are ranked by the first criterion, with subsequent criteria used as tiebreakers:
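For example, under the same assumed sort field:

```python
from openai import OpenAI

client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

# Rank primarily by coding ability, then break ties by price, then by latency.
response = client.chat.completions.create(
    model="auto",  # assumed sentinel for auto-selection, as above
    messages=[{"role": "user", "content": "Refactor this function for readability."}],
    extra_body={"sort": ["coding", "price", "latency"]},  # assumed field name
)
print(response.model)  # which model the router selected
```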