The Chat Completions API supports routing directly at the request level, without needing to create a router. You can use it to call a specific model, add fallbacks, or let the engine auto-select the best model, all in a single request.

Request-level routing can be a good fit if you:
- Want to call a specific model through a unified API without setting up a router (a basic sketch follows this list)
- Are prototyping or benchmarking before committing to a router configuration
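As a baseline, the simplest request-level call is a plain chat completion pointed at the routing endpoint. The sketch below assumes the Python OpenAI SDK; the base URL, API key, and model name are placeholders rather than values from this guide:

```python
from openai import OpenAI

# Placeholder base URL and API key; point the client at your routing endpoint.
client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

# Call one specific model through the unified Chat Completions interface.
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Summarize the latest release notes in one sentence."}],
)
print(response.choices[0].message.content)
```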
For more advanced use cases like conditional routing, A/B testing with weighted variants, or shared prompt templates, we recommend setting up a router.
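To add fallbacks in a single request, you name a primary model and an ordered chain of alternatives. The sketch below assumes the chain is passed as a models list under extra_body.fallback; that field name, the base URL, and the model identifiers are illustrative assumptions, so check the API reference for the exact schema:

```python
from openai import OpenAI

client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Explain vector clocks in two sentences."}],
    # Assumed request shape: an ordered fallback chain under extra_body.fallback.
    extra_body={
        "fallback": {
            "models": ["claude-opus", "gemini-pro"],  # tried in order if gpt-5.2 fails
        }
    },
)

# The Python SDK exposes response fields it does not model (such as metadata) via model_extra.
attempts = (response.model_extra or {}).get("metadata", {}).get("attempts")
print(response.model, attempts)
```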
In this example, the router tries gpt-5.2 first, then Claude Opus, then Gemini Pro. You can inspect which models were attempted in the metadata.attempts array of the response.
You can set a time-to-first-token (TTFT) timeout to trigger fallback based on latency. If the current model does not return the first token within the specified threshold, the router cancels the request and tries the next model in the chain. This is useful when your application has strict latency requirements and you’d rather try an alternative model than wait for a slow response.

Add a fallback object with a ttft_timeout field under extra_body in your request (that is, extra_body.fallback.ttft_timeout):
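A minimal sketch, again assuming the Python OpenAI SDK; the timeout unit (milliseconds here) and the fallback model identifier are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Draft a two-line status update."}],
    extra_body={
        "fallback": {
            # If no token arrives within this window, cancel and try the next model.
            # The unit (milliseconds) is an assumption; confirm it in the API reference.
            "ttft_timeout": 1500,
            "models": ["claude-opus"],  # illustrative fallback target
        }
    },
)
print(response.choices[0].message.content)
```

You can also let the engine choose the model for you by ranking candidates against sort criteria. The sketch below sorts by price; the model="auto" sentinel and the sort field under extra_body are assumptions rather than documented names:

```python
response = client.chat.completions.create(
    model="auto",  # assumed sentinel for auto-selection
    messages=[{"role": "user", "content": "Classify this support ticket as bug or feature request."}],
    extra_body={"sort": ["price"]},  # assumed field name and placement
)
```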
This selects the cheapest available model. The available sort criteria are price, latency, throughput, intelligence, math, and coding.

You can combine multiple criteria; models are ranked by the first criterion, with subsequent criteria used as tiebreakers:
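For example, under the same assumed sort field:

```python
from openai import OpenAI

client = OpenAI(base_url="https://router.example.com/v1", api_key="YOUR_API_KEY")

# Rank primarily by coding ability, then break ties by price, then by latency.
response = client.chat.completions.create(
    model="auto",  # assumed sentinel for auto-selection, as above
    messages=[{"role": "user", "content": "Refactor this function for readability."}],
    extra_body={"sort": ["coding", "price", "latency"]},  # assumed field name
)
print(response.model)  # which model the router selected
```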