- Want to call a specific model through a unified API without setting up a router
- Are prototyping or benchmarking before committing to a router configuration
Direct model call
Specify a model directly by itsprovider/model identifier:
Fallbacks
Add fallback models viaextra_body.models. If the primary model fails, the router automatically tries the next model in the list:
metadata.attempts array of the response.
Fallback by first token timeout
You can set a time-to-first-token (TTFT) timeout to trigger fallback based on latency. If the current model does not return the first token within the specified threshold, the router cancels the request and tries the next model in the chain. This is useful when your application has strict latency requirements and you’d rather try an alternative model than wait for a slow response. Add afallback object with a ttft_timeout field under extra_body in your request (that is, extra_body.fallback.ttft_timeout):
ttft_timeout value is a duration string (e.g., "300ms", "1s", "1.5s"). The minimum allowed value is 300ms.
Auto model selection
Setmodel to auto and provide sorting criteria via extra_body.sort to let the router pick the best model automatically:
price, latency, throughput, intelligence, math, coding.
You can combine multiple criteria — models are ranked by the first criterion, with subsequent criteria used as tiebreakers:
Filtering models
Useextra_body.models to restrict the candidate pool, or extra_body.ignore to exclude specific models or entire providers: