You can specify a model without a provider prefix (e.g., gpt-oss-120b instead of groq/gpt-oss-120b), and the API will automatically select a provider for you. Optionally, use the model_selection.provider field in your router config to control how providers are selected. By default, the provider with the lowest latency is selected, and if it fails, the next best provider is tried automatically.
To see which models are available from multiple providers, use the List Models endpoint.

Provider configuration

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| order | string[] | | Explicit list of providers to try, in order. When specified, providers are tried in this exact order. |
| allow_fallbacks | boolean | true | Whether to fall back to the next provider if the first one fails. |

How provider selection works

When no provider.order is specified, the sort criteria determine the order in which providers are tried. If no sort is specified either, providers are ordered by latency (fastest first). When provider.order is specified, providers are tried in the exact order listed; sort does not affect the provider order, but it still applies to models fallbacks if any are specified. The ignore field applies to providers regardless of whether order is specified.
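The rules above can be sketched as a small Python function. This is an illustration of the described behavior, not the router's actual implementation: the function name, the latency table, and the provider names "alpha" and "beta" are hypothetical, and only the default latency sort is modeled (other sort criteria are not covered here).

```python
from typing import Optional, Sequence


def provider_attempt_order(
    latency_ms: dict[str, float],
    order: Optional[Sequence[str]] = None,
    ignore: Sequence[str] = (),
) -> list[str]:
    """Return providers in the order they would be tried (illustrative only)."""
    ignored = set(ignore)
    if order:
        # Explicit order wins: keep it exactly as listed, minus ignored providers.
        return [p for p in order if p not in ignored]
    # No explicit order: fall back to the default sort, fastest provider first.
    candidates = [p for p in latency_ms if p not in ignored]
    return sorted(candidates, key=lambda p: latency_ms[p])


# Hypothetical latency measurements in milliseconds.
latency = {"groq": 120.0, "alpha": 80.0, "beta": 200.0}

print(provider_attempt_order(latency))
# ['alpha', 'groq', 'beta']
print(provider_attempt_order(latency, order=["beta", "groq"]))
# ['beta', 'groq']
print(provider_attempt_order(latency, order=["beta", "groq"], ignore=["beta"]))
# ['groq']
```

The last call shows the ignore list being applied even though an explicit order was given, which matches the rule stated above.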

Examples

// Automatically selects the lowest-latency provider for gpt-oss-120b
// Falls back to next-best provider if it fails
{
  "variant_id": "auto-provider",
  "model_id": "gpt-oss-120b"
}

Execution order

When using provider routing with model fallbacks, the full execution order is:
  1. Try providers for the primary model: in the order given by provider.order (if specified), otherwise sorted by the sort criteria (default: latency)
  2. If all providers fail and models is specified: fall back to the models list, sorted by the sort criteria
  3. If all models fail: return an error
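The three steps above might look roughly like the following Python sketch. It is a minimal model under stated assumptions, not the service's code: route_request, AllRoutesFailed, and the call callback are hypothetical names, both input lists are assumed to be already ordered, fallback models are called with no pinned provider, and allow_fallbacks is assumed to limit only provider-level fallback.

```python
from typing import Callable, Optional, Sequence


class AllRoutesFailed(Exception):
    """Raised when every provider and every fallback model has failed."""


def route_request(
    primary_model: str,
    providers: Sequence[str],
    fallback_models: Sequence[str],
    call: Callable[[str, Optional[str]], str],
    allow_fallbacks: bool = True,
) -> str:
    """Try providers for the primary model, then fallback models, then fail."""
    errors: list[str] = []

    # Step 1: providers for the primary model, in their precomputed order.
    # Assumption: allow_fallbacks=False means only the first provider is tried.
    candidates = list(providers) if allow_fallbacks else list(providers[:1])
    for provider in candidates:
        try:
            return call(primary_model, provider)
        except Exception as exc:
            errors.append(f"{primary_model} via {provider}: {exc}")

    # Step 2: fallback models, in their precomputed order.
    # Assumption: None means "let the router pick a provider".
    for model in fallback_models:
        try:
            return call(model, None)
        except Exception as exc:
            errors.append(f"{model}: {exc}")

    # Step 3: nothing succeeded, so surface a single error.
    raise AllRoutesFailed("; ".join(errors))


def fake_call(model: str, provider: Optional[str]) -> str:
    """Stand-in for a real request; simulates one provider being down."""
    if (model, provider) == ("gpt-oss-120b", "groq"):
        raise RuntimeError("provider down")
    return f"{model} via {provider or 'auto'}"


print(route_request("gpt-oss-120b", ["groq", "alpha"], ["small-model"], fake_call))
# gpt-oss-120b via alpha
```

Collecting the individual failures into the final exception keeps the error in step 3 informative without changing the order in which routes are attempted.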

Use cases

  • Reliability: Ensure your application continues working even if a specific provider is down
  • Cost Optimization: Route to cheaper providers or fall back to cheaper models
  • Performance: Prefer low-latency providers, or fall back to faster models for time-sensitive requests
  • Provider Control: Lock to specific providers for compliance or consistency