Never-Down Failover System

Overview

API outages are a serious risk for “AI-first” products. Inworld Router acts as a high-availability load balancer: when your primary provider returns errors or stops responding, it automatically routes requests to backup providers.

The Problem

When your application depends on a single AI provider, any outage becomes your outage:

429 Rate Limit errors → Your application stops working
5xx Server Errors → Users see failures
Provider downtime → Complete service disruption

For AI-first products, this means lost revenue, frustrated users, and damaged reputation.

The Solution

A failover router detects failures automatically and re-routes requests to backup providers. Your application can stay online even if a major AI provider goes dark, which helps you target 99.9% uptime for your AI features.

How It Works

1. A request arrives at your Inworld Router endpoint.
2. Inworld Router attempts to call Provider A (e.g., OpenAI).
3. If Provider A fails (429, 5xx, or timeout), the router automatically tries Provider B.
4. If Provider B fails, the router tries Provider C.
5. The response is returned from the first provider that succeeds.
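
The sequence above can be sketched as an ordered loop over providers. This is a minimal illustration of the general failover pattern, not Inworld Router's actual implementation; `ProviderError`, `call_with_failover`, and the stub providers are hypothetical names used only for this example.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error standing in for a 429, 5xx, or timeout."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers: A is rate limited, B times out, C answers.
def provider_a(prompt: str) -> str:
    raise ProviderError("429 Rate Limit")


def provider_b(prompt: str) -> str:
    raise ProviderError("timeout")


def provider_c(prompt: str) -> str:
    return f"echo: {prompt}"


served_by, reply = call_with_failover(
    [("A", provider_a), ("B", provider_b), ("C", provider_c)],
    "hello",
)
print(served_by, reply)
```

The caller only sees the first successful response. The loop raises an error only after every provider in the list has failed.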

Implementation

Step 1: Create a Failover Router

Create a router with a primary model and fallbacks. Because failover applies to every request unconditionally, use defaultRoute directly; no conditional routes or CEL expressions are needed:

curl --request POST \
  --url https://api.inworld.ai/router/v1/routers \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "routers/failover-system",
    "defaults": {
      "text_generation_config": {
        "max_new_tokens": 1024,
        "temperature": 0.7
      }
    },
    "defaultRoute": {
      "route_id": "failover",
      "variants": [
        {
          "variant": {
            "variant_id": "primary-with-fallbacks",
            "model_id": "openai/gpt-5",
            "model_selection": {
              "models": [
                "anthropic/claude-opus-4-6",
                "google-ai-studio/gemini-2.5-flash"
              ]
            }
          },
          "weight": 100
        }
      ]
    }
  }'

This router uses defaultRoute with model_selection to specify the fallback models. If the primary model (openai/gpt-5) fails, Inworld Router tries the fallbacks in order: anthropic/claude-opus-4-6 first, then google-ai-studio/gemini-2.5-flash.
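
If you manage several routers, generating this payload from a primary model and an ordered fallback list can reduce copy-paste mistakes. The sketch below only builds the JSON body; it sends nothing. The helper name `build_failover_router` is hypothetical, the field names are copied from the curl example above rather than from a full API reference, and the `defaults` block is left out for brevity.

```python
import json


def build_failover_router(name: str, primary: str, fallbacks: list[str]) -> dict:
    """Build a create-router payload mirroring the curl example above."""
    if primary in fallbacks:
        raise ValueError("primary model should not also be listed as a fallback")
    return {
        "name": name,
        "defaultRoute": {
            "route_id": "failover",
            "variants": [
                {
                    "variant": {
                        "variant_id": "primary-with-fallbacks",
                        "model_id": primary,
                        # Fallbacks are tried in list order.
                        "model_selection": {"models": list(fallbacks)},
                    },
                    "weight": 100,
                }
            ],
        },
    }


payload = build_failover_router(
    "routers/failover-system",
    "openai/gpt-5",
    ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-flash"],
)
print(json.dumps(payload, indent=2))
```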

Step 2: Configure Automatic Failover

You can also request failover per call, without a router: name a specific model and list its fallbacks in the request, and Inworld Router handles failover automatically:

curl --location 'https://api.inworld.ai/v1/chat/completions' \
--header 'Authorization: Bearer <your-api-key>' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openai/gpt-5",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "extra_body": {
    "models": [
      "anthropic/claude-opus-4-6",
      "google-ai-studio/gemini-2.5-flash"
    ]
  }
}'

If OpenAI returns a 429 or 5xx error, Inworld Router automatically retries with Claude, then Gemini if needed.
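
The same request can be assembled in application code. The sketch below uses only Python's standard library and mirrors the curl body above field for field. `build_chat_request` is a hypothetical helper, and the example builds the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.inworld.ai/v1/chat/completions"


def build_chat_request(
    api_key: str, model: str, fallbacks: list[str], user_message: str
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with fallback models."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        # Same shape as the curl example: fallbacks listed under extra_body.models.
        "extra_body": {"models": fallbacks},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "<your-api-key>",
    "openai/gpt-5",
    ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-flash"],
    "What is the capital of France?",
)
print(request.get_method(), request.full_url)
# To send it (the 30-second timeout is an arbitrary illustrative value):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```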

Step 3: Use Router-Based Failover

For more control, send requests to the failover router you created in Step 1 by passing it as the model:

curl --location 'https://api.inworld.ai/v1/chat/completions' \
--header 'Authorization: Bearer <your-api-key>' \
--header 'Content-Type: application/json' \
--data '{
  "model": "inworld/failover-system",
  "messages": [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
  ]
}'

Failover Scenarios

Scenario 1: Rate Limit (429)

When you hit rate limits, Inworld Router automatically routes to the next provider:

Request → OpenAI (429 Rate Limit) → Anthropic (Success) ✅

Scenario 2: Server Error (5xx)

When a provider experiences server errors, failover kicks in:

Request → OpenAI (503 Service Unavailable) → Anthropic (Success) ✅

Scenario 3: Timeout

If a provider doesn’t respond in time, Inworld Router moves to the next:

Request → OpenAI (Timeout) → Anthropic (Success) ✅

Scenario 4: Complete Provider Outage

Even if an entire provider goes down, your application continues working:

Request → OpenAI (Complete Outage) → Anthropic (Success) ✅
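
The four scenarios reduce to one decision: does this failure justify trying the next provider? The sketch below encodes the failure types named in this guide (429, 5xx, timeout). It makes two assumptions that are not confirmed by the source: a complete outage shows up as no response at all, and other 4xx errors such as 400 do not trigger failover because a different provider would likely reject the same request. `should_fail_over` is a hypothetical name.

```python
from typing import Optional


def should_fail_over(status_code: Optional[int], timed_out: bool = False) -> bool:
    """Return True for the failure types listed above: 429, 5xx, or a timeout."""
    if timed_out or status_code is None:
        # No response at all (timeout or unreachable provider).
        return True
    return status_code == 429 or 500 <= status_code <= 599


scenarios = {
    "rate limit": should_fail_over(429),
    "server error": should_fail_over(503),
    "timeout": should_fail_over(None, timed_out=True),
    "outage": should_fail_over(None),
    "success": should_fail_over(200),
    "bad request": should_fail_over(400),  # assumption: client errors are not retried
}
for name, result in scenarios.items():
    print(f"{name}: fail over = {result}")
```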

Multi-Provider Architecture

For maximum resilience, configure failover across 3+ providers:

{
  "defaultRoute": {
    "route_id": "failover",
    "variants": [
      {
        "variant": {
          "variant_id": "primary-with-fallbacks",
          "model_id": "openai/gpt-5",
          "model_selection": {
            "models": [
              "anthropic/claude-opus-4-6",
              "google-ai-studio/gemini-2.5-flash",
              "groq/llama-3.3-70b-versatile"
            ]
          }
        },
        "weight": 100
      }
    ]
  }
}
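
A quick sanity check before deploying a chain like this is to count how many distinct providers it spans. The sketch assumes, based on the model IDs used in this guide, that the provider is the part of the ID before the first slash; `distinct_providers` is a hypothetical helper.

```python
def distinct_providers(model_ids: list[str]) -> set[str]:
    """Extract the provider prefix (text before the first '/') from each model ID."""
    return {model_id.split("/", 1)[0] for model_id in model_ids}


chain = [
    "openai/gpt-5",
    "anthropic/claude-opus-4-6",
    "google-ai-studio/gemini-2.5-flash",
    "groq/llama-3.3-70b-versatile",
]
providers = distinct_providers(chain)
print(sorted(providers))
assert len(providers) >= 3, "failover chain should span at least 3 providers"
```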

Monitoring Failover Events

Track failover events in the response metadata:

{
  "id": "chatcmpl-...",
  "model": "anthropic/claude-opus-4-6",
  "choices": [...],
  "metadata": {
    "attempts": [
      {
        "provider": "openai",
        "model": "gpt-5",
        "status": "failed",
        "error": "429 Rate Limit"
      },
      {
        "provider": "anthropic",
        "model": "claude-opus-4-6",
        "status": "success"
      }
    ]
  }
}
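
To turn this metadata into something you can log, a small summary function is usually enough. The sketch below assumes the response has exactly the shape shown above (a `metadata.attempts` list whose entries carry `provider`, `status`, and an optional `error`). `summarize_attempts` is a hypothetical helper.

```python
def summarize_attempts(response: dict) -> dict:
    """Summarize failover information from a chat completion response."""
    attempts = response.get("metadata", {}).get("attempts", [])
    failed = [a for a in attempts if a.get("status") == "failed"]
    served = next((a for a in attempts if a.get("status") == "success"), None)
    return {
        # A failover happened if at least one attempt failed before a success.
        "failed_over": bool(failed) and served is not None,
        "failed_providers": [a.get("provider") for a in failed],
        "errors": [a.get("error") for a in failed],
        "served_by": served.get("provider") if served else None,
    }


example_response = {
    "model": "anthropic/claude-opus-4-6",
    "metadata": {
        "attempts": [
            {"provider": "openai", "model": "gpt-5",
             "status": "failed", "error": "429 Rate Limit"},
            {"provider": "anthropic", "model": "claude-opus-4-6",
             "status": "success"},
        ]
    },
}
summary = summarize_attempts(example_response)
print(summary)
```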

Best Practices

Diversify providers: Avoid pairing providers that share the same underlying infrastructure (e.g., two providers that both run on AWS), since a single outage can take down both
Monitor failover rates: High failover rates may indicate you need to adjust rate limits or add capacity
Test regularly: Periodically test failover scenarios to ensure they work as expected
Set timeouts: Configure appropriate timeout values to avoid long waits before failover
Log everything: Track all failover events for debugging and optimization
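
Monitoring the failover rate can start as a simple ratio over a recent window of requests. In the sketch below, each entry records whether a request needed a failover. The 5% alert threshold and the sample window are hypothetical values chosen for illustration, not recommendations from Inworld.

```python
def failover_rate(failed_over_flags: list[bool]) -> float:
    """Fraction of requests in the window that needed at least one failover."""
    if not failed_over_flags:
        return 0.0
    return sum(failed_over_flags) / len(failed_over_flags)


ALERT_THRESHOLD = 0.05  # hypothetical value; tune to your own traffic

# Sample window: 20 requests, 2 of which failed over.
window = [False] * 18 + [True] * 2
rate = failover_rate(window)
alert = rate > ALERT_THRESHOLD
print(f"failover rate: {rate:.1%}, alert: {alert}")
```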

Cost Considerations

Failover routing can help with cost optimization:

Primary: Use your preferred model (e.g., GPT-5)
Fallback: Use cost-effective alternatives (e.g., Gemini 2.5 Flash, Claude 3.5 Haiku)
Emergency: Use self-hosted models for critical paths

Advanced: Health Checks

In production, you also want traffic routed away from unhealthy providers before requests fail. This needs no extra configuration; a standard failover router is enough:

{
  "defaultRoute": {
    "route_id": "failover-with-health",
    "variants": [
      {
        "variant": {
          "variant_id": "primary-openai",
          "model_id": "openai/gpt-5",
          "model_selection": {
            "models": [
              "anthropic/claude-opus-4-6",
              "google-ai-studio/gemini-2.5-flash"
            ]
          }
        },
        "weight": 100
      }
    ]
  }
}

Health checks are automatically handled by Inworld Router. The router continuously monitors provider health and automatically routes away from unhealthy providers. You don’t need to explicitly configure health checks in the router configuration.

Next Steps

Learn about cost optimization to reduce API bills
Explore specialist routing for domain-specific models
Review router management APIs for advanced configurations
