Mode Failover

Name: Mode Failover
Rating: 3.5 (1 reviews)
Author: imysm

By imysm 👁 55 views ▲ 0 votes

OpenClaw plugin for flexible model selection with random, weighted, and smart failover modes

GitHub

README

# mode-failover

> OpenClaw plugin for flexible model selection with random, weighted, and smart failover modes

**Version**: 1.0.0
**Plugin ID**: `mode-failover`
**License**: MIT

## Features

- ✅ **Four Selection Modes**
  - `random` - Randomly select from model pool
  - `round-robin` - Cycle through models in order
  - `weighted` - Select based on configured weights
  - `smart` - Select based on health status and performance

- ✅ **Timeout Detection** - Automatically detect and avoid slow models (30s default)
- ✅ **Session Stickiness** - Keep using the same model within a session
- ✅ **Auto Failover** - Automatically switch to healthy models on failures
- ✅ **Statistics Collection** - Track usage, success rates, and latency
- ✅ **Agent-Level Config** - Different settings for different agents

## Quick Start

### Installation

The plugin is included with OpenClaw. Enable it in `~/.openclaw/openclaw.json`:

```json5
{
  plugins: {
    entries: {
      "mode-failover": {
        enabled: true,
        config: {
          mode: "smart",
          models: [
            { ref: "zai/glm-5", weight: 50 },
            { ref: "zai/glm-4.7", weight: 50 }
          ]
        }
      }
    }
  }
}
```

**Then restart gateway**:
```bash
openclaw gateway restart
```

**Verify**:
```bash
openclaw failover status
```

### Minimal Configuration

```json5
{
  enabled: true,
  mode: "weighted",
  models: [
    { ref: "zai/glm-5", weight: 70 },
    { ref: "zai/glm-4.7", weight: 30 }
  ]
}
```

### Recommended Configuration (High Availability)

```json5
{
  mode: "smart",
  models: [
    { ref: "zai/glm-5", weight: 50 },
    { ref: "zai/glm-4.7", weight: 50 }
  ],
  stickiness: {
    enabled: false  // ✅ Allow fast failover
  },
  failover: {
    enabled: true,
    errorThreshold: 1,  // ✅ Failover on first error
    timeoutMs: 30000,   // ✅ 30 second timeout
    cooldownMinutes: 5  // ✅ Quick recovery
  }
}
```

## Timeout Detection

The plugin includes automatic timeout detection to quickly identify and avoid slow or unresponsive models:

### Configuration

```json5
{
  failover: {
    enabled: true,
    errorThreshold: 1,
    timeoutMs: 30000  // 30 seconds (default)
  }
}
```

### How It Works

1. **Request Monitoring**: Every request is monitored for latency
2. **Timeout Detection**: If a request exceeds `timeoutMs`, it's treated as a failure
3. **Immediate Failover**: Timeout triggers immediate model failover (bypasses error threshold)
4. **Health Impact**: Timeout events immediately mark the model as unhealthy

### Example

```json5
{
  failover: {
    timeoutMs: 30000,  // 30 seconds
    errorThreshold: 1,  // Failover on first error
    cooldownMinutes: 5  // Wait 5 minutes before retrying
  }
}
```

## Smart Mode Optimizations

The `smart` mode includes aggressive failover logic for high-availability scenarios:

### Features

- **Dynamic Weight Adjustment**: Models are weighted based on health, latency, and error rates
- **Aggressive Error Penalty**: Models with errors get significantly reduced weight
- **Timeout Detection**: Slow requests immediately reduce model weight
- **Fast Failover**: No waiting for multiple errors - failover happens immediately

### Error Rate Penalties

| Error Rate | Weight Multiplier |
|------------|------------------|
| 0% errors | 1.0 (full weight) |
| 10% errors | 0.7 (30% reduction) |
| 20% errors | 0.4 (60% reduction) |
| 30%+ errors | 0.1 (90% reduction) |

### Recommended Configuration

```json5
{
  mode: "smart",
  failover: {
    enabled: true,
    errorThreshold: 1,  // Fast failover
    timeoutMs: 30000,   // 30 second timeout
    cooldownMinutes: 5  // Quick recovery
  },
  stickiness: {
    enabled: false  // Allow smart mode to work optimally
  }
}
```

## Configuration

### Agent-Level Configuration

You can configure the plugin globally and override settings for specific agents:

```json5
{
  plugins: {
    entries: {
      "mode-failover": {
        enabled: true,
        config: {
          // Global default configuration
          mode: "weighted",
          models: [
            { ref: "zai/glm-5", weight: 50 },
            { ref: "anthropic/claude", weight: 50 }
          ],
          stickiness: { enabled: true, ttlMinutes: 60 }
        },
        agents: {
          // Agent-specific overrides
          "main": {
            enabled: true,
            config: {
              mode: "smart",  // Override mode for main agent
              models: [
                { ref: "zai/glm-5", weight: 80 },
                { ref: "anthropic/claude", weight: 20 }
              ]
            }
          },
          "product": {
            enabled: false  // Disable plugin for product agent
          }
        }
      }
    }
  }
}
```

#### Configuration Merge Logic

1. **No agentId** → Use global config
2. **agentId not in agents config** → Use global config
3. **agentId with `enabled: false`** → Plugin disabled for this agent
4. **agentId with `enabled: true`** → Deep merge agent config with global config

**Example**: Agent config overrides specific fields while inheriting others:
```json5
// Global config
{
  mode: "weighted",
  models: [...],
  stickiness: { enabled: true, ttlMinutes: 60 },
  failover: { enabled: true }
}

// Agent config (only overrides mode)
{
  mode: "smart"
}

// Result for this agent
{
  mode: "smart",  // From agent config
  models: [...],  // From global config
  stickiness: { enabled: true, ttlMinutes: 60 },  // From global config
  failover: { enabled: true }  // From global config
}
```

### Basic Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | true | Enable/disable the plugin |
| `mode` | string | "weighted" | Selection mode (random, round-robin, weighted, smart) |
| `models` | array | required | List of models in the pool |

### Model Configuration

```json5
{
  models: [
    {
      ref: "zai/glm-5",    // Model reference: provider/model
      weight: 70,              // Selection weight (0-100)
      enabled: true            // Whether model is available
    }
  ]
}
```

### Session Stickiness

```json5
{
  stickiness: {
    enabled: true,           // Keep same model per session
    ttlMinutes: 60,          // Stickiness expiration
    maxSessionModels: 1      // Max models per session
  }
}
```

### Failover Settings

```json5
{
  failover: {
    enabled: true,           // Enable auto-failover
    errorThreshold: 3,       // Errors before marking unhealthy
    errorWindowMinutes: 5,   // Error counting window
    cooldownMinutes: 30,     // Cooldown for unhealthy models
    recoveryProbeInterval: 5 // Recovery check interval
  }
}
```

### Statistics Settings

```json5
{
  stats: {
    enabled: true,           // Enable stats collection
    persistInterval: 60000,  // Persist interval (ms)
    maxHistoryHours: 24      // History retention
  }
}
```

## CLI Commands

```bash
# View status
openclaw failover status

# View statistics
openclaw failover stats
openclaw failover stats --json

# Change mode
openclaw failover mode random
openclaw failover mode weighted
openclaw failover mode smart

# Manage models
openclaw failover add zai/glm-5 --weight 70
openclaw failover remove zai/glm-5

# Reset statistics
openclaw failover reset-stats

# Enable/disable
openclaw failover enable
openclaw failover disable
```

## Selection Modes

### Random Mode

Randomly selects a model from the pool. Useful for:
- Distributing API quota usage
- A/B testing models
- Load balancing

### Round-Robin Mode

Cycles through models in order. Useful for:
- Even distribution across models
- Predictable model rotation

### Weighted Mode

Selects models based on configured weights. Useful for:
- Cost optimization (use cheaper models more)
- Quality/cost tradeoffs
- Gradual rollout of new models

Example: 70% GPT-4, 30% Claude
```json5
models: [
  { ref: "zai/glm-5", weight: 70 },
  { ref: "anthropic/claude", weight: 30 }
]
```

### Smart Mode

Selects models based on health and performance. Features:
- Monitors response time and success rate
- Automatically reduces weight of degraded models
- Avoids unhealthy models during cooldown
- Prefers faster models

## Health Status

Models can have three health states:

| Status | Description |
|--------|-------------|
| `healthy` | Normal operation |
| `degraded` | High error rate (>20%), reduced weight |
| `unhealthy` | Too many errors, in cooldown |

## Example Configurations

### Cost Optimization

Use GPT-4 for 30% of requests, Claude for 70%:

```json5
{
  mode: "weighted",
  models: [
    { ref: "zai/glm-5", weight: 30 },
    { ref: "anthropic/claude", weight: 70 }
  ]
}
```

### High Availability

Primary model with fallbacks:

```json5
{
  mode: "smart",
  models: [
    { ref: "zai/glm-5", weight: 100 },
    { ref: "anthropic/claude", weight: 100 },
    { ref: "google/gemini-pro", weight: 100 }
  ],
  failover: {
    enabled: true,
    errorThreshold: 2,
    cooldownMinutes: 15
  }
}
```

### Load Balancing

Distribute evenly across providers:

```json5
{
  mode: "round-robin",
  models: [
    { ref: "zai/glm-5", weight: 100 },
    { ref: "anthropic/claude", weight: 100 },
    { ref: "google/gemini-pro", weight: 100 }
  ]
}
```

## State Persistence

The plugin persists state to `~/.openclaw/mode-failover/state.json`:
- Current selection mode
- Round-robin index
- Session bindings
- Model health status

State is restored on gateway restart.

## Troubleshooting

### Plugin Not Working

**Symptom**: Plugin enabled but not selecting models

**Solutions**:
1. Check if plugin is enabled:
   ```bash
   openclaw failover status
   ```

2. Verify configuration syntax:
   ```bash
   cat ~/.openclaw/openclaw.json | jq '.plugins.entries."mode-failover"'
   ```

3. Check gateway logs:
   ```bash
   journalctl --user -u openclaw-gateway --since "5 minutes ago" | grep failover
   ```

### Models Not Switching

**Symptom**: All requests use the same model

**Causes**:
1. 

... (truncated)

tools