---
name: smart-router
description: >
  Expertise-aware model router with semantic domain scoring, context-overflow protection,
  and security redaction. Automatically selects the optimal AI model using weighted
  expertise scoring (Feb 2026 benchmarks). Supports Claude, GPT, Gemini, Grok with
  automatic fallback chains, HITL gates, and cost optimization.
author: c0nSpIc0uS7uRk3r
version: 2.1.0
license: MIT
metadata:
  openclaw:
    requires:
      bins: ["python3"]
      env: ["ANTHROPIC_API_KEY"]
      optional_env: ["GOOGLE_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY"]
features:
  - Semantic domain detection
  - Expertise-weighted scoring (0-100)
  - Risk-based mandatory routing
  - Context overflow protection (>150K → Gemini)
  - Security credential redaction
  - Circuit breaker with persistent state
  - HITL gate for low-confidence routing
benchmarks:
  source: "Feb 2026 MLOC Analysis"
  models:
    - "Claude Opus 4.5: SWE-bench 80.9%"
    - "GPT-5.2: AIME 100%, Control Flow 22 errors/MLOC"
    - "Gemini 3 Pro: Concurrency 69 issues/MLOC"
---
# A.I. Smart-Router
Intelligently route requests to the optimal AI model using tiered classification with automatic fallback handling and cost optimization.
## How It Works (Silent by Default)
The router operates transparently: users send messages normally and get responses from the best model for their task. No special commands needed.
**Optional visibility**: Include `[show routing]` in any message to see the routing decision.
## Tiered Classification System
The router uses a three-tier decision process:
```
┌───────────────────────────────────────────────────────────────────┐
│                      TIER 1: INTENT DETECTION                     │
│             Classify the primary purpose of the request           │
├───────────────────────────────────────────────────────────────────┤
│ CODE         │ ANALYSIS    │ CREATIVE    │ REALTIME   │ GENERAL   │
│ write/debug  │ research    │ writing     │ news/live  │ Q&A/chat  │
│ refactor     │ explain     │ stories     │ X/Twitter  │ translate │
│ review       │ compare     │ brainstorm  │ prices     │ summarize │
└──────┬───────┴──────┬──────┴──────┬──────┴─────┬──────┴─────┬─────┘
       │              │             │            │            │
       ▼              ▼             ▼            ▼            ▼
┌───────────────────────────────────────────────────────────────────┐
│                   TIER 2: COMPLEXITY ESTIMATION                   │
├───────────────────────────────────────────────────────────────────┤
│ SIMPLE (Tier $)      │ MEDIUM (Tier $$)     │ COMPLEX (Tier $$$)  │
│ • One-step task      │ • Multi-step task    │ • Deep reasoning    │
│ • Short response OK  │ • Some nuance        │ • Extensive output  │
│ • Factual lookup     │ • Moderate context   │ • Critical task     │
│ → Haiku/Flash        │ → Sonnet/Grok/GPT    │ → Opus/GPT-5        │
└──────────────────────┴──────────────────────┴─────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────┐
│                  TIER 3: SPECIAL CASE OVERRIDES                   │
├───────────────────────────────────────────────────────────────────┤
│ CONDITION                           │ OVERRIDE TO                 │
│ ────────────────────────────────────┼──────────────────────────── │
│ Context >100K tokens                │ → Gemini Pro (1M ctx)       │
│ Context >500K tokens                │ → Gemini Pro ONLY           │
│ Needs real-time data                │ → Grok (regardless)         │
│ Image/vision input                  │ → Opus or Gemini Pro        │
│ User explicit override              │ → Requested model           │
└─────────────────────────────────────┴─────────────────────────────┘
```
## Intent Detection Patterns
### CODE Intent
- Keywords: write, code, debug, fix, refactor, implement, function, class, script, API, bug, error, compile, test, PR, commit
- File extensions mentioned: .py, .js, .ts, .go, .rs, .java, etc.
- Code blocks in input
### ANALYSIS Intent
- Keywords: analyze, explain, compare, research, understand, why, how does, evaluate, assess, review, investigate, examine
- Long-form questions
- "Help me understand..."
### CREATIVE Intent
- Keywords: write (story/poem/essay), create, brainstorm, imagine, design, draft, compose
- Fiction/narrative requests
- Marketing/copy requests
### REALTIME Intent
- Keywords: now, today, current, latest, trending, news, happening, live, price, score, weather
- X/Twitter mentions
- Stock/crypto tickers
- Sports scores
### GENERAL Intent (Default)
- Simple Q&A
- Translations
- Summaries
- Conversational
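The keyword patterns above can be approximated with simple whole-word matching. A minimal sketch (the keyword sets are a non-exhaustive illustration, not the router's authoritative lists):

```python
import re

# Illustrative subsets of the intent keywords listed above.
INTENT_KEYWORDS = {
    "CODE": {"code", "debug", "fix", "refactor", "implement", "function",
             "class", "script", "api", "bug", "error", "compile", "test"},
    "ANALYSIS": {"analyze", "explain", "compare", "research", "evaluate",
                 "assess", "investigate", "examine"},
    "CREATIVE": {"story", "poem", "essay", "brainstorm", "imagine", "draft",
                 "compose"},
    "REALTIME": {"now", "today", "current", "latest", "trending", "news",
                 "live", "price", "score", "weather"},
}

CODE_EXTENSIONS = re.compile(r"\.(py|js|ts|go|rs|java)\b")

def detect_intents(message: str) -> list[str]:
    """Return every intent whose signals appear; GENERAL is the fallback."""
    text = message.lower()
    # Whole-word tokenization avoids substring hits like "test" in "latest".
    tokens = set(re.findall(r"[a-z']+", text))
    hits = [intent for intent, words in INTENT_KEYWORDS.items()
            if tokens & words]
    # CODE is also signaled by file extensions or fenced code blocks.
    if "CODE" not in hits and (CODE_EXTENSIONS.search(text) or "```" in message):
        hits.append("CODE")
    return hits or ["GENERAL"]
```

Whole-word matching matters here: a substring check would misfire on words like "latest" (which contains "test") and misroute real-time queries to CODE.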
### MIXED Intent (Multiple Intents Detected)
When a request contains multiple clear intents (e.g., "Write code to analyze this data and explain it creatively"):
1. **Identify primary intent** β What's the main deliverable?
2. **Route to highest-capability model** β Mixed tasks need versatility
3. **Default to COMPLEX complexity** β Multi-intent = multi-step
**Examples:**
- "Write code AND explain how it works" β CODE (primary) + ANALYSIS β Route to Opus
- "Summarize this AND what's the latest news on it" β REALTIME takes precedence β Grok
- "Creative story using real current events" β REALTIME + CREATIVE β Grok (real-time wins)
## Language Handling
**Non-English requests** are handled normally; all supported models have multilingual capabilities:
| Model | Non-English Support |
|-------|---------------------|
| Opus/Sonnet/Haiku | Excellent (100+ languages) |
| GPT-5 | Excellent (100+ languages) |
| Gemini Pro/Flash | Excellent (100+ languages) |
| Grok | Good (major languages) |
**Intent detection still works** because:
- Keyword patterns include common non-English equivalents
- Code intent detected by file extensions, code blocks (language-agnostic)
- Complexity estimated by query length (works across languages)
**Edge case:** If intent unclear due to language, default to GENERAL intent with MEDIUM complexity.
## Complexity Signals
### Simple Complexity ($)
- Short query (<50 words)
- Single question mark
- "Quick question", "Just tell me", "Briefly"
- Yes/no format
- Unit conversions, definitions
### Medium Complexity ($$)
- Moderate query (50-200 words)
- Multiple aspects to address
- "Explain", "Describe", "Compare"
- Some context provided
### Complex Complexity ($$$)
- Long query (>200 words) or complex task
- "Step by step", "Thoroughly", "In detail"
- Multi-part questions
- Critical/important qualifier
- Research, analysis, or creative work
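The complexity signals above can be folded into one heuristic. A minimal sketch (word-count thresholds and cue phrases are taken from the lists above; the tie-breaking order is an assumption):

```python
def estimate_complexity(message: str) -> str:
    """Heuristic complexity tier from the signals listed above."""
    words = len(message.split())
    text = message.lower()
    complex_cues = ("step by step", "thoroughly", "in detail",
                    "critical", "important")
    simple_cues = ("quick question", "just tell me", "briefly")
    # Complex signals win: long queries or explicit depth requests.
    if words > 200 or any(c in text for c in complex_cues):
        return "COMPLEX"
    # Short queries with at most one question, or explicit brevity cues.
    if words < 50 and (any(c in text for c in simple_cues)
                       or message.count("?") <= 1):
        return "SIMPLE"
    return "MEDIUM"
```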
## Routing Matrix
| Intent | Simple | Medium | Complex |
|--------|--------|--------|---------|
| **CODE** | Sonnet | Opus | Opus |
| **ANALYSIS** | Flash | GPT-5 | Opus |
| **CREATIVE** | Sonnet | Opus | Opus |
| **REALTIME** | Grok | Grok | Grok-3 |
| **GENERAL** | Flash | Sonnet | Opus |
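The matrix above is a direct two-level lookup. A sketch (model names are the shorthand from the table, not full provider IDs; falling back to GENERAL for an unknown intent is an assumption):

```python
# Transcription of the routing matrix above.
ROUTING_MATRIX = {
    "CODE":     {"SIMPLE": "Sonnet", "MEDIUM": "Opus",   "COMPLEX": "Opus"},
    "ANALYSIS": {"SIMPLE": "Flash",  "MEDIUM": "GPT-5",  "COMPLEX": "Opus"},
    "CREATIVE": {"SIMPLE": "Sonnet", "MEDIUM": "Opus",   "COMPLEX": "Opus"},
    "REALTIME": {"SIMPLE": "Grok",   "MEDIUM": "Grok",   "COMPLEX": "Grok-3"},
    "GENERAL":  {"SIMPLE": "Flash",  "MEDIUM": "Sonnet", "COMPLEX": "Opus"},
}

def pick_model(intent: str, complexity: str) -> str:
    """Look up the routed model, treating unknown intents as GENERAL."""
    return ROUTING_MATRIX.get(intent, ROUTING_MATRIX["GENERAL"])[complexity]
```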
## Token Exhaustion & Automatic Model Switching
When a model becomes unavailable mid-session (token quota exhausted, rate limit hit, API error), the router automatically switches to the next best available model and **notifies the user**.
### Notification Format
When a model switch occurs due to exhaustion, the user receives a notification:
```
┌───────────────────────────────────────────────────────────────────┐
│ ⚠️  MODEL SWITCH NOTICE                                           │
│                                                                   │
│ Your request could not be completed on claude-opus-4-5            │
│ (reason: token quota exhausted).                                  │
│                                                                   │
│ Request completed using: anthropic/claude-sonnet-4-5              │
│                                                                   │
│ The response below was generated by the fallback model.           │
└───────────────────────────────────────────────────────────────────┘
```
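A banner like the one above can be rendered by the `build_switch_notification` helper that the Implementation section calls. A minimal sketch (the box-drawing layout and exact wording are assumptions based on the example):

```python
def build_switch_notification(failed_model: str, reason: str,
                              success_model: str) -> str:
    """Render the model-switch notice as a plain-text banner."""
    lines = [
        "⚠️  MODEL SWITCH NOTICE",
        "",
        f"Your request could not be completed on {failed_model}",
        f"(reason: {reason}).",
        "",
        f"Request completed using: {success_model}",
        "",
        "The response below was generated by the fallback model.",
    ]
    width = max(len(line) for line in lines) + 2
    bar = "─" * width
    body = [f"│ {line.ljust(width - 2)} │" for line in lines]
    return "\n".join([f"┌{bar}┐"] + body + [f"└{bar}┘"])
```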
### Switch Reasons
| Reason | Description |
|--------|-------------|
| `token quota exhausted` | Daily/monthly token limit reached |
| `rate limit exceeded` | Too many requests per minute |
| `context window exceeded` | Input too large for model |
| `API timeout` | Model took too long to respond |
| `API error` | Provider returned an error |
| `model unavailable` | Model temporarily offline |
### Implementation
```python
def execute_with_fallback(primary_model: str, fallback_chain: list[str], request: str) -> Response:
"""
Execute request with automatic fallback and user notification.
"""
attempted_models = []
switch_reason = None
# Try primary model first
models_to_try = [primary_model] + fallback_chain
for model in models_to_try:
try:
response = call_model(model, request)
# If we switched models, prepend notification
if attempted_models:
notification = build_switch_notification(
failed_model=attempted_models[0],
reason=switch_reason,
success_model=model
)
return Response(
content=notification + "\n\n---\n\n" + response.content,
model_used=model,
switched=True
)
return Response(content=response.content, model_used=model, switched=False)
except TokenQuotaExhausted:
attempted_models.append(model)
switch_reason = "token quota exhausted"
log_fallback(model, switch_reason)
continue
except RateLimitExceeded:
attempted_models.append(model)
switch_reason = "rate limit exceeded"
log_fallback(model, switch_reason)
continue
except ContextWindowExceeded:
attempted_models.append(model)
switch_reason = "context window exceeded"
log_fallback(model, switch_reason)
    ... (truncated)
```