---
name: smart-router
description: >
  Expertise-aware model router with semantic domain scoring, context-overflow protection,
  and security redaction. Automatically selects the optimal AI model using weighted
  expertise scoring (Feb 2026 benchmarks). Supports Claude, GPT, Gemini, Grok with
  automatic fallback chains, HITL gates, and cost optimization.
author: c0nSpIc0uS7uRk3r
version: 2.1.0
license: MIT
metadata:
  openclaw:
    requires:
      bins: ["python3"]
      env: ["ANTHROPIC_API_KEY"]
      optional_env: ["GOOGLE_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY"]
features:
  - Semantic domain detection
  - Expertise-weighted scoring (0-100)
  - Risk-based mandatory routing
  - Context overflow protection (>150K → Gemini)
  - Security credential redaction
  - Circuit breaker with persistent state
  - HITL gate for low-confidence routing
benchmarks:
  source: "Feb 2026 MLOC Analysis"
  models:
    - "Claude Opus 4.5: SWE-bench 80.9%"
    - "GPT-5.2: AIME 100%, Control Flow 22 errors/MLOC"
    - "Gemini 3 Pro: Concurrency 69 issues/MLOC"
---
# A.I. Smart-Router
Intelligently route requests to the optimal AI model using tiered classification with automatic fallback handling and cost optimization.
## How It Works (Silent by Default)
The router operates transparently: users send messages normally and get responses from the best model for their task. No special commands needed.
**Optional visibility**: Include `[show routing]` in any message to see the routing decision.
## Tiered Classification System
The router uses a three-tier decision process:
```
┌───────────────────────────────────────────────────────────────────┐
│                      TIER 1: INTENT DETECTION                     │
│             Classify the primary purpose of the request           │
├───────────────────────────────────────────────────────────────────┤
│ CODE         │ ANALYSIS    │ CREATIVE    │ REALTIME   │ GENERAL   │
│ write/debug  │ research    │ writing     │ news/live  │ Q&A/chat  │
│ refactor     │ explain     │ stories     │ X/Twitter  │ translate │
│ review       │ compare     │ brainstorm  │ prices     │ summarize │
└──────┬───────┴──────┬──────┴──────┬──────┴─────┬──────┴─────┬─────┘
       │              │             │            │            │
       ▼              ▼             ▼            ▼            ▼
┌───────────────────────────────────────────────────────────────────┐
│                   TIER 2: COMPLEXITY ESTIMATION                   │
├───────────────────────────────────────────────────────────────────┤
│ SIMPLE (Tier $)      │ MEDIUM (Tier $$)     │ COMPLEX (Tier $$$)  │
│ • One-step task      │ • Multi-step task    │ • Deep reasoning    │
│ • Short response OK  │ • Some nuance        │ • Extensive output  │
│ • Factual lookup     │ • Moderate context   │ • Critical task     │
│ → Haiku/Flash        │ → Sonnet/Grok/GPT    │ → Opus/GPT-5        │
└──────────────────────┴──────────────────────┴─────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────┐
│                  TIER 3: SPECIAL CASE OVERRIDES                   │
├───────────────────────────────────────────────────────────────────┤
│ CONDITION                           │ OVERRIDE TO                 │
│ ────────────────────────────────────┼──────────────────────────── │
│ Context >100K tokens                │ → Gemini Pro (1M ctx)       │
│ Context >500K tokens                │ → Gemini Pro ONLY           │
│ Needs real-time data                │ → Grok (regardless)         │
│ Image/vision input                  │ → Opus or Gemini Pro        │
│ User explicit override              │ → Requested model           │
└─────────────────────────────────────┴─────────────────────────────┘
```
## Intent Detection Patterns
### CODE Intent
- Keywords: write, code, debug, fix, refactor, implement, function, class, script, API, bug, error, compile, test, PR, commit
- File extensions mentioned: .py, .js, .ts, .go, .rs, .java, etc.
- Code blocks in input
### ANALYSIS Intent
- Keywords: analyze, explain, compare, research, understand, why, how does, evaluate, assess, review, investigate, examine
- Long-form questions
- "Help me understand..."
### CREATIVE Intent
- Keywords: write (story/poem/essay), create, brainstorm, imagine, design, draft, compose
- Fiction/narrative requests
- Marketing/copy requests
### REALTIME Intent
- Keywords: now, today, current, latest, trending, news, happening, live, price, score, weather
- X/Twitter mentions
- Stock/crypto tickers
- Sports scores
### GENERAL Intent (Default)
- Simple Q&A
- Translations
- Summaries
- Conversational
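The keyword patterns above can be approximated with simple whole-word matching. A minimal sketch (the keyword sets are a non-exhaustive illustration, not the router's authoritative lists):

```python
import re

# Illustrative subsets of the intent keywords listed above.
INTENT_KEYWORDS = {
    "CODE": {"code", "debug", "fix", "refactor", "implement", "function",
             "class", "script", "api", "bug", "error", "compile", "test"},
    "ANALYSIS": {"analyze", "explain", "compare", "research", "evaluate",
                 "assess", "investigate", "examine"},
    "CREATIVE": {"story", "poem", "essay", "brainstorm", "imagine", "draft",
                 "compose"},
    "REALTIME": {"now", "today", "current", "latest", "trending", "news",
                 "live", "price", "score", "weather"},
}

CODE_EXTENSIONS = re.compile(r"\.(py|js|ts|go|rs|java)\b")

def detect_intents(message: str) -> list[str]:
    """Return every intent whose signals appear; GENERAL is the fallback."""
    text = message.lower()
    # Whole-word tokenization avoids substring hits like "test" in "latest".
    tokens = set(re.findall(r"[a-z']+", text))
    hits = [intent for intent, words in INTENT_KEYWORDS.items()
            if tokens & words]
    # CODE is also signaled by file extensions or fenced code blocks.
    if "CODE" not in hits and (CODE_EXTENSIONS.search(text) or "```" in message):
        hits.append("CODE")
    return hits or ["GENERAL"]
```

Whole-word matching matters here: a substring check would misfire on words like "latest" (which contains "test") and misroute real-time queries to CODE.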
### MIXED Intent (Multiple Intents Detected)
When a request contains multiple clear intents (e.g., "Write code to analyze this data and explain it creatively"):
1. **Identify primary intent** β What's the main deliverable?
2. **Route to highest-capability model** β Mixed tasks need versatility
3. **Default to COMPLEX complexity** β Multi-intent = multi-step
**Examples:**
- "Write code AND explain how it works" β CODE (primary) + ANALYSIS β Route to Opus
- "Summarize this AND what's the latest news on it" β REALTIME takes precedence β Grok
- "Creative story using real current events" β REALTIME + CREATIVE β Grok (real-time wins)
## Language Handling
**Non-English requests** are handled normally; all supported models have multilingual capabilities:
| Model | Non-English Support |
|-------|---------------------|
| Opus/Sonnet/Haiku | Excellent (100+ languages) |
| GPT-5 | Excellent (100+ languages) |
| Gemini Pro/Flash | Excellent (100+ languages) |
| Grok | Good (major languages) |
**Intent detection still works** because:
- Keyword patterns include common non-English equivalents
- Code intent detected by file extensions, code blocks (language-agnostic)
- Complexity estimated by query length (works across languages)
**Edge case:** If intent unclear due to language, default to GENERAL intent with MEDIUM complexity.
## Complexity Signals
### Simple Complexity ($)
- Short query (<50 words)
- Single question mark
- "Quick question", "Just tell me", "Briefly"
- Yes/no format
- Unit conversions, definitions
### Medium Complexity ($$)
- Moderate query (50-200 words)
- Multiple aspects to address
- "Explain", "Describe", "Compare"
- Some context provided
### Complex Complexity ($$$)
- Long query (>200 words) or complex task
- "Step by step", "Thoroughly", "In detail"
- Multi-part questions
- Critical/important qualifier
- Research, analysis, or creative work
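The complexity signals above can be folded into one heuristic. A minimal sketch (word-count thresholds and cue phrases are taken from the lists above; the tie-breaking order is an assumption):

```python
def estimate_complexity(message: str) -> str:
    """Heuristic complexity tier from the signals listed above."""
    words = len(message.split())
    text = message.lower()
    complex_cues = ("step by step", "thoroughly", "in detail",
                    "critical", "important")
    simple_cues = ("quick question", "just tell me", "briefly")
    # Complex signals win: long queries or explicit depth requests.
    if words > 200 or any(c in text for c in complex_cues):
        return "COMPLEX"
    # Short queries with at most one question, or explicit brevity cues.
    if words < 50 and (any(c in text for c in simple_cues)
                       or message.count("?") <= 1):
        return "SIMPLE"
    return "MEDIUM"
```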
## Routing Matrix
| Intent | Simple | Medium | Complex |
|--------|--------|--------|---------|
| **CODE** | Sonnet | Opus | Opus |
| **ANALYSIS** | Flash | GPT-5 | Opus |
| **CREATIVE** | Sonnet | Opus | Opus |
| **REALTIME** | Grok | Grok | Grok-3 |
| **GENERAL** | Flash | Sonnet | Opus |
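The matrix above is a direct two-level lookup. A sketch (model names are the shorthand from the table, not full provider IDs; falling back to GENERAL for an unknown intent is an assumption):

```python
# Transcription of the routing matrix above.
ROUTING_MATRIX = {
    "CODE":     {"SIMPLE": "Sonnet", "MEDIUM": "Opus",   "COMPLEX": "Opus"},
    "ANALYSIS": {"SIMPLE": "Flash",  "MEDIUM": "GPT-5",  "COMPLEX": "Opus"},
    "CREATIVE": {"SIMPLE": "Sonnet", "MEDIUM": "Opus",   "COMPLEX": "Opus"},
    "REALTIME": {"SIMPLE": "Grok",   "MEDIUM": "Grok",   "COMPLEX": "Grok-3"},
    "GENERAL":  {"SIMPLE": "Flash",  "MEDIUM": "Sonnet", "COMPLEX": "Opus"},
}

def pick_model(intent: str, complexity: str) -> str:
    """Look up the routed model, treating unknown intents as GENERAL."""
    return ROUTING_MATRIX.get(intent, ROUTING_MATRIX["GENERAL"])[complexity]
```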
## Token Exhaustion & Automatic Model Switching
When a model becomes unavailable mid-session (token quota exhausted, rate limit hit, API error), the router automatically switches to the next best available model and **notifies the user**.
### Notification Format
When a model switch occurs due to exhaustion, the user receives a notification:
```
┌───────────────────────────────────────────────────────────────────┐
│ ⚠️  MODEL SWITCH NOTICE                                           │
│                                                                   │
│ Your request could not be completed on claude-opus-4-5            │
│ (reason: token quota exhausted).                                  │
│                                                                   │
│ Request completed using: anthropic/claude-sonnet-4-5              │
│                                                                   │
│ The response below was generated by the fallback model.           │
└───────────────────────────────────────────────────────────────────┘
```
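A banner like the one above can be rendered by the `build_switch_notification` helper that the Implementation section calls. A minimal sketch (the box-drawing layout and exact wording are assumptions based on the example):

```python
def build_switch_notification(failed_model: str, reason: str,
                              success_model: str) -> str:
    """Render the model-switch notice as a plain-text banner."""
    lines = [
        "⚠️  MODEL SWITCH NOTICE",
        "",
        f"Your request could not be completed on {failed_model}",
        f"(reason: {reason}).",
        "",
        f"Request completed using: {success_model}",
        "",
        "The response below was generated by the fallback model.",
    ]
    width = max(len(line) for line in lines) + 2
    bar = "─" * width
    body = [f"│ {line.ljust(width - 2)} │" for line in lines]
    return "\n".join([f"┌{bar}┐"] + body + [f"└{bar}┘"])
```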
### Switch Reasons
| Reason | Description |
|--------|-------------|
| `token quota exhausted` | Daily/monthly token limit reached |
| `rate limit exceeded` | Too many requests per minute |
| `context window exceeded` | Input too large for model |
| `API timeout` | Model took too long to respond |
| `API error` | Provider returned an error |
| `model unavailable` | Model temporarily offline |
### Implementation
```python
def execute_with_fallback(primary_model: str, fallback_chain: list[str], request: str) -> Response:
"""
Execute request with automatic fallback and user notification.
"""
attempted_models = []
switch_reason = None
# Try primary model first
models_to_try = [primary_model] + fallback_chain
for model in models_to_try:
try:
response = call_model(model, request)
# If we switched models, prepend notification
if attempted_models:
notification = build_switch_notification(
failed_model=attempted_models[0],
reason=switch_reason,
success_model=model
)
return Response(
content=notification + "\n\n---\n\n" + response.content,
model_used=model,
switched=True
)
return Response(content=response.content, model_used=model, switched=False)
except TokenQuotaExhausted:
attempted_models.append(model)
switch_reason = "token quota exhausted"
log_fallback(model, switch_reason)
continue
except RateLimitExceeded:
attempted_models.append(model)
switch_reason = "rate limit exceeded"
log_fallback(model, switch_reason)
continue
except ContextWindowExceeded:
attempted_models.append(model)
switch_reason = "context window exceeded"
log_fallback(model, switch_reason)
    ... (truncated)
```