---
name: llm-supervisor
description: Graceful rate limit handling with Ollama fallback. Notifies on rate limits, offers local model switch with confirmation for code tasks.
---
# LLM Supervisor 🔮
Handles rate limits and model fallbacks gracefully.
## Behavior
### On Rate Limit / Overload Errors
When I encounter rate limits or overload errors from cloud providers (Anthropic, OpenAI):
1. **Tell the user immediately** — Don't silently fail or retry endlessly
2. **Offer local fallback** — Ask if they want to switch to Ollama (see the sketch after this list)
3. **Wait for confirmation** — Never auto-switch for code generation tasks
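A minimal sketch of steps 1 and 2, assuming the cloud call goes through `curl`; `CLOUD_API_URL` is a placeholder, not real configuration. 429 is the standard rate-limit status, and Anthropic additionally uses 529 for overload.

```bash
# Hypothetical sketch: surface a rate-limit/overload response instead of silently retrying.
# CLOUD_API_URL is a placeholder for whatever endpoint the session is using.
status=$(curl -s -o /dev/null -w '%{http_code}' "$CLOUD_API_URL")
if [ "$status" = "429" ] || [ "$status" = "529" ]; then
  echo "Cloud provider returned HTTP $status (rate limited / overloaded)."
  echo "Option: switch to local Ollama? Code tasks need explicit confirmation."
fi
```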
### Confirmation Required
Before using local models for code generation, ask:
> "Cloud is rate-limited. Switch to local Ollama (`qwen2.5:7b`)? Reply 'yes' to confirm."
For simple queries (chat, summaries), I can switch without confirmation if the user has previously approved it.
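A sketch of the gate logic, with a shell prompt standing in for the conversational confirmation; the variable names are hypothetical and reused in the state sketch at the end of this file.

```bash
# Hypothetical gate: a shell prompt standing in for the conversational confirmation
read -r -p "Cloud is rate-limited. Switch to local Ollama (qwen2.5:7b)? Reply 'yes' to confirm: " answer
if [ "$answer" = "yes" ]; then
  CURRENT_PROVIDER="local"
  LOCAL_CONFIRMED_FOR_CODE=true   # simple queries may now switch without re-asking
else
  echo "Staying on the cloud provider."
fi
```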
## Commands
### `/llm status`
Report the current state (the availability check is sketched below):
- Which provider is active (cloud/local)
- Ollama availability and models
- Recent rate limit events
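A sketch of the availability check, assuming a default Ollama install; `/api/tags` is Ollama's endpoint for listing local models, served on port 11434.

```bash
# Availability check behind /llm status; 11434 is Ollama's default local port
if curl -sf http://localhost:11434/api/tags >/dev/null; then
  echo "Ollama: available"
  ollama list   # locally pulled models
else
  echo "Ollama: not running (try 'ollama serve')"
fi
```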
### `/llm switch local`
Manually switch to Ollama for the session.
### `/llm switch cloud`
Switch back to cloud provider.
## Using Ollama
```bash
# Check available models
ollama list
# Run a query
ollama run qwen2.5:7b "your prompt here"
# For longer prompts, use stdin
echo "your prompt" | ollama run qwen2.5:7b
```
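The same queries can also go through Ollama's local HTTP API, which helps when shelling out to `ollama run` is awkward; this uses the documented `/api/generate` endpoint with streaming disabled so the response arrives as one JSON object.

```bash
# Non-streaming request to Ollama's local HTTP API
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5:7b", "prompt": "your prompt here", "stream": false}'
```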
## Installed Models
Check installed models with `ollama list`. The configured default is `qwen2.5:7b`.
## State Tracking
Track in memory during the session:
- `currentProvider`: "cloud" | "local"
- `lastRateLimitAt`: timestamp or null
- `localConfirmedForCode`: boolean
Reset to cloud at session start.
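A minimal sketch of that state as shell variables, assuming the session keeps it in the shell environment rather than any particular store; the names mirror the fields above.

```bash
# Sketch of the session state as shell variables (names mirror the fields above)
CURRENT_PROVIDER="cloud"        # "cloud" | "local"; reset to "cloud" at session start
LAST_RATE_LIMIT_AT=""           # timestamp of the last rate-limit event; empty means null
LOCAL_CONFIRMED_FOR_CODE=false  # whether the user approved local models for code tasks
```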