---
name: multi-llm
description: Multi-LLM intelligent switching. Use command 'multi llm' to activate local model selection based on task type. Default uses Claude Opus 4.5.
trigger: multi llm
version: 1.1.0
author: leohan123123
tags: llm, ollama, local-model, fallback, multi-model
---
# Multi-LLM - Intelligent Model Switching
**Trigger Command**: `multi llm`
> **Default Behavior**: Always use Claude Opus 4.5 (strongest model)
> Local model selection is activated only when the message contains the `multi llm` command.
## What's New in v1.1.0
- Renamed trigger from `mlti llm` to `multi llm` (clearer naming)
- Enhanced model existence checking with fallback chain
- Added detailed usage examples and troubleshooting
- Improved task detection patterns
## Usage
### Default Mode (without command)
```
Help me write a Python function -> Uses Claude Opus 4.5
Analyze this code -> Uses Claude Opus 4.5
```
### Multi-Model Mode (with command)
```
multi llm Help me write a Python function -> Selects qwen2.5-coder:32b
multi llm Analyze this math proof -> Selects deepseek-r1:70b
multi llm Translate to Chinese -> Selects glm4:9b
```
## Command Format
| Command | Description |
|---------|-------------|
| `multi llm` | Activate intelligent model selection |
| `multi llm coding` | Force coding model |
| `multi llm reasoning` | Force reasoning model |
| `multi llm chinese` | Force Chinese model |
| `multi llm general` | Force general model |
## Model Mapping
**Primary Model (Default)**: github-copilot/claude-opus-4.5
**Local Models (when `multi llm` triggered)**:
| Task Type | Model | Size | Best For |
|-----------|-------|------|----------|
| Coding | qwen2.5-coder:32b | 19GB | Code generation, debugging, refactoring |
| Reasoning | deepseek-r1:70b | 42GB | Math, logic, complex analysis |
| Chinese | glm4:9b | 5.5GB | Translation, summaries, quick tasks |
| General | qwen3:32b | 20GB | General purpose, fallback |
### Fallback Chain
If the selected model is unavailable, the system tries alternatives:
```
Coding: qwen2.5-coder:32b -> qwen2.5-coder:14b -> qwen3:32b
Reasoning: deepseek-r1:70b -> deepseek-r1:32b -> qwen3:32b
Chinese: glm4:9b -> qwen3:8b -> qwen3:32b
General: qwen3:32b -> qwen3:14b -> qwen3:8b
```
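One way to implement such a chain is to return the first candidate that is actually installed. A minimal sketch, not the shipped implementation: the function takes the installed-model list as an argument (in practice it would come from `ollama list`) so the selection logic is easy to test in isolation.

```shell
#!/usr/bin/env sh
# pick_model CHAIN INSTALLED: echo the first model in the space-separated
# CHAIN that appears in the newline-separated INSTALLED list.
# Sketch only; the shipped logic lives in scripts/select-model.sh.
pick_model() {
    chain=$1; installed=$2
    for model in $chain; do
        if printf '%s\n' "$installed" | grep -Fqx "$model"; then
            echo "$model"
            return 0
        fi
    done
    return 1  # no candidate from the chain is installed
}

# In practice the installed list would come from:
#   ollama list | awk 'NR>1 {print $1}'
installed="qwen2.5-coder:14b
qwen3:32b"
pick_model "qwen2.5-coder:32b qwen2.5-coder:14b qwen3:32b" "$installed"
# -> qwen2.5-coder:14b
```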
## Detection Logic
```
User Input
|
v
Contains "multi llm"?
|
+-- No -> Use Claude Opus 4.5 (default)
|
+-- Yes -> Task Type Detection
|
+-------+-------+-------+
v v v v
Coding Reasoning Chinese General
| | | |
v v v v
qwen2.5 deepseek glm4 qwen3
coder r1:70b :9b :32b
```
### Task Detection Keywords
| Category | Keywords (EN) | Keywords (CN) |
|----------|---------------|---------------|
| Coding | code, debug, function, script, api, bug, refactor, python, java, javascript | 代码, 编程, 函数, 调试, 重构 |
| Reasoning | analysis, proof, logic, math, solve, algorithm, evaluate | 推理, 分析, 证明, 逻辑, 数学, 计算, 算法 |
| Chinese | translate, summary | 翻译, 总结, 摘要, 简单, 快速 |
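The keyword matching above can be sketched as a small POSIX shell classifier. This is an illustrative sketch only: the shipped patterns live in `scripts/select-model.sh`, and real matching would also cover the Chinese keywords.

```shell
#!/usr/bin/env sh
# detect_task: crude keyword classifier over lowercased input.
# Sketch only; the shipped patterns live in scripts/select-model.sh.
detect_task() {
    input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$input" in
        *code*|*debug*|*function*|*script*|*api*|*bug*|*refactor*|*python*|*java*)
            echo coding ;;
        *analysis*|*proof*|*logic*|*math*|*solve*|*algorithm*|*evaluate*)
            echo reasoning ;;
        *translate*|*summary*)
            echo chinese ;;
        *)
            echo general ;;
    esac
}

detect_task "Write a Python function"   # -> coding
detect_task "Check this proof"          # -> reasoning
```

Patterns are tried top to bottom, so an input mentioning both "python" and "proof" resolves to coding; the force commands (`multi llm coding`, etc.) exist to override exactly this kind of ambiguity.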
## Examples
### Example 1: Coding Task
```bash
# Input
multi llm Write a Python function to calculate fibonacci
# Output
Selected: qwen2.5-coder:32b
Reason: Detected coding task (keywords: python, function)
```
### Example 2: Math Analysis
```bash
# Input
multi llm reasoning Prove that sqrt(2) is irrational
# Output
Selected: deepseek-r1:70b
Reason: Force command 'reasoning' used
```
### Example 3: Quick Translation
```bash
# Input
multi llm 把这段话翻译成英文
# Output
Selected: glm4:9b
Reason: Detected Chinese lightweight task (keywords: 翻译)
```
### Example 4: Default (No trigger)
```bash
# Input
Write a REST API with authentication
# Output
Selected: claude-opus-4.5
Reason: Default model (no 'multi llm' trigger)
```
## Prerequisites
1. **Ollama** must be installed and running:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama service
ollama serve
# Pull required models
ollama pull qwen2.5-coder:32b
ollama pull deepseek-r1:70b
ollama pull glm4:9b
ollama pull qwen3:32b
```
2. **Check available models**:
```bash
ollama list
```
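A quick preflight check can report anything from the model table that has not been pulled yet. A sketch: the helper takes the installed list as an argument so it does not depend on a running daemon, and the model names below assume the default set.

```shell
#!/usr/bin/env sh
# missing_models REQUIRED INSTALLED: print each model in the space-separated
# REQUIRED list that is absent from the newline-separated INSTALLED list.
# Sketch only; adjust the model names to your setup.
missing_models() {
    required=$1; installed=$2
    for model in $required; do
        printf '%s\n' "$installed" | grep -Fqx "$model" \
            || echo "missing: $model"
    done
}

# Typical use, assuming `ollama list` prints model names in column 1:
# missing_models "qwen2.5-coder:32b deepseek-r1:70b glm4:9b qwen3:32b" \
#                "$(ollama list | awk 'NR>1 {print $1}')"
```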
## Troubleshooting
### Model not found
```bash
# Check if model exists
ollama list | grep "qwen2.5-coder"
# Pull missing model
ollama pull qwen2.5-coder:32b
```
### Ollama not running
```bash
# Check service status
curl -s http://localhost:11434/api/tags
# Start Ollama
ollama serve &
```
### Slow response
- Large models (70b) require significant RAM/VRAM
- Consider using smaller variants: `deepseek-r1:32b` instead of `70b`
### Wrong model selected
- Use force commands: `multi llm coding`, `multi llm reasoning`
- Check if keywords match your task type
## Files in This Skill
```
multi-llm/
├── SKILL.md             # This documentation
└── scripts/
    ├── select-model.sh  # Model selection logic
    └── fallback-demo.sh # Interactive demo script
```
## Integration
### With OpenCode/ClaudeCode
The trigger `multi llm` is detected in your message. Simply prefix your request:
```
multi llm [your request here]
```
### Programmatic Usage
```bash
# Get recommended model for a task
./scripts/select-model.sh "multi llm write a sorting algorithm"
# Output: qwen2.5-coder:32b
# Demo with actual model call
./scripts/fallback-demo.sh --force-local "explain recursion"
```
## Author
- GitHub: [@leohan123123](https://github.com/leohan123123)
## License
MIT