# Free Optimizer 📦
> Auto-discover free models across 9 platforms, benchmark them in real time, and dynamically route to the fastest one.
<p align="center">
<img src="Picture/ScreenShot_2026-04-29_163456_689.png" alt="Free Optimizer Screenshot" width="600">
</p>
---
## One-Liner
An OpenClaw plugin. Set `model: free-opt/auto` and it automatically picks the fastest free model from a pool to handle your conversations.
---
## Table of Contents
- [How It Works](#how-it-works)
- [Quick Start](#quick-start)
- [Supported Platforms & Free Models](#supported-platforms--free-models)
- [CLI Commands](#cli-commands)
- [Configuration Reference](#configuration-reference)
- [Ranking Table](#ranking-table)
- [Ranking Rules](#ranking-rules)
- [Health Check & Auto-Switching](#health-check--auto-switching)
- [Development](#development)
- [FAQ](#faq)
---
## How It Works
```
Every 60 minutes (cron)
 │
 ├── Scans 9 platforms → discovers currently available free models
 │
 ├── Sends a chat request to each model → measures TTFT (Time To First Token)
 │     ├── Also collects the response → quality score (0-5 stars)
 │     └── Auto-retry on failure, 2 attempts with exponential backoff
 │
 ├── Ranks results (see rules below)
 │
 ├── Sets the fastest model as "active model"
 │
 └── Saves to local cache → next request reads from cache
```
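To make the TTFT step concrete, here is a minimal sketch of a single benchmark probe, assuming a generic OpenAI-compatible streaming endpoint (the `probeTTFT` name, URL shape, and parameters are illustrative, not the plugin's actual internals):

```typescript
// Minimal TTFT probe against an OpenAI-compatible /chat/completions
// endpoint. Runs on Node >= 18 (built-in fetch + web streams).
async function probeTTFT(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
): Promise<number> {
  const start = Date.now();
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: 10, // matches the benchmark.maxTokens default
      stream: true,   // streaming lets us stop the clock at the first chunk
    }),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  await res.body.getReader().read(); // first chunk arrives -> stop the clock
  return Date.now() - start;
}
```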
**On every chat request (milliseconds)**:
```
1. Check cache → was the fastest model benchmarked in the last 5 min?
   ├─ ✅ Yes → route request directly
   └─ ❌ No (but stale data exists within 30 min)
        ├─ Route with stale data, trigger background re-benchmark
        └─ When done, switch to the new fastest model automatically
```
**Worst case (all models fail)**:
→ Falls back to `openrouter/free` to guarantee zero request loss
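In code, the freshness tiers above boil down to a few comparisons. A minimal sketch, assuming a cache record holding the model ID and its last benchmark timestamp (names are illustrative):

```typescript
// Freshness-tiered routing: fresh cache -> use it; stale cache -> use it
// and re-benchmark in the background; nothing usable -> fallback model.
interface CacheEntry {
  model: string;         // e.g. "cloudflare/@cf/meta/llama-3.2-3b-instruct"
  benchmarkedAt: number; // epoch ms of the last successful benchmark
}

const FRESH_MS = 5 * 60_000;  // benchmarked within the last 5 minutes
const STALE_MS = 30 * 60_000; // still usable, but triggers a re-benchmark
const FALLBACK = "openrouter/free";

function pickModel(cache: CacheEntry | null, rebenchmark: () => void): string {
  const age = cache ? Date.now() - cache.benchmarkedAt : Infinity;
  if (age <= FRESH_MS) return cache!.model; // fast path: route directly
  if (age <= STALE_MS) {
    rebenchmark();       // fire-and-forget; switches when done
    return cache!.model; // serve the stale-but-working model meanwhile
  }
  return FALLBACK;       // worst case: the request is still served
}
```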
---
## Quick Start
### Prerequisites
- [OpenClaw](https://github.com/openclaw/openclaw) ≥ 1.0.0
- Node.js ≥ 18
- At least one API key (recommended for China: NVIDIA / Cloudflare / GitHub, no proxy needed)
### 1. Install
```bash
# Clone or copy to plugin directory
cp -r openclaw-plugin-free-optimizer ~/.openclaw/plugins/free-optimizer
# Enter plugin directory
cd ~/.openclaw/plugins/free-optimizer
# Install dependencies
npm install
# Build TypeScript
npm run build
# Verify β run tests (all should pass)
npm test
```
### 2. Configure API Keys
**Option A: Environment variables (recommended)**
```bash
# Copy the example config
cp .env.example ~/.openclaw/env
# Edit and fill in your keys
nano ~/.openclaw/env
# Example content:
export FREE_OPT_NVIDIA_KEY="nvapi-xxxxx"
export FREE_OPT_CLOUDFLARE_KEY="cfat_xxxxx"
export CLOUDFLARE_ACCOUNT_ID="your-cloudflare-account-id"
export FREE_OPT_GITHUB_KEY="ghp_xxxxx"
```
**Option B: Edit config.json directly**
```bash
nano ~/.openclaw/plugins/free-optimizer/config.json
```
```json
{
"apiKeys": {
"nvidia": {
"enabled": true,
"apiKey": "nvapi-xxxxx"
},
"cloudflare": {
"enabled": true,
"apiKey": "cfat_xxxxx",
"accountId": "your-cloudflare-account-id"
}
},
"benchmark": {
"enabled": true,
"intervalMinutes": 60,
"maxResponseTimeMs": 15000,
"concurrency": 3
}
}
```
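The two options can coexist; a plugin in this style would typically let environment variables override `config.json`, though that precedence is an assumption here, not a documented guarantee. A sketch of such key resolution (variable naming follows the `.env.example` above):

```typescript
// Illustrative key lookup: prefer FREE_OPT_<PROVIDER>_KEY from the
// environment (Option A), fall back to config.json (Option B).
import { readFileSync } from "node:fs";

function resolveApiKey(provider: string, configPath: string): string | undefined {
  const envVar = `FREE_OPT_${provider.toUpperCase()}_KEY`; // e.g. FREE_OPT_NVIDIA_KEY
  if (process.env[envVar]) return process.env[envVar];
  const config = JSON.parse(readFileSync(configPath, "utf8"));
  return config.apiKeys?.[provider]?.apiKey; // undefined if not configured
}
```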
### 3. Set the Main Model
In your OpenClaw configuration, set the model to:
```
model: free-opt/auto
```
### 4. Verify Installation
Run this in your OpenClaw chat:
```
/free-opt_test
```
Expected output:
```
┌──────┬────────────────────────────────────────────┬────────┬───────┬─────┬──────┬─────┐
│ Rank │ Model                                      │ TTFT   │ Qual  │ Tag │ Ctx  │ Age │
├──────┼────────────────────────────────────────────┼────────┼───────┼─────┼──────┼─────┤
│ 1    │ cloudflare/@cf/meta/llama-3.2-3b-instruct  │ 444ms  │ ★★★★★ │     │ 128K │ 12s │
│ 2    │ nvidia/meta/llama-3.3-70b-instruct         │ 692ms  │ ★★★★★ │     │ 65K  │ 2s  │
...
└──────┴────────────────────────────────────────────┴────────┴───────┴─────┴──────┴─────┘
🏆 Best model: cloudflare/@cf/meta/llama-3.2-3b-instruct (preferred ⭐) (444ms TTFT, fresh)
```
---
## Supported Platforms & Free Models
| Platform | Provider ID | Direct from China | Typical Models | Highlights |
|----------|-------------|-------------------|----------------|------------|
| **OpenRouter** | `openrouter` | ✅ | DeepSeek R1/V3, Llama 4, Qwen3 | Largest model pool, variable quality |
| **NVIDIA NIM** | `nvidia` | ✅ | Llama 3.3 70B, Kimi K2.5, Nemotron | Strong reasoning models, stable latency |
| **Cloudflare Workers AI** | `cloudflare` | ✅ | Llama 3.2 1B/3B, Mistral 7B | Lowest latency (400-1000ms), smaller models |
| **GitHub Models** | `github` | ✅ | GPT-4o Mini, Llama 3.3 70B, DeepSeek R1 | Microsoft ecosystem, requires GitHub PAT |
| **Google AI Studio** | `google` | ❌ (GFW) | Gemini 2.5 Pro/Flash, Gemma 3 | Large free quota, high quality, needs proxy |
| **Groq** | `groq` | ❌ (GFW) | Llama 3.3 70B, Qwen3 32B | Fastest (<200ms), needs proxy |
| **Mistral AI** | `mistral` | ❌ (GFW) | Mistral Large, Codestral | Strong at coding/French, needs proxy |
| **HuggingFace** | `huggingface` | ❌ (GFW) | Various open-source models | Maximum freedom, needs proxy |
| **Cerebras** | `cerebras` | ❌ (GFW) | Llama 3.3 70B | Fastest inference chip, needs proxy |
> 💡 **Tip for China users**: Start with NVIDIA + Cloudflare + GitHub; no proxy required.
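Using only the documented `includeProviders` filter, a proxy-free setup for those three platforms might look like this (merge it into your existing `benchmark` block):

```jsonc
// config.json: only benchmark providers that are reachable without a proxy
{
  "benchmark": {
    "includeProviders": ["nvidia", "cloudflare", "github"]
  }
}
```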
---
## CLI Commands
All commands run in the OpenClaw chat window.
| Command | Description | Duration |
|---------|-------------|----------|
| `/free-opt_test` | **Run a full benchmark cycle** | ~30-60s |
| `/free-opt_status` | **View current ranking & active model** | Instant |
| `/free-opt_health` | **Check if the active model is alive** | ~1-3s |
| `/free-opt_list` | **List all discovered free models** | Instant |
```
# Daily routine:
/free-opt_test    → Morning benchmark
/free-opt_status  → Check who's fastest at any time
/free-opt_health  → Periodic health check
```
---
## Configuration Reference
Full `config.json` structure:
```json
{
"apiKeys": {
"openrouter": { "enabled": true, "apiKey": "sk-or-v1-xxxxx" },
"nvidia": { "enabled": true, "apiKey": "nvapi-xxxxx" },
"cloudflare": { "enabled": true, "apiKey": "cfat_xxxxx", "accountId": "4dd1efxxxxxxxxxxxxxxxxxxxxxx" },
"github": { "enabled": true, "apiKey": "ghp_xxxxx" }
},
"benchmark": {
"enabled": true,
"intervalMinutes": 60,
"maxResponseTimeMs": 15000,
"concurrency": 3,
"prompt": "What is the capital of France? Reply in one word.",
"maxTokens": 10,
"retryOnFailure": 2,
"retryDelayMs": 3000,
"includeModels": [],
"excludeModels": [],
"includeProviders": [],
"excludeProviders": [],
"minParamB": 0,
"minContextTokens": 0,
"pinnedModel": "",
"preferredModels": [],
"avoidModels": []
},
"routing": {
"healthCheckEnabled": true,
"healthCheckIntervalMs": 30000
}
}
```
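The `routing` block drives a simple liveness loop: every `healthCheckIntervalMs`, ping the active model and fail over to the next-ranked one that still answers. A minimal sketch of that loop, assuming a `ping` helper and a fastest-first ranking array (both illustrative):

```typescript
// Illustrative health-check loop driven by routing.healthCheckIntervalMs.
function startHealthLoop(
  ranking: string[],                         // models ordered fastest-first
  ping: (model: string) => Promise<boolean>,
  intervalMs = 30_000,                       // healthCheckIntervalMs default
) {
  let active = 0;
  setInterval(async () => {
    if (await ping(ranking[active])) return; // active model is alive
    // Fail over to the highest-ranked model that still responds.
    for (let i = 0; i < ranking.length; i++) {
      if (i !== active && (await ping(ranking[i]))) {
        active = i;
        return;
      }
    }
  }, intervalMs);
}
```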
### benchmark Field Reference
| Field | Default | Description |
|-------|---------|-------------|
| `enabled` | `true` | Master plugin switch |
| `intervalMinutes` | `60` | Background benchmark interval (minutes) |
| `maxResponseTimeMs` | `15000` | Model timeout threshold (skips if no response) |
| `concurrency` | `3` | Number of models to benchmark simultaneously |
| `prompt` | `What is the capital of France? Reply in one word.` | Benchmark probe prompt |
| `maxTokens` | `10` | Max tokens in benchmark response (shorter = faster) |
| `retryOnFailure` | `2` | Retry attempts on 429/5xx/network errors |
| `retryDelayMs` | `3000` | Retry delay with exponential backoff (3s → 6s → 12s); see the sketch after this table |
| `includeModels` | `[]` | **Only benchmark these models** (partial match `providerId/modelId`) |
| `excludeModels` | `[]` | **Skip these models** |
| `includeProviders` | `[]` | **Only benchmark these providers** (e.g. `["nvidia", "github"]`) |
| `excludeProviders` | `[]` | **Skip these providers** |
| `minParamB` | `0` | **Minimum parameters** (set `70` to only benchmark ≥70B models) |
| `minContextTokens` | `0` | **Minimum context window** (set `131072` for ≥128k models) |
| `pinnedModel` | `""` | **Pin a specific model** (format `"nvidia/meta/llama-3.3-70b-instruct"`) |
| `preferredModels` | `[]` | **Prefer these models** (partial match, e.g. `["nvidia/llama-3.3"]`) |
| `avoidModels` | `[]` | **Avoid these models** (partial match, e.g. `["cloudflare/mistral"]`) |
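The retry fields amount to plain exponential backoff: the delay doubles after each failed attempt, giving the 3s → 6s → 12s progression above. A minimal sketch (the `withRetry` helper is illustrative):

```typescript
// Exponential backoff: delay doubles per attempt (3000, 6000, 12000, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 2,    // retryOnFailure default
  delayMs = 3000, // retryDelayMs default
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: propagate
      await new Promise((r) => setTimeout(r, delayMs * 2 ** attempt));
    }
  }
}
```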
### Filter Examples
```jsonc
// Scenario 1: Only NVIDIA 70B+ models
{ "includeProviders": ["nvidia"], "minParamB": 70 }
// Scenario 2: Skip Cloudflare and large (70B-class) models
{ "excludeProviders": ["cloudflare"], "excludeModels": ["70b"] }
// Scenario 3: Pin NVIDIA Llama, no benchmarking
{ "pinnedModel": "nvidia/meta/llama-3.3-70b-instruct" }
// Scenario 4: Prefer NVIDIA Llama, avoid Cloudflare Mistral
{ "preferredModels": ["nvidia/llama-3.3"], "avoidModels": ["cloudflare/mistral"] }
```
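Under the hood, the model filters are simple substring tests against the `providerId/modelId` string; the numeric floors (`minParamB`, `minContextTokens`) work analogously as comparisons. A sketch of the string-based pass (function and type names are illustrative):

```typescript
// Illustrative filter pass: include/exclude lists use substring matching
// on "providerId/modelId", per the field reference above.
interface ModelFilters {
  includeModels: string[];
  excludeModels: string[];
  includeProviders: string[];
  excludeProviders: string[];
}

function passesFilters(id: string, provider: string, f: ModelFilters): boolean {
  const matches = (patterns: string[]) => patterns.some((p) => id.includes(p));
  if (f.includeProviders.length > 0 && !f.includeProviders.includes(provider)) return false;
  if (f.excludeProviders.includes(provider)) return false;
  if (f.includeModels.length > 0 && !matches(f.includeModels)) return false;
  if (matches(f.excludeModels)) return false;
  return true;
}

// e.g. passesFilters("nvidia/meta/llama-3.3-70b-instruct", "nvidia", filters)
```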
---
## Ranking Table
```
┌──────┬────────────────────────────────────────────┬────────┬───────┬─────┬──────┬─────┐
│ Rank │ Model                                      │ TTFT   │ Qual  │ Tag │ Ctx  │ Age │
├──────┼────────────────────────────────────────────┼────────┼───────┼─────┼──────┼─────┤
│ 1    │ cloudflare/@cf/meta/llama-3.2-3b-instruct  │ 444ms  │ ★★★★★ │     │ 128K │ 12s │
│ 2    │ nvidia/meta/llama-3.3-70b-instruct         │ 957ms  │ ★★★★★ │     │ 65K  │ 2s  │
│ 3    │ github/DeepSeek-R1                         │ 1123ms │ ★★★★★ │     │ 128K │ 1m  │
│ -    │ google/gemma-4-31b-it                      │ -      │ ???   │     │ 262K │ -   │
│      │ Error: Timeout 15000ms                     │        │       │     │      │     │
└──────┴────────────────────────────────────────────┴────────┴───────┴─────┴──────┴─────┘
```
| Column | Meaning |
|--------|---------|
| Rank | Position after the ranking rules are applied |
| Model | The benchmarked model's `providerId/modelId` |
| TTFT | Time To First Token measured by the probe |
| Qual | Quality score of the probe response (0-5 stars) |
| Tag | Marker for preferred or pinned models |
| Ctx | Context window size |
| Age | Time since the model was last benchmarked |
... (truncated)