<p align="center">
<img src="./figs/favicon.png" alt="QuantClaw logo" width="240">
</p>
<h1 align="center">QuantClaw: Precision Where It Matters for OpenClaw</h1>
<p align="center">
<a href="./README_zh.md">中文文档</a>
</p>
<div align="center">
<p>
<img src="https://img.shields.io/badge/OpenClaw-Plugin-0f172a" alt="OpenClaw Plugin">
<a href="https://sparkengineai.github.io/QuantClaw/"><img src="https://img.shields.io/badge/Blog-Live-0ea5e9" alt="Blog"></a>
<a href="PAPER_URL_PLACEHOLDER"><img src="https://img.shields.io/badge/Paper-Placeholder-f97316" alt="Paper"></a>
<img src="https://img.shields.io/badge/Routing-4bit%20%7C%208bit%20%7C%2016bit-2563eb" alt="Routing tiers">
<img src="https://img.shields.io/badge/License-MIT-16a34a" alt="MIT License">
</p>
</div>

QuantClaw is a plug-and-play task-type routing quantization plugin for OpenClaw. It classifies each incoming request, maps it to a precision tier (`4bit`, `8bit`, or `16bit`), and routes the request to the right model target so you can balance quality, latency, and cost without asking users to choose precision manually.
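The request flow described above (classify, pick a tier, route) can be sketched in a few lines of Python. This is a hypothetical illustration, not QuantClaw's source. `route` and `toy_classifier` are made-up names. The dictionaries are trimmed versions of the `taskTypes` and `targets` config sections, and the `research`/`standard` tiers and the 8-bit model name are placeholders.

```python
from typing import Callable, Dict, Optional

# Hypothetical, trimmed-down views of the `taskTypes` and `targets` config sections.
# The "research"/"standard" tiers and the 8-bit model name are illustrative only.
TASK_PRECISION: Dict[str, str] = {"coding": "16bit", "research": "4bit", "standard": "8bit"}
TARGETS: Dict[str, Dict[str, str]] = {
    "4bit": {"provider": "quantclaw-4bit", "model": "glm-4.7-flash-int4-autoround"},
    "8bit": {"provider": "quantclaw-8bit", "model": "example-8bit-model"},
    "16bit": {"provider": "quantclaw-16bit", "model": "glm-4.7-flash"},
}


def route(prompt: str, classify: Callable[[str], Optional[str]],
          default_task: str = "standard") -> Dict[str, str]:
    """Classify a request, map its task type to a precision tier, return that tier's target."""
    task_type = classify(prompt) or default_task
    # Unknown task types fall back to the default task type's tier.
    precision = TASK_PRECISION.get(task_type, TASK_PRECISION[default_task])
    return {"task_type": task_type, "precision": precision, **TARGETS[precision]}


def toy_classifier(prompt: str) -> Optional[str]:
    """Stand-in for the real detector chain: flags anything mentioning a bug as coding."""
    return "coding" if "bug" in prompt.lower() else None


if __name__ == "__main__":
    print(route("Fix the bug in utils.py", toy_classifier))
    print(route("Summarize this paper", toy_classifier))
```

The user never sees the tier choice. Only the classifier output decides which provider and model receive the request.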
## 🔍 About QuantClaw
QuantClaw's routing policy comes from quantization studies on OpenClaw workloads rather than from fixed intuition. We evaluate quantized and high-precision models across 24 task types and 104 tasks, using 6 models that range from 9B to 744B parameters.
Results on Claw-Eval (release v0.0.0):
<div align="center">
<table>
<thead>
<tr>
<th align="left">Model</th>
<th align="center">Params (B)</th>
<th align="center">BF16 / FP8</th>
<th align="center">NVFP4</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>GLM-4.7-Flash</strong></td>
<td align="center">30</td>
<td align="center">0.6370</td>
<td align="center"><strong>0.6034</strong></td>
</tr>
<tr>
<td><strong>GLM-5</strong></td>
<td align="center">744</td>
<td align="center">0.7130</td>
<td align="center"><strong>0.7229</strong></td>
</tr>
<tr>
<td><strong>MiniMax-M2.5</strong></td>
<td align="center">229</td>
<td align="center">0.6760</td>
<td align="center"><strong>0.6823</strong></td>
</tr>
<tr>
<td><strong>Qwen3.5-9B</strong></td>
<td align="center">9</td>
<td align="center">0.4267</td>
<td align="center"><strong>0.4107</strong></td>
</tr>
<tr>
<td><strong>Qwen3.5-35B-A3B</strong></td>
<td align="center">35</td>
<td align="center">0.6686</td>
<td align="center"><strong>0.6549</strong></td>
</tr>
<tr>
<td><strong>Qwen3.5-397B-A17B</strong></td>
<td align="center">397</td>
<td align="center">0.7048</td>
<td align="center"><strong>0.6937</strong></td>
</tr>
</tbody>
</table>
</div>
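To read the table as a quantization cost, subtract each BF16/FP8 score from its NVFP4 score and express NVFP4 as a share of the higher-precision score. The short script below does that arithmetic on the numbers copied from the table and adds no data beyond them. For example, GLM-4.7-Flash drops by 0.0336 (about 94.7% retained), while GLM-5 gains 0.0099 under NVFP4.

```python
from typing import Dict, Tuple

# Claw-Eval scores copied from the table above: (BF16/FP8, NVFP4).
SCORES: Dict[str, Tuple[float, float]] = {
    "GLM-4.7-Flash": (0.6370, 0.6034),
    "GLM-5": (0.7130, 0.7229),
    "MiniMax-M2.5": (0.6760, 0.6823),
    "Qwen3.5-9B": (0.4267, 0.4107),
    "Qwen3.5-35B-A3B": (0.6686, 0.6549),
    "Qwen3.5-397B-A17B": (0.7048, 0.6937),
}


def summarize(scores: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {model: (NVFP4 minus BF16/FP8, NVFP4 as % of BF16/FP8)}."""
    return {
        name: (round(nvfp4 - high, 4), round(100 * nvfp4 / high, 1))
        for name, (high, nvfp4) in scores.items()
    }


if __name__ == "__main__":
    for name, (delta, retained) in summarize(SCORES).items():
        print(f"{name:<20} {delta:+.4f}  {retained:.1f}%")
```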
- High-sensitivity tasks, such as coding, safety, and complex workflows, are the most affected by quantization and benefit from higher precision.
- Low-sensitivity tasks, such as research, multimodal understanding, comprehension, knowledge lookup, office QA, and data analysis, often run well at lower precision.
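One way to turn these two groups into configuration is to generate `taskTypes` entries programmatically. This is a hypothetical sketch. The ids are illustrative snake_case names derived from the bullets. Mapping high sensitivity to `16bit` and low sensitivity to `4bit` is an assumption, and `8bit` is a reasonable middle tier for borderline cases. Each entry still needs a real `description`, `keywords`, and `patterns` before `ruleDetector` can use it.

```python
import json
from typing import Dict, List

# Task groups taken from the sensitivity findings above; the ids are illustrative.
HIGH_SENSITIVITY = ["coding", "safety", "complex_workflow"]
LOW_SENSITIVITY = ["research", "multimodal", "comprehension", "knowledge",
                   "office_qa", "data_analysis"]


def build_task_types(high_tier: str = "16bit", low_tier: str = "4bit") -> List[Dict]:
    """Return skeleton `taskTypes` entries, one precision tier per sensitivity group."""
    def entry(task_id: str, tier: str) -> Dict:
        return {"id": task_id, "precision": tier, "description": "",
                "keywords": [], "patterns": []}

    return ([entry(t, high_tier) for t in HIGH_SENSITIVITY]
            + [entry(t, low_tier) for t in LOW_SENSITIVITY])


if __name__ == "__main__":
    print(json.dumps({"taskTypes": build_task_types()}, indent=2))
```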
<p align="center">
<img src="./figs/sensitivity_chart.png" alt="sensitivity chart" width="600">
</p>
## ✨ Key Features
<table align="center">
<tr align="center">
<th><p align="center">Automatic Adaptation</p></th>
<th><p align="center">Intelligent Routing</p></th>
<th><p align="center">Full Customizability</p></th>
<th><p align="center">Built-in Observability</p></th>
</tr>
<tr>
<td align="center"><p align="center"><img src="figs/ruleDetector.png" width="400" height="250"></p></td>
<td align="center"><p align="center"><img src="figs/session.png" width="400" height="250"></p></td>
<td align="center"><p align="center"><img src="figs/config.png" width="400" height="250"></p></td>
<td align="center"><p align="center"><img src="figs/dashboard.png" width="400" height="250"></p></td>
</tr>
<tr>
<td align="center">Classify each request with rules first, then a judge model.</td>
<td align="center">Map each query to 4bit, 8bit, or 16bit targets.</td>
<td align="center">Tune task types, patterns, targets, pricing, and backends.</td>
<td align="center">Track routing, tokens, cost, sessions, and live config changes.</td>
</tr>
</table>
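The "rules first, then a judge model" behavior can be modeled as an ordered chain in which the first detector that returns a task type wins. This is a minimal sketch. It assumes each detector is a callable returning a task-type id or `None`, and that the judge is consulted only when no rule fires. `rule_detector` and `judge_detector` are hypothetical stand-ins for `ruleDetector` and `loadModelDetector`. The judge is faked so the example runs offline.

```python
from typing import Callable, Iterable, Optional

Detector = Callable[[str], Optional[str]]


def rule_detector(prompt: str) -> Optional[str]:
    # Cheap, deterministic rule; returns None when no rule fires.
    return "coding" if "debug" in prompt.lower() else None


def judge_detector(prompt: str) -> Optional[str]:
    # Placeholder for a call to the judge endpoint; always answers in this sketch.
    return "research"


def detect(prompt: str, chain: Iterable[Detector], default: str = "standard") -> str:
    """Run detectors in order and return the first non-None task type."""
    for detector in chain:
        task_type = detector(prompt)
        if task_type is not None:
            return task_type
    return default


if __name__ == "__main__":
    chain = [rule_detector, judge_detector]
    print(detect("Help me debug this crash", chain))
    print(detect("Survey recent work on KV-cache compression", chain))
```

Putting the rules first means most requests avoid a network round trip. Only prompts the rules cannot place pay the judge's latency.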
## 🚀 Quick Start
**Install**
```bash
# Prerequisite: OpenClaw is already installed.
# Install from npm (recommended)
openclaw plugins install @sparkengineai/quantclaw
# If OpenClaw is running from a source checkout and the CLI is not on PATH:
cd /path/to/openclaw
node openclaw.mjs plugins install @sparkengineai/quantclaw
# Or install from source
git clone https://github.com/SparkEngineAI/QuantClaw.git ./quantclaw
openclaw plugins install ./quantclaw
# If the OpenClaw CLI is not on PATH:
cd /path/to/openclaw
node openclaw.mjs plugins install /path/to/quantclaw
```
**Create or bootstrap the runtime config**
QuantClaw reads its runtime config from:
```text
~/.openclaw/quantclaw.json
```
If the file does not exist, starting OpenClaw with the plugin enabled will generate a default `quantclaw.json`. If you are working from this repository directly, you can also start from the provided example:
```bash
cp config.example.json ~/.openclaw/quantclaw.json
```
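By default, `cp` silently overwrites an existing `quantclaw.json`, and it fails if `~/.openclaw` does not exist yet. If you script the setup, a safer variant copies the example only when no runtime config is present. `ensure_config` is a hypothetical helper, not part of QuantClaw or OpenClaw.

```python
import shutil
from pathlib import Path


def ensure_config(example: Path,
                  target: Path = Path.home() / ".openclaw" / "quantclaw.json") -> bool:
    """Copy the example config into place if no runtime config exists; return True if copied."""
    if target.exists():
        return False  # never clobber a config the user may have edited
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(example, target)
    return True
```

For example, `ensure_config(Path("config.example.json"))` run from the repository root copies the example config only when `~/.openclaw/quantclaw.json` is absent.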
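**Edit the detector chain and judge settings**
The minimal config below enables QuantClaw, runs `ruleDetector` before `loadModelDetector`, and points the judge at an OpenAI-compatible endpoint. Per-tier `targets` and `taskTypes` are described under Configuration Notes.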
```json
{
  "quant": {
    "enabled": true,
    "detectors": ["ruleDetector", "loadModelDetector"],
    "judge": {
      "endpoint": "http://127.0.0.1:8000",
      "model": "BAAI/bge-m3",
      "providerType": "openai-compatible",
      "apiKey": "",
      "cacheTtlMs": 300000
    }
  }
}
```
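Before restarting OpenClaw, a small pre-flight check can catch typos in the detector chain. This is a hypothetical helper, not part of QuantClaw. It only knows the two detector names listed in this README. It also assumes that `loadModelDetector` needs a non-empty http(s) `judge.endpoint` and a non-empty `judge.model`, as in the example above.

```python
import json
from typing import List
from urllib.parse import urlparse

# Detector names documented in this README; anything else is flagged.
KNOWN_DETECTORS = {"ruleDetector", "loadModelDetector"}


def check_quant_config(raw: str) -> List[str]:
    """Return human-readable problems in a quantclaw.json document (empty list if none)."""
    quant = json.loads(raw).get("quant", {})
    detectors = quant.get("detectors", [])
    problems = [f"unknown detector: {name}" for name in detectors
                if name not in KNOWN_DETECTORS]
    if "loadModelDetector" in detectors:
        judge = quant.get("judge", {})
        url = urlparse(judge.get("endpoint", ""))
        if url.scheme not in ("http", "https") or not url.netloc:
            problems.append("judge.endpoint must be an http(s) URL")
        if not judge.get("model"):
            problems.append("judge.model is empty")
    return problems
```

To check your runtime config, pass it the contents of `~/.openclaw/quantclaw.json` read as text. An empty list means none of these checks failed.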
**Start OpenClaw and open the dashboard**
```text
http://127.0.0.1:18789/plugins/quantclaw/stats
```
## ⚙️ Configuration Notes
The runtime schema supports:
- ordered detectors: `ruleDetector`, `loadModelDetector`
- per-task-type `id`, `description`, `precision`, `keywords`, and `patterns`
- per-tier model targets with independent provider, model, endpoint, API key, and pricing
- model-level pricing overrides for cost reporting
- hot reload when `~/.openclaw/quantclaw.json` changes
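This README does not say how hot reload is implemented internally. If your own tooling needs to react to edits of the same file, one simple approach is to poll its modification time, as in this sketch. `ConfigWatcher` is a hypothetical name. Polling is a deliberate simplification compared with OS file-watch APIs.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional, Union


class ConfigWatcher:
    """Reload a JSON config file when its modification time changes (simple polling)."""

    def __init__(self, path: Union[str, Path]) -> None:
        self.path = Path(path).expanduser()
        self._mtime: Optional[int] = None
        self.config: Dict[str, Any] = {}

    def poll(self) -> bool:
        """Re-read the file if its mtime changed; return True only when a new config loaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        try:
            new_config = json.loads(self.path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Keep the last good config (e.g. file is mid-save); retry on the next poll.
            return False
        self.config, self._mtime = new_config, mtime
        return True
```

Call `poll()` on a timer, for example once per second, and rebuild routing state whenever it returns `True`. Keeping the last good config on a parse error avoids dropping routing while an editor is partway through saving the file.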
Example `taskTypes` config:
```json
{
  "taskTypes": [
    {
      "id": "coding",
      "precision": "16bit",
      "description": "code review, bug analysis, implementation, debugging, kernels, async behavior, web development",
      "keywords": ["code", "debug", "bug", "Python", "CUDA", "编程", "代码"],
      "patterns": [
        "fix the bug in this repository",
        "(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*"
      ]
    }
  ],
  "defaultTaskType": "standard"
}
```
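The sketch below shows one plausible way to match `keywords` and `patterns` against a prompt. It assumes each pattern is a regular expression tried with a case-insensitive search, and each keyword is a case-insensitive substring. QuantClaw's actual matching rules, such as word boundaries or priority between task types, may differ. `match_task_type` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional

# The `coding` entry from the example config above (description omitted).
TASK_TYPES: List[Dict] = [
    {
        "id": "coding",
        "keywords": ["code", "debug", "bug", "Python", "CUDA", "编程", "代码"],
        "patterns": [
            "fix the bug in this repository",
            "(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*",
        ],
    }
]


def match_task_type(prompt: str, task_types: Optional[List[Dict]] = None,
                    default: str = "standard") -> str:
    """Return the first task type whose pattern or keyword matches the prompt."""
    text = prompt.lower()
    for tt in task_types if task_types is not None else TASK_TYPES:
        # Patterns are treated as regexes (Python `re` supports the lookaheads above).
        if any(re.search(p, prompt, re.IGNORECASE) for p in tt.get("patterns", [])):
            return tt["id"]
        # Keywords are treated as plain case-insensitive substrings.
        if any(k.lower() in text for k in tt.get("keywords", [])):
            return tt["id"]
    return default
```

Substring matching is naive. The keyword `bug` also fires on `debug`, and the `ts` alternative in the second pattern matches inside words such as `tests`, so stricter rules usually add word boundaries.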
Example `targets` config:
```json
{
  "targets": {
    "4bit": {
      "provider": "quantclaw-4bit",
      "model": "glm-4.7-flash-int4-autoround",
      "endpoint": "https://api.example.com/v1",
      "apiKey": "${QC_4BIT_API_KEY}",
      "displayName": "4-bit Target",
      "pricing": {
        "inputPer1M": 0.051,
        "outputPer1M": 0.34
      }
    },
    "16bit": {
      "provider": "quantclaw-16bit",
      "model": "glm-4.7-flash",
      "endpoint": "https://api.example.org/v1",
      "apiKey": "${QC_16BIT_API_KEY}",
      "displayName": "16-bit Target",
      "pricing": {
        "inputPer1M": 0.06,
        "outputPer1M": 0.4
      }
    }
  }
}
```
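The `apiKey` values above use `${QC_4BIT_API_KEY}`-style placeholders, which suggests the keys come from environment variables rather than living in the file. Assuming that is the intent, a resolver could look like the sketch below. `expand_env` is a hypothetical helper. Failing loudly on a missing variable is a choice made in this sketch, not documented QuantClaw behavior.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${NAME} placeholders in strings with environment values."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value  # numbers, booleans, and None pass through unchanged
```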
Example `modelPricing` overrides:
```json
{
  "modelPricing": {
    "glm-4.7-flash": {
      "inputPer1M": 0.06,
      "outputPer1M": 0.4
    },
    "glm-4.7-flash-int4-autoround": {
      "inputPer1M": 0.051,
      "outputPer1M": 0.34
    }
  }
}
```
For cost reporting, QuantClaw uses a target's own `pricing` block first. If a target has no `pricing`, QuantClaw falls back to the `modelPricing` entry for that target's model.
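Prices are quoted per one million tokens, so a request's cost is `(input_tokens × inputPer1M + output_tokens × outputPer1M) / 1,000,000`. For example, with the prices above, a request with 200,000 input and 50,000 output tokens costs 0.2 × 0.051 + 0.05 × 0.34 = 0.0272 on the 4-bit target and 0.2 × 0.06 + 0.05 × 0.4 = 0.032 on the 16-bit target. The sketch below encodes that formula and the lookup order described above. `resolve_pricing` and `request_cost` are hypothetical names.

```python
from typing import Dict, Optional

Pricing = Dict[str, float]


def resolve_pricing(target: Dict, model_pricing: Dict[str, Pricing]) -> Optional[Pricing]:
    """Prefer the target's own `pricing`; otherwise look up its model in `modelPricing`."""
    return target.get("pricing") or model_pricing.get(target.get("model", ""))


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request, in whatever currency the per-1M-token prices use."""
    return (input_tokens * pricing["inputPer1M"]
            + output_tokens * pricing["outputPer1M"]) / 1_000_000
```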
## 🧠 `loadModelDetector` Backends
`loadModelDetector` supports either a local embedding-based router exposed through an OpenAI-compatible API or a regular OpenAI-compatible LLM judge.
Build a local embedding router index:
```bash
python router/embedding_task_router.py --model-name BAAI/bge-m3 --device cuda --config-path ~/.openclaw/quantclaw.json --output-dir ./embedding_router_index-bge-m3 build --print-summary
```
Serve that router as an OpenAI-compatible endpoint:
```bash
python router/embedding_task_router_server.py --model-name BAAI/bge-m3 --device cuda --output-dir ./embedding_router_index-bge-m3 --port 8012
```
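If your machine does not have a GPU, change `--device cuda` to `--device cpu` in both commands.
Point `quant.judge.endpoint` at the host and port the router server listens on. The command above uses `--port 8012`, while the Quick Start config uses `http://127.0.0.1:8000`, so change one to match the other.
If you do not want to run the local embedding router, you can point `quant.judge.endpoint` at any OpenAI-compatible LLM endpoint instead, and set `quant.judge.model` to a model that endpoint serves.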
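The internals of `router/embedding_task_router.py` are not documented here. Embedding-based routers commonly embed a few example prompts per task type and pick the task type whose examples are most similar to the incoming request. The sketch below shows that general idea using cosine similarity against per-task centroids on toy 3-dimensional vectors. In practice the vectors would come from an embedding model such as `BAAI/bge-m3`. The centroid approach, the `0.5` threshold, and all function names are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the example embeddings for one task type."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_task(query: Sequence[float], index: Dict[str, List[float]],
                 threshold: float = 0.5) -> Optional[str]:
    """Return the task type whose centroid is most similar to the query, or None below threshold."""
    best, best_score = None, -1.0
    for task_type, center in index.items():
        score = cosine(query, center)
        if score > best_score:
            best, best_score = task_type, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    index = {
        "coding": centroid([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
        "research": centroid([[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]),
    }
    print(nearest_task([0.95, 0.05, 0.0], index))
    print(nearest_task([0.0, 0.0, 1.0], index))
```

Returning `None` for low-similarity queries lets the caller fall back to a default task type instead of forcing a weak match.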
## 🙏 Acknowledgements
We especially thank the following projects:
- [Claw-Eval](https://github.com/claw-eval/claw-eval)
- [PinchBench](https://github.com/pinchbench/skill)
- [WildClawBench](https://github.com/InternLM/WildClawBench)
- [ClawXRouter](https://github.com/OpenBMB/ClawXRouter/tree/main)
## 👥 Core Contributors
[Manyi Zhang](https://openreview.net/profile?id=%7EManyi_Zhang2), [Ji-Fu Li*](https://openreview.net/profile?id=~Ji-Fu_Li1), [Zhongao Sun](https://openreview.net/profile?id=~Zhongao_Sun1), [Xiaohao Liu](https://xiaohao-liu.github.io), [Zhenhua Dong](https://scholar.google.com/citations?user=JeePtHEAAAAJ&hl=en), [Xianzhi Yu](https://scholar.google.com/citations?user=tGnJRYQAAAAJ&hl=en), [Haoli Bai](https://haolibai.github.io/) (Project Lead), [Xiaobo Xia](https://xiaoboxia.github.io/)
*Follow SparkEngineAI on WeChat, where we share cutting-edge progress in AI infrastructure, hope to spark new stars in the AI field, and help everyone learn and find inspiration.*
<p align="left">
<img src="./figs/SparkEngineAI.jpg" alt="SparkEngineAI official account" width="240">
</p>
## 📖 Citation
If QuantClaw helps your research, engineering work, or benchmark studies, please cite:
```bibtex
@misc{QuantClawBlog,
  title  = {QuantClaw: Precision Where It Matters for OpenClaw},
  url    = {https://sparkengineai.github.io/QuantClaw/},
  author = {{SparkEngineAI Team}},
  month  = {April},
  year   = {2026}
}
```