Integration
ClawCut
ClawCut-MLX Proxy - A high-performance bridge between OpenClaw and Apple Silicon (MLX-LM).
Install
pip install flask
Configuration Example
"models": {
"mode": "merge",
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:5000/v1",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{
"id": "ollama/qwen2.5:14b",
"name": "qwen 2.5 14b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 16384,
"maxTokens": 4096,
"compat": {
"supportsDeveloperRole": false
}
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "ollama/qwen2.5:14b"
}
}
}
README
# **ClawCut MLX Proxy**
A high-performance bridge between **OpenClaw** and **Apple Silicon (MLX-LM)**.
This proxy allows you to run large language models (like Qwen 2.5) on a Mac mini/MacBook (M1/M2/M3/M4) while maintaining compatibility with the OpenClaw framework running on a Raspberry Pi or other Linux servers.
## **Motivation**
OpenClaw is a powerful framework, but it often sends a massive amount of "JSON Clutter" (system prompts, tool definitions, and metadata) in every request. This often leads to:
* **LLM Timeouts:** Standard setups frequently run into timeouts because the model takes too long to process the massive context.
* **Poor Reasoning:** Models can get "lost" in the clutter, leading to hallucinations or ignored tool calls.
**ClawCut-MLX** solves this by optimizing the communication and leveraging the power of Apple Silicon. While this setup is optimized for speed, the performance depends on your hardware:
* **Example:** With a **Mac mini M4 Pro (24 GB RAM)** and a **14B model**, this setup achieves generation speeds of up to **21+ tokens/s** with a warm KV-cache.
* **Flexibility:** The proxy works with any MLX-compatible model. You can use smaller models (e.g., 7B) for even higher speeds or larger models (e.g., 32B+) if your Mac has sufficient Unified Memory.
## **Typical Use Case (Split Setup)**
This proxy is specifically designed for users who run a **split-system architecture**:
1. **The Brain (Mac):** A powerful Mac mini or MacBook acts as the LLM engine, providing high-speed inference.
2. **The Heart (Raspberry Pi/Linux):** A Pi or Linux server hosts the OpenClaw framework, managing integrations like WhatsApp, Telegram, or home automation.
By using **ClawCut** the LLM responses become near-instant.
## **Key Features**
* **KV-Cache Optimization:** Automatically filters dynamic timestamps from system prompts to enable near-instant responses via hardware caching.
* **Protocol Translation:** Translates between OpenAI-compatible streams (MLX) and the Ollama/NDJSON format expected by OpenClaw.
* **Performance Tracking:** Real-time console output of prefill duration, token count, and generation speed (tokens per second).
* **Transparency:** With the **DEBUG\_MODE** enabled, you can inspect the full "JSON Clutter" sent by OpenClaw to understand exactly what the model is processing.
## **How to find & download MLX Models**
You don't need to manually download model files. The mlx-lm server handles everything automatically.
1. **Browse Models:** Go to [Hugging Face](https://huggingface.co/mlx-community) and search for the `mlx-community organization`. They provide pre-converted models optimized for Apple Silicon.
2. **Choose your Model:** Copy the repository name (e.g., mlx-community/Qwen2.5-14B-Instruct-4bit).
3. **Automatic Download:** When you start the server for the first time using the --model flag, mlx-lm will automatically download the files (several GBs) and cache them locally on your Mac.
## **Prerequisites**
* **Python 3.x**
* **Network Access:** Both devices must be in the same local network.
* **MLX-LM Server (on Mac):** The Mac must be configured to listen to network requests.
## **Configuration & Network Setup**
### **1\. Prepare the Mac (The LLM Server)**
To allow the Raspberry Pi to talk to your Mac, the MLX server must not only run on localhost. You **must** bind it to your network interface.
Start the server on your Mac with the `--host 0.0.0.0` flag:
```bash
python -m mlx_lm.server --model [YOUR_MODEL_ID] --host 0.0.0.0 --port 8080
```
*Note: Using 0.0.0.0 makes the LLM accessible to any device in your local network.*
⚠️ **IMPORTANT:** Replace `[YOUR_MODEL_ID]` with the model of your choice (e.g., `mlx-community/Qwen2.5-14B-Instruct-4bit`). Ensure that the model fits your available RAM (a 14B model requires approx. 9-10 GB RAM, a 32B model approx. 19 GB). Choose a smaller model (e.g., 7B) if your Mac only has 8 GB or 16 GB of RAM.
⚠️ **Note on Performance:** The very first request (or the first one after clearing a chat session) will take significantly longer (often 30-60 seconds) because the Mac has to process the entire 16k context for the first time. **ClawCut-MLX** optimization becomes effective starting with the **second** request, reducing response times to just a few seconds.
#### **⚠️ macOS Firewall Note**
If the connection is still refused (Error 502/61), your macOS firewall might be blocking the port.
* Go to **`System Settings > Network > Firewall`**.
* Either disable it temporarily for testing or click **Options** and ensure that your Python binary (inside your `mlx_env`) is allowed to receive incoming connections.
* **Test connection from Pi:** Run `nc -zv [MAC_IP] 8080\`. It should say "succeeded".
### **2\. Configure the Proxy (on Raspberry Pi)**
Edit the clawcut-mlx.py file and adjust the constants:
* `MAC_IP:` The local IP address of your Mac (e.g., `192.168.0.5`).
* `OPENCLAW_MODEL_ID:` The exact model ID used in your openclaw.json.
* `MLX_MODEL_IDENTIFIER:` The name of the model loaded on your Mac.
* `DEBUG_MODE:` Set to True to see the raw communication and JSON clutter.
## **Installation**
### **1\. Clone the repository**
```bash
git clone [https://github.com/back-me-up-scotty/ClawCut.git](https://github.com/back-me-up-scotty/ClawCut.git)
cd clawcut-mlx
```
### **2\. Create a Virtual Environment (on MAC / Recommended)**
```bash
python3 -m venv proxy env
source proxy_env/bin/activate
```
### **3\. Install Dependencie (on MAC)s**
```bash
pip install flask requests
```
### **3\. Install Dependencies (on Pi)**
```bash
chmod +x /home/user/clawcut-mlx/clawcut-mlx.py
```
## **Usage**
Start the proxy on your Raspberry Pi:
```bash
python3 clawcut-mlx.py
```
### **OpenClaw Configuration (openclaw.json)**
Point your OpenClaw provider to the proxy. If OpenClaw and the Proxy are on the same Pi, use the following configuration:
```json
"models": {
"mode": "merge",
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:5000/v1",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{
"id": "ollama/qwen2.5:14b",
"name": "qwen 2.5 14b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 16384,
"maxTokens": 4096,
"compat": {
"supportsDeveloperRole": false
}
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "ollama/qwen2.5:14b"
}
}
}
```
integration
Comments
Sign in to leave a comment