Tacitclaw

Name: Tacitclaw
Rating: 3.5 (1 reviews)
Author: TacitClaw

# 🧠 TacitClaw: A ContextEngine Plugin for OpenClaw

> **The ultimate evolution from a "stateless execution shell" to a "resident cognitive entity."**

[![Framework](https://img.shields.io/badge/OpenClaw-2026.3.8-blue.svg)](https://github.com/)
[![Stack](https://img.shields.io/badge/Stack-Node.js%20%7C%20Python%20%7C%20Qdrant-success.svg)](https://github.com/)
[![Hardware Optimized](https://img.shields.io/badge/Optimized%20for-Apple%20Silicon%20M4-lightgrey.svg)](https://github.com/)
[![License](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)

**TacitClaw** is a long-term cognitive and memory architecture designed exclusively for the OpenClaw AI Agent framework. Integrating seamlessly as a ContextEngine plugin, it empowers your Agent with ultra-fast short-term working memory, long-term semantic persistence, autonomous self-pruning, and adaptive compute routing.

Engineered specifically for **heavy local deployments on single-machine, single-process environments** (e.g., Mac mini M4 32GB), TacitClaw targets the token anxiety and memory collapse commonly associated with long-context LLM interactions.

---

## 📐 Core Philosophy: CQRS & Dynamic Attention Allocation

The soul of this architecture lies in **Command Query Responsibility Segregation (CQRS)** and **Asymmetric Compute Routing**. We acknowledge and exploit the LLM's "stateless ping-pong" mechanism during tool calls to build a dual system that has both "neural reflexes" and "deep reasoning":

* **Hook Intent Smuggling**: Captures intent in milliseconds at the gateway, stockpiling raw materials for future memories.
* **Asymmetric Compute Scheduling**: Human queries trigger heavy, high-dimensional RAG (Cognitive Route); autonomous error-correction loops trigger lightning-fast skeletal truncation (Reflex Route).
* **Offline Dimensionality Reduction**: Real-time interactions merely append to a Write-Ahead Log (WAL); expensive macro-folding, deduplication, and memory decay are deferred to cheap cloud compute during nightly batch processing.

---

## 🏗️ The Four-Layer Architecture

### 1. 🛡️ Physical Gateway Layer (The Hook Sentinel)
**Role**: Ultra-fast shield & intent smuggler | **Time Budget**: `< 50ms`

* **Level-0 Physical Shield**: Utilizes a local 0.6B XGuard model for synchronous interception. Malicious injections are blocked instantly, ensuring the high-dimensional memory pool remains untainted.
* **State Smuggling & Dual-Track Delivery**:
    * **Hot Track (Immediate Ammo)**: Pushes clean instructions into a short-lived `HookCache` (TTL 10s), bypassing OpenClaw's native state-awareness limitations during the `assemble` phase.
    * **Cold Track (Long-term Evolution)**: Asynchronously appends raw instructions to the disk WAL (Write-Ahead Log) as episodic memory, consuming zero real-time embedding compute.
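
The dual-track delivery above can be sketched roughly as follows. This is a minimal illustration, not TacitClaw's actual code: the `HookCache` class body, the `append_to_wal` helper, the JSON-lines record layout, and the injectable `clock` parameter are all assumptions made for the example. The real cold track is described as asynchronous; the sketch writes synchronously to stay short.

```python
import json
import time
from pathlib import Path


class HookCache:
    """Hot track (hypothetical shape): in-memory entries that expire after a TTL."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = {}

    def put(self, key, value):
        # Store the value together with its absolute expiry time.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict stale entries on read
            return None
        return value


def append_to_wal(wal_path, session_id, instruction):
    """Cold track (hypothetical shape): append one raw instruction as a JSON line.

    No embedding is computed here; the record is only persisted for later batch work.
    """
    record = {"ts": time.time(), "session": session_id, "text": instruction}
    with Path(wal_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

An append-only JSON-lines file keeps the write path to a single sequential disk append, which is one plausible way to stay inside a small time budget.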

### 2. 🧠 Cognitive Forge Layer (Assemble Dual-Engine Routing)
**Role**: The central nervous system for pushing token limits and memory dispatching. It automatically detects whether the current turn is a "Human Prompt" or an "Agent Self-Loop", routing via a zero-I/O, millisecond-level triage (P0/P1/P2):

* **Engine A: Cognitive Route (Slow Deep-Thinking | 3-15s)**
    * **Intent Fission**: Deconstructs user instructions into sharp, high-dimensional search probes.
    * **Matrix Retrieval**: Concurrently queries `hot_state` (procedural) + `WAL/numpy` (episodic) + `Qdrant` (semantic).
    * **Adaptive Compute**: A 5-dimensional complexity evaluator (checking code blocks, file references, etc.) dynamically issues an `overrideConfig`, autonomously toggling between economy and premium LLM models (< 10ms).
* **Engine B: Reflex Route (Ultra-fast Neural Reflex | < 50ms)**
    * **Absolute Silence**: Bypasses all high-dimensional retrievals to strictly focus on debugging loops.
    * **Skeletal Truncation**: Strips massive Tool payloads (retaining only the top/bottom 10% and injecting UUIDs), forcing the LLM to lazy-load via `read_code_block`. Combines with Prompt Caching for near-zero latency reuse.
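
As an illustration of the skeletal truncation idea, the sketch below keeps the first and last 10% of a tool payload's lines and replaces the middle with a marker carrying a UUID. The function name, the `store` dictionary, the `min_lines` cutoff, and the marker wording are hypothetical; only the 10% head/tail ratio, the UUID, and the `read_code_block` tool name come from the description above, and that tool's real signature is not known here.

```python
import uuid


def skeletal_truncate(payload, store, keep_percent=10, min_lines=20):
    """Keep the top/bottom keep_percent of lines; park the full text under a UUID."""
    lines = payload.splitlines()
    if len(lines) < min_lines:
        return payload  # small payloads pass through untouched

    # Integer arithmetic avoids float rounding surprises in the line count.
    keep = max(1, len(lines) * keep_percent // 100)
    block_id = str(uuid.uuid4())
    store[block_id] = payload  # full payload stays retrievable by ID

    omitted = len(lines) - 2 * keep
    marker = (
        f"[... {omitted} lines omitted; "
        f"call read_code_block('{block_id}') to load them ...]"
    )
    return "\n".join(lines[:keep] + [marker] + lines[-keep:])
```

The model then sees only the skeleton and has to request the parked body explicitly, which is what keeps the Reflex Route prompt small.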

### 3. 💾 Digital Hippocampus (PlugMem Storage Topology)
**Role**: Strictly physically isolated memory mediums.

* **Workspace & Hot State**: Static Markdown constitution alongside runtime KV differential patches.
* **Episodic WAL**: Sliding-window storage with a 72-hour lifecycle, searched with raw NumPy cosine matching.
* **Semantic Qdrant**: The sacred ground for objective facts and topological graphs, supporting 768/1536-dimensional local or cloud multimodal embeddings.
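
A rough sketch of how the episodic layer's 72-hour window and cosine matching could fit together. The entry layout (`ts`, `vec`, `text`) and the function names are assumptions. The project describes NumPy-based matching; this example uses only the standard library so it runs anywhere, and with NumPy the scoring loop would collapse into a single matrix-vector product over normalized rows.

```python
import math

WINDOW_SECONDS = 72 * 3600  # 72-hour lifecycle from the description above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_episodic(entries, q_vec, now, k=3):
    """Return the k most similar (score, text) pairs among non-expired entries."""
    # Drop anything older than the sliding window before scoring.
    live = [e for e in entries if now - e["ts"] <= WINDOW_SECONDS]
    scored = [(cosine(e["vec"], q_vec), e["text"]) for e in live]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the window caps how many entries are live at once, a brute-force scan like this stays cheap without needing a vector index.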

### 4. 🧬 Immune & Evolution System (The Nightly Batch)
**Role**: Defying entropy and preventing vector database explosions. **Trigger**: `Cron 0 3 * * *`.

* **Cloud Macro-Folding**: LLMs act as "information juicers", filtering out noise and extracting structured traits from the daily WAL.
* **The Collision Shield**:
    * If similarity $\ge 0.92$ (Collision): Triggers **Immune Rejection**. Prohibits new nodes, instead amplifying the weight of existing nodes (Use It or Lose It).
    * If similarity $< 0.92$: Triggers **Neural Genesis**, solidifying into a brand-new knowledge node.
* **Global Ebbinghaus Decay**: Based on $\max(1, I \times e^{-0.0462t})$, applies daily weight degradation to unhit nodes, keeping the vector space razor-sharp.
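
The collision rule and the decay formula can be sketched together as below. Several details are assumptions made for illustration: that $I$ is a node's current weight, that $t$ is measured in days since the last hit, that amplification is a fixed `boost`, and the node dictionary layout. Under the day-based reading, the constant 0.0462 corresponds to a half-life of roughly 15 days, since $\ln 2 / 0.0462 \approx 15$.

```python
import math
import uuid

COLLISION_THRESHOLD = 0.92  # similarity cutoff from the Collision Shield
DECAY_RATE = 0.0462         # exponent constant from the decay formula


def decayed_weight(weight, days_unhit):
    """max(1, I * e^(-0.0462 t)), with t assumed to be days without a hit."""
    return max(1.0, weight * math.exp(-DECAY_RATE * days_unhit))


def merge_candidate(nodes, candidate_text, best_match_id, similarity, boost=1.0):
    """Reinforce the matched node on collision, otherwise create a new node."""
    if best_match_id is not None and similarity >= COLLISION_THRESHOLD:
        # Immune Rejection: no new node, the existing one gains weight.
        nodes[best_match_id]["weight"] += boost
        return "immune_rejection"
    # Neural Genesis: store the candidate as a brand-new node.
    nodes[str(uuid.uuid4())] = {"text": candidate_text, "weight": 1.0}
    return "neural_genesis"
```

The floor of 1 in the formula means a node's weight never decays below its starting value in this sketch; whether low-weight nodes are ever deleted is not stated in the text above.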

---

## ⚡ Tech Stack & Performance Metrics

| Module | Technical Implementation | Performance Benchmark |
| :--- | :--- | :--- |
| **Security Shield** | Node 50ms Client + Python FastAPI Inference | Latency < 50ms |
| **Episodic Retrieval** | M4 Local CPU Matrix Math (`matrix @ q_vec`) | < 1ms (up to 1k entries) |
| **Dual-Engine Routing** | TypeScript Pure Sync Memory Diffing | < 1ms |
| **Semantic Retrieval** | Qdrant v1.12+ Vector DB | KNN < 3s |
| **Token Pruning** | Head/Tail Protection + Sliding Window Strategy | O(N) Ultra-fast Trimming |

*Containerization: Fully equipped with Dockerfile & docker-compose, optimized for Apple Silicon GPU passthrough via OrbStack.*
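
The Token Pruning row in the table above names head/tail protection plus a sliding window. One way such a single-pass trim could look is sketched here; the function name, the default `head` and `tail` sizes, and the whitespace-based token counter are all assumptions, since the README does not spell out the actual strategy.

```python
def prune_messages(messages, max_tokens, head=1, tail=4,
                   count_tokens=lambda m: len(m.split())):
    """Always keep the first `head` and last `tail` messages, then fill the
    remaining budget with the most recent middle messages."""
    n = len(messages)
    if n <= head + tail:
        return list(messages)

    protected_head = messages[:head]
    protected_tail = messages[n - tail:]
    budget = max_tokens - sum(count_tokens(m) for m in protected_head + protected_tail)

    window = []
    # Walk the unprotected middle newest-first; stop at the first message that
    # no longer fits, so the kept window is one contiguous recent span.
    for msg in reversed(messages[head:n - tail]):
        cost = count_tokens(msg)
        if cost > budget:
            break
        window.append(msg)
        budget -= cost

    window.reverse()  # restore chronological order
    return protected_head + window + protected_tail
```

Each message is visited at most once, which matches the O(N) figure quoted in the table.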

---

## 🛠️ Getting Started

### 0. Prerequisites
Ensure your target environment (optimized for Mac mini M4 / OrbStack) has the following baseline dependencies installed:
* Node.js 18+
* Python 3.12+
* Ollama
* OrbStack (or compatible macOS Docker runtime)

### 1. Deployment & Bootstrapping (5 Minutes)
First, spin up the local inference engine and pull the embedding model:
```bash
# Install Ollama (Skip if already installed)
brew install ollama

# Start the background service and pull the 768d nomic-embed-text model
ollama serve &
ollama pull nomic-embed-text

```