
Smart Context Engine

By JIemyp

RAG-based context memory plugin for OpenClaw. Local embeddings, vector search, automatic context injection. No external API needed.


README

# 🧠 Smart Context Engine

**RAG-based context memory plugin for [OpenClaw](https://openclaw.dev).**  
Local embeddings. Vector search. Automatic context injection. Zero external API cost.

---

## What It Does

Smart Context Engine (SCE) gives OpenClaw a persistent, searchable memory. Every conversation turn and every file in your workspace is automatically embedded into a local SQLite vector database. Before each LLM call, SCE searches that database and injects the most relevant context snippets into the system prompt, without you having to do anything.

Think of it as a long-term memory that survives session compaction.

---

## Features

| Feature | Description |
|---------|-------------|
| **Auto-ingest** | Every user/assistant message is automatically embedded during the session |
| **Compaction preservation** | When OpenClaw compacts history, SCE saves the dropped messages to the DB first |
| **`sce_search` tool** | LLM can explicitly call `sce_search` to find past context on demand |
| **Auto context injection** | Relevant context is injected into every system prompt automatically |
| **Smart query classification** | Detects heartbeats, subagent completions, etc., and searches using the actual user intent |
| **Thinking block preservation** | Correctly reorders thinking blocks in assistant messages for Anthropic models |
| **Incremental file indexing** | Re-indexes only changed files (SHA-256 hash tracking) |
| **Zero external API** | Everything runs locally: no OpenAI, no Pinecone, no cloud |
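
The incremental indexing row above can be made concrete with a short sketch. This is illustrative only: the `file_hashes` table and the `reindex_incremental` / `embed_file` names are assumptions, not SCE's actual schema, but the hash-compare-skip pattern is the same idea the feature describes.

```python
import hashlib
import sqlite3
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reindex_incremental(db: sqlite3.Connection, files, embed_file):
    """Call embed_file() only for files whose content hash changed.

    Returns the list of paths that were (re-)embedded.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes (path TEXT PRIMARY KEY, sha256 TEXT)"
    )
    changed = []
    for path in map(Path, files):
        digest = sha256_of(path)
        row = db.execute(
            "SELECT sha256 FROM file_hashes WHERE path = ?", (str(path),)
        ).fetchone()
        if row and row[0] == digest:
            continue  # unchanged: skip the expensive embedding step
        embed_file(path)
        db.execute(
            "INSERT OR REPLACE INTO file_hashes (path, sha256) VALUES (?, ?)",
            (str(path), digest),
        )
        changed.append(str(path))
    db.commit()
    return changed
```

Because the hash is over file contents rather than mtime, a `touch` without edits still skips the file, while any real change forces a re-embed.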

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                    OpenClaw                         │
│                                                     │
│  User message ──► ingest() ──► embed.py ──► sce.db  │
│                                                     │
│  LLM call     ──► assemble()                        │
│                      │                              │
│                      ├── vectorSearch(query)        │
│                      │       │                      │
│                      │       └── embed.py ──► sce.db│
│                      │                              │
│                      └── systemPromptAddition       │
│                           (injected context)        │
│                                                     │
│  Compaction   ──► compact() ──► save to sce.db      │
│                                                     │
│  sce_search tool ──► vectorSearch() ──► sce.db      │
└─────────────────────────────────────────────────────┘

sce.db (SQLite)
├── embeddings table
│   ├── source: "session_turn"  (conversation messages)
│   ├── source: "compaction"    (compacted history)
│   └── source: "file"          (workspace files via reindex.py)
└── indexes on (source), (text, source)
```

**Stats from production use:**
- 20,000+ chunks in the database
- Sub-100ms vector search
- ~176MB database for ~2,000 indexed files
- 384-dimensional embeddings (all-MiniLM-L6-v2)
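
For scale: a 384-dimensional float32 vector is 1,536 bytes, so 20,000 chunks hold roughly 30MB of raw vectors; the rest of the ~176MB is text and indexes. Storing such vectors in SQLite usually means packing them into BLOBs. The `struct` layout below is an assumption for illustration, not SCE's documented on-disk format:

```python
import struct

DIM = 384  # all-MiniLM-L6-v2 output dimension


def pack_vector(vec):
    """Serialize a list of floats to little-endian float32 bytes (a SQLite BLOB)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per float32 component."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

Note that float32 round-trips lose precision relative to Python's float64, which is harmless for cosine similarity but worth knowing if you compare vectors exactly.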

---

## Quick Install (~5 minutes)

### 1. Set up the Python embedding environment

```bash
# Create directory for SCE data
mkdir -p ~/.smart-context-engine/embeddings

# Create a virtual environment (recommended)
python3 -m venv ~/.smart-context-engine/embeddings/venv

# Activate and install dependencies
source ~/.smart-context-engine/embeddings/venv/bin/activate
pip install sentence-transformers numpy scikit-learn
```

### 2. Copy the embedding scripts

```bash
# Clone the repo
git clone https://github.com/JIemyp/smart-context-engine.git
cd smart-context-engine

# Copy Python scripts to your SCE data directory
cp embeddings/*.py ~/.smart-context-engine/embeddings/
```

### 3. Build the TypeScript plugin

```bash
npm install
npm run build
```

### 4. Configure OpenClaw

Add to your `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "smart-context-engine": {
        "path": "/absolute/path/to/smart-context-engine/plugin/src/index.js",
        "enabled": true
      }
    }
  },
  "contextEngine": "smart-context-engine"
}
```

### 5. Set environment variables

Create a `.env` file or export in your shell:

```bash
export SCE_DB_PATH=~/.smart-context-engine/sce.db
export SCE_EMBEDDINGS_DIR=~/.smart-context-engine/embeddings
export SCE_PYTHON_PATH=~/.smart-context-engine/embeddings/venv/bin/python3
export SCE_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

Or copy `.env.example` to `.env` and adjust.

### 6. (Optional) Index your workspace files

```bash
SCE_DIRS=/path/to/your/workspace python3 embeddings/reindex.py
```

### 7. Restart OpenClaw

```bash
openclaw gateway restart
```

That's it. SCE will start automatically embedding conversations.

---

## Choosing an Embedding Model

The default model (`all-MiniLM-L6-v2`) is optimized for **English**. For other languages, set `SCE_MODEL`:

| Language | Recommended Model |
|----------|-------------------|
| English | `sentence-transformers/all-MiniLM-L6-v2` (default, fast) |
| English (high quality) | `sentence-transformers/all-mpnet-base-v2` |
| **Multilingual** | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| Russian / Slavic | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| Chinese | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| German / French | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` |

> **Important:** All documents in the database must be embedded with the **same model**. If you change `SCE_MODEL`, delete `sce.db` and re-run `reindex.py` to rebuild from scratch.

```bash
# Switch to multilingual model
export SCE_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

# Rebuild the database
rm ~/.smart-context-engine/sce.db
python3 embeddings/reindex.py
```

---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SCE_DB_PATH` | `~/.smart-context-engine/sce.db` | Path to SQLite database |
| `SCE_EMBEDDINGS_DIR` | `~/.smart-context-engine/embeddings` | Directory with `embed.py` |
| `SCE_PYTHON_PATH` | `python3` | Python executable (use venv path for best results) |
| `SCE_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Embedding model |
| `SCE_TTL_DAYS` | `30` | Days to retain session conversation records |
| `SCE_DIRS` | current directory | Colon-separated dirs to index with `reindex.py` |
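
The table above maps to the usual environment-variable pattern: read with a documented default, then expand `~` so paths work regardless of how the shell quoted them. A hedged sketch (the `sce_env` helper name is illustrative; SCE's actual config loading may differ):

```python
import os


def sce_env(name: str, default: str) -> str:
    """Read an SCE_* variable, falling back to its documented default,
    and expand a leading ~ into the user's home directory."""
    return os.path.expanduser(os.environ.get(name, default))


DB_PATH = sce_env("SCE_DB_PATH", "~/.smart-context-engine/sce.db")
MODEL = sce_env("SCE_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
TTL_DAYS = int(sce_env("SCE_TTL_DAYS", "30"))
```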

---

## Maintenance

### Manual vector search

```bash
python3 embeddings/embed.py "your search query" 5
```

### Re-index workspace files

```bash
# Incremental (only changed files)
SCE_DIRS=/your/workspace python3 embeddings/reindex.py

# Full re-index
SCE_DIRS=/your/workspace python3 embeddings/reindex.py --force
```

### Database cleanup (TTL + deduplication)

```bash
python3 embeddings/cleanup.py
```

### Index a single file

```bash
python3 embeddings/index_file.py /path/to/file.md
```
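
Indexing a file implies splitting it into chunks before embedding (the 20,000+ chunk figure above comes from such splitting). SCE's actual chunking strategy is not documented here; the fixed-size, overlapping chunker below is just one common approach, shown so the idea is concrete:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100):
    """Split text into overlapping character windows so that context
    spanning a chunk boundary is still retrievable from either side."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```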

---

## How Context Injection Works

On every `assemble()` call (before the LLM receives messages), SCE:

1. Classifies the last user message (user query / heartbeat / subagent completion / system event)
2. Extracts the effective search query (skipping system messages and heartbeats)
3. Runs a cosine similarity search against the vector database
4. Injects the top-3 results (similarity ≥ 0.25) as a `## Smart Context Enhancement` block in the system prompt

The `sce_search` tool additionally lets the LLM search the database explicitly when it needs specific context.
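
Steps 3 and 4 reduce to a brute-force cosine-similarity scan with a score threshold and a top-k cut. A minimal sketch in plain Python, assuming the vectors have already been loaded from `sce.db` (the `top_context` name is illustrative, not SCE's API):

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_context(query_vec, rows, k=3, threshold=0.25):
    """rows: (text, vector) tuples from the embeddings table.
    Returns up to k texts scoring at or above the threshold, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in rows]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]
```

At 20,000+ chunks a linear scan like this easily stays under the sub-100ms figure quoted above, which is why no approximate-nearest-neighbor index is needed at this scale.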

---

## File Structure

```
smart-context-engine/
โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ LICENSE
โ”œโ”€โ”€ package.json
โ”œโ”€โ”€ tsconfig.json
โ”œโ”€โ”€ .gitignore
โ”œโ”€โ”€ .env.example
โ”œโ”€โ”€ plugin/
โ”‚   โ””โ”€โ”€ src/
โ”‚       โ”œโ”€โ”€ index.ts       # Plugin entry point, registers engine + sce_search tool
โ”‚       โ”œโ”€โ”€ engine.ts      # SmartContextEngine: ingest, assemble, compact, search
โ”‚       โ””โ”€โ”€ types.ts       # Shared TypeScript types
โ””โ”€โ”€ embeddings/
    โ”œโ”€โ”€ embed.py           # Core: store_embedding(), search(), init_db()
    โ”œโ”€โ”€ reindex.py         # Batch index workspace files (incremental)
    โ”œโ”€โ”€ cleanup.py         # TTL + deduplication maintenance
    โ”œโ”€โ”€ index_file.py      # Index a single file
    โ””โ”€โ”€ requirements.txt   # Python dependencies
```

---

## Requirements

**Node.js:** 18+  
**Python:** 3.8+  
**Disk:** ~200MB for model download + database growth  
**RAM:** ~500MB during embedding (model loaded in memory)

---

## License

MIT. See [LICENSE](LICENSE).

---

## Author

**Oleksandr Pavlenko**  
[LinkedIn](https://www.linkedin.com/in/pavlenkoall/) ยท [GitHub](https://github.com/JIemyp)

---

*Built for [OpenClaw](https://openclaw.dev), the extensible AI gateway.*