← Back to Plugins
Tools

Knowledge Engine

alberthild By alberthild 👁 109 views ▲ 0 votes

OpenClaw plugin for real-time knowledge extraction from conversational data

GitHub

Install

npm install @vainplex/openclaw-knowledge-engine

Configuration Example

{
  "plugins": {
    "entries": {
      "openclaw-knowledge-engine": {
        "enabled": true,
        "config": {
          "workspace": "/path/to/your/workspace",
          "extraction": {
            "regex": { "enabled": true },
            "llm": {
              "enabled": true,
              "endpoint": "http://localhost:11434/api/generate",
              "model": "mistral:7b",
              "batchSize": 10,
              "cooldownMs": 30000
            }
          }
        }
      }
    }
  }
}

README

# @vainplex/openclaw-knowledge-engine

A real-time knowledge extraction plugin for [OpenClaw](https://github.com/openclaw/openclaw). Automatically extracts entities, facts, and relationships from conversations โ€” building a persistent, queryable knowledge base that grows with every message.

## What it does

Every message your OpenClaw agent processes flows through the Knowledge Engine:

1. **Regex Extraction** (instant, zero cost) โ€” Detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
2. **LLM Enhancement** (optional, batched) โ€” Groups messages and sends them to a local LLM for deeper entity and fact extraction
3. **Fact Storage** โ€” Stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
4. **Relevance Decay** โ€” Automatically decays old facts so recent knowledge surfaces first
5. **Vector Sync** โ€” Optionally syncs facts to ChromaDB for semantic search
6. **Background Maintenance** โ€” Prunes low-relevance facts, compacts storage, runs cleanup

```
User: "We're meeting with Alex from Acme Corp next Tuesday"
  โ”‚
  โ”œโ”€ Regex โ†’ entities: [Alex (person), Acme Corp (organization)]
  โ””โ”€ LLM   โ†’ facts:   [Alex โ€” works-at โ€” Acme Corp]
                       [Meeting โ€” scheduled-with โ€” Acme Corp]
```

## Quick Start

### 1. Install

```bash
cd ~/.openclaw
npm install @vainplex/openclaw-knowledge-engine
```

### 2. Sync to extensions

OpenClaw loads plugins from the `extensions/` directory:

```bash
mkdir -p extensions/openclaw-knowledge-engine
cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/
```

### 3. Configure

Add to your `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "openclaw-knowledge-engine": {
        "enabled": true,
        "config": {
          "workspace": "/path/to/your/workspace",
          "extraction": {
            "regex": { "enabled": true },
            "llm": {
              "enabled": true,
              "endpoint": "http://localhost:11434/api/generate",
              "model": "mistral:7b",
              "batchSize": 10,
              "cooldownMs": 30000
            }
          }
        }
      }
    }
  }
}
```

### 4. Restart gateway

```bash
openclaw gateway restart
```

## Configuration

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable/disable the plugin |
| `workspace` | string | `~/.clawd/plugins/knowledge-engine` | Storage directory for knowledge files |
| `extraction.regex.enabled` | boolean | `true` | High-speed regex entity extraction |
| `extraction.llm.enabled` | boolean | `true` | LLM-based deep extraction |
| `extraction.llm.model` | string | `"mistral:7b"` | Ollama/OpenAI-compatible model |
| `extraction.llm.endpoint` | string | `"http://localhost:11434/api/generate"` | LLM API endpoint (HTTP or HTTPS) |
| `extraction.llm.batchSize` | number | `10` | Messages per LLM batch |
| `extraction.llm.cooldownMs` | number | `30000` | Wait time before sending batch |
| `decay.enabled` | boolean | `true` | Periodic relevance decay |
| `decay.intervalHours` | number | `24` | Hours between decay cycles |
| `decay.rate` | number | `0.02` | Decay rate per interval (2%) |
| `embeddings.enabled` | boolean | `false` | Sync facts to ChromaDB |
| `embeddings.endpoint` | string | `"http://localhost:8000/..."` | ChromaDB API endpoint |
| `embeddings.collectionName` | string | `"openclaw-facts"` | Vector collection name |
| `embeddings.syncIntervalMinutes` | number | `15` | Minutes between vector syncs |
| `storage.maxEntities` | number | `5000` | Max entities before pruning |
| `storage.maxFacts` | number | `10000` | Max facts before pruning |
| `storage.writeDebounceMs` | number | `15000` | Debounce delay for disk writes |

### Minimal config (regex only, no LLM)

```json
{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "extraction": {
        "llm": { "enabled": false }
      }
    }
  }
}
```

This gives you zero-cost entity extraction with no external dependencies.

### Full config (LLM + ChromaDB)

```json
{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "workspace": "~/my-agent/knowledge",
      "extraction": {
        "llm": {
          "enabled": true,
          "endpoint": "http://localhost:11434/api/generate",
          "model": "mistral:7b"
        }
      },
      "embeddings": {
        "enabled": true,
        "endpoint": "http://localhost:8000/api/v1/collections/facts/add"
      },
      "decay": {
        "intervalHours": 12,
        "rate": 0.03
      }
    }
  }
}
```

## How it works

### Extraction Pipeline

```
Message received
      โ”‚
      โ”œโ”€โ”€โ–ถ Regex Engine (sync, <1ms)
      โ”‚     โ””โ”€ Extracts: proper nouns, organizations, tech terms,
      โ”‚        URLs, emails, monetary amounts, dates
      โ”‚
      โ””โ”€โ”€โ–ถ LLM Batch Queue (async, batched)
            โ””โ”€ Every N messages or after cooldown:
               โ””โ”€ Sends batch to local LLM
               โ””โ”€ Extracts: entities + fact triples
               โ””โ”€ Stores in FactStore
```

### Fact Lifecycle

Facts are stored as structured triples:

```json
{
  "id": "f-abc123",
  "subject": "Alex",
  "predicate": "works-at",
  "object": "Acme Corp",
  "source": "extracted-llm",
  "relevance": 0.95,
  "createdAt": 1707123456789,
  "lastAccessedAt": 1707123456789
}
```

- **Relevance** starts at 1.0 and decays over time
- **Accessed facts** get a relevance boost (LRU-style)
- **Pruning** removes facts below the relevance floor when storage limits are hit
- **Minimum floor** (0.1) prevents complete decay โ€” old facts never fully disappear

### Storage

All data is persisted as JSON files in your workspace:

```
workspace/
โ”œโ”€โ”€ entities.json    # Extracted entities with types and counts
โ””โ”€โ”€ facts.json       # Fact triples with relevance scores
```

Writes use atomic file operations (write to `.tmp`, then rename) to prevent corruption.

## Architecture

```
index.ts                 โ†’ Plugin entry point
src/
โ”œโ”€โ”€ types.ts             โ†’ All TypeScript interfaces
โ”œโ”€โ”€ config.ts            โ†’ Config resolution + validation
โ”œโ”€โ”€ patterns.ts          โ†’ Regex factories (Proxy-based, no /g state bleed)
โ”œโ”€โ”€ entity-extractor.ts  โ†’ Regex-based entity extraction
โ”œโ”€โ”€ llm-enhancer.ts      โ†’ Batched LLM extraction with cooldown
โ”œโ”€โ”€ fact-store.ts        โ†’ In-memory fact store with decay + pruning
โ”œโ”€โ”€ hooks.ts             โ†’ OpenClaw hook registration + orchestration
โ”œโ”€โ”€ http-client.ts       โ†’ Shared HTTP/HTTPS transport
โ”œโ”€โ”€ embeddings.ts        โ†’ ChromaDB vector sync
โ”œโ”€โ”€ storage.ts           โ†’ Atomic JSON I/O with debounce
โ””โ”€โ”€ maintenance.ts       โ†’ Scheduled background tasks
```

- **12 modules**, each with a single responsibility
- **Zero runtime dependencies** โ€” Node.js built-ins only
- **TypeScript strict** โ€” no `any` in source code
- **All functions โ‰ค40 lines**

## Hooks

| Hook | Priority | Description |
|------|----------|-------------|
| `session_start` | 200 | Loads fact store from disk |
| `message_received` | 100 | Extracts entities + queues LLM batch |
| `message_sent` | 100 | Same extraction on outbound messages |
| `gateway_stop` | 50 | Flushes writes, stops timers |

## Testing

```bash
npm test
# Runs 83 tests across 10 test files
```

Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, hook orchestration.

## Part of the Vainplex Plugin Suite

| # | Plugin | Status | Description |
|---|--------|--------|-------------|
| 1 | [@vainplex/nats-eventstore](https://github.com/alberthild/openclaw-nats-eventstore) | โœ… Published | NATS JetStream event persistence |
| 2 | [@vainplex/openclaw-cortex](https://github.com/alberthild/openclaw-cortex) | โœ… Published | Conversation intelligence (threads, decisions, boot context) |
| 3 | **@vainplex/openclaw-knowledge-engine** | โœ… Published | Real-time knowledge extraction (this plugin) |
| 4 | @vainplex/openclaw-governance | ๐Ÿ“‹ Planned | Policy enforcement + guardrails |
| 5 | @vainplex/openclaw-memory-engine | ๐Ÿ“‹ Planned | Unified memory layer |
| 6 | @vainplex/openclaw-health-monitor | ๐Ÿ“‹ Planned | System health + auto-healing |

## License

MIT
tools

Comments

Sign in to leave a comment

Loading comments...