Conversation Memory Engine

By cortexuvula

OpenClaw plugin: captures conversations before compaction, clusters into topics via Cerebras, injects relevant context via qmd

# Conversation Memory Engine

An OpenClaw plugin that captures conversations before compaction, clusters them into topics using Cerebras, and injects relevant context back into future prompts, giving your agent persistent, searchable memory across sessions.

## How It Works

```
Every session                    On compaction / /reset
─────────────────────            ────────────────────────────────────────
message_received  →  update      before_compaction  →  extract user/assistant
last_message_at                                         pairs
                                                     →  Cerebras clusters into
llm_output        →  increment                          topics (JSON)
pair_count                                           →  append to
                                                          conversation-log.jsonl
before_prompt_build:                                 →  write topic .md files
  Path A - gap > 20 min          →  qmd search on    →  qmd update (index them)
  Path B - every 5 pairs         →  Cerebras intent
                                      → qmd search
  Path C - nothing               →  skip (zero cost)
```

### Three paths for context injection

| Path | Trigger | Method | Cost |
|------|---------|--------|------|
| **A** | Gap > 20 min OR post-compaction | qmd BM25 on raw prompt | ~0ms (local) |
| **B** | Every 5th exchange | Cerebras intent extract → keywords → qmd | ~4s (Cerebras API) |
| **C** | Otherwise | Skip | Zero |

Path A uses a **corpus gate**: it won't fire until you have 5+ topic files, avoiding wasteful searches on an empty corpus.
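The path selection above can be sketched as a single decision function. This is an illustrative sketch only, with hypothetical names (`choosePath`, `SessionState`), not the plugin's actual internals:

```typescript
// Hypothetical sketch of the three-path decision; names are illustrative.
type InjectionPath = "A" | "B" | "C";

interface SessionState {
  minutesSinceLastMessage: number;
  justCompacted: boolean;
  pairCount: number;      // user/assistant exchanges this session
  topicFileCount: number; // size of the topic corpus on disk
}

function choosePath(
  s: SessionState,
  gapThresholdMin = 20,
  minCorpusSize = 5,
  pairCheckFreq = 5,
): InjectionPath {
  // Path A: long gap or fresh compaction, gated on corpus size.
  if (
    (s.minutesSinceLastMessage > gapThresholdMin || s.justCompacted) &&
    s.topicFileCount >= minCorpusSize
  ) {
    return "A";
  }
  // Path B: every Nth exchange, intent extraction via Cerebras.
  if (s.pairCount > 0 && s.pairCount % pairCheckFreq === 0) {
    return "B";
  }
  // Path C: do nothing (zero cost).
  return "C";
}
```

Note that the corpus gate means a long gap with an empty corpus still falls through to the Path B/C checks rather than firing a pointless search.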

## Requirements

- [OpenClaw](https://github.com/openclaw/openclaw) with the extensions API
- [Cerebras API key](https://cloud.cerebras.ai) (free tier available)
- [qmd](https://github.com/qmd-app/qmd) - local hybrid BM25 + semantic search

## Installation

### 1. Install qmd

```bash
pip install qmd
```

### 2. Index your workspace

```bash
qmd init    # creates a qmd.config.json in your workspace
qmd update  # indexes your memory/ folder
qmd embed   # generates embeddings (optional but improves Path B)
```

### 3. Copy the plugin

```bash
mkdir -p ~/.openclaw/extensions/conversation-memory
cp index.ts ~/.openclaw/extensions/conversation-memory/
```

Or if your OpenClaw workspace is `~/clawd`:

```bash
mkdir -p ~/clawd/.openclaw/extensions/conversation-memory
cp index.ts ~/clawd/.openclaw/extensions/conversation-memory/
```

### 4. Set your Cerebras API key

**Option A** - Environment variable (recommended):
```bash
export CONV_MEMORY_CEREBRAS_KEY="your-api-key-here"
```

**Option B** - Key file (default path):
```bash
mkdir -p ~/.credentials
echo "your-api-key-here" > ~/.credentials/cerebras-api-key.txt
chmod 600 ~/.credentials/cerebras-api-key.txt
```

**Option C** - Custom key file path:
```bash
export CONV_MEMORY_CEREBRAS_KEY_FILE="/path/to/your/api-key.txt"
```
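The three options resolve in the documented order (inline key first, then the key file). A minimal sketch of that precedence, assuming a hypothetical helper name (`resolveCerebrasKey` is illustrative, not the plugin's API):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Illustrative sketch: inline env var wins, then CONV_MEMORY_CEREBRAS_KEY_FILE,
// then the default key file path. Returns null when no key is configured.
function resolveCerebrasKey(env: NodeJS.ProcessEnv = process.env): string | null {
  if (env.CONV_MEMORY_CEREBRAS_KEY) {
    return env.CONV_MEMORY_CEREBRAS_KEY.trim();
  }
  const keyFile =
    env.CONV_MEMORY_CEREBRAS_KEY_FILE ??
    path.join(os.homedir(), ".credentials", "cerebras-api-key.txt");
  try {
    return fs.readFileSync(keyFile, "utf8").trim();
  } catch {
    return null; // key file missing or unreadable
  }
}
```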

### 5. Restart the OpenClaw gateway

```bash
openclaw gateway restart
# or
systemctl --user restart openclaw-gateway.service
```

### 6. Verify it loaded

```bash
journalctl --user -u openclaw-gateway.service -n 20 | grep conv-memory
# Expected: [conv-memory] All 5 hooks registered — Phase 4+6 active
```

## Configuration

All configuration is via environment variables. Defaults are shown.

```bash
# Workspace path (where OpenClaw lives)
CONV_MEMORY_WORKSPACE=~/clawd

# Cerebras API key (inline โ€” takes priority over key file)
CONV_MEMORY_CEREBRAS_KEY=

# Cerebras API key file path
CONV_MEMORY_CEREBRAS_KEY_FILE=~/.credentials/cerebras-api-key.txt

# Cerebras model to use
CONV_MEMORY_CEREBRAS_MODEL=qwen-3-235b-a22b-instruct-2507

# Session key to monitor (your main human<>agent session)
CONV_MEMORY_SESSION=agent:main:main

# qmd collections to search (comma-separated)
CONV_MEMORY_QMD_COLLECTIONS=memory,obsidian

# Gap in minutes before Path A fires
CONV_MEMORY_GAP_THRESHOLD_MIN=20

# Minimum topic files before Path A fires (corpus gate)
CONV_MEMORY_MIN_CORPUS_SIZE=5

# Maximum characters to inject into system prompt
CONV_MEMORY_MAX_INJECT_CHARS=800

# Pair check frequency for Path B
CONV_MEMORY_PAIR_CHECK_FREQ=5

# Minimum confidence for Cerebras intent to trigger injection (0.0โ€“1.0)
CONV_MEMORY_INTENT_CONFIDENCE=0.7
```

You can set these in your shell profile, or in a `.env` file if your OpenClaw setup loads one.
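For example, a more aggressive profile for a chatty session might look like this in your shell profile (the values here are illustrative, not recommendations):

```shell
# Fire Path A after shorter idle gaps, check intent every 3rd pair,
# and allow a larger injection budget.
export CONV_MEMORY_GAP_THRESHOLD_MIN=10
export CONV_MEMORY_PAIR_CHECK_FREQ=3
export CONV_MEMORY_MAX_INJECT_CHARS=1200
```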

## Output Files

```
<workspace>/memory/
    conversation-log.jsonl          # one JSON block per compaction
    hook-state.json                 # pair_count, last_message_at, etc.
    conversation-topics/
        2026-02-21T08-05-07Z-plugin-architecture-0.md
        2026-02-21T08-05-07Z-qmd-integration-1.md
        ...                         # grows with each compaction
```

Each topic file is a short Markdown document, searchable via qmd and readable by humans.

## Tuning

The corpus gate (`MIN_CORPUS_SIZE=5`) means Path A stays quiet until you've had enough compaction cycles to build a useful search corpus. After the first 5 compactions (~1-2 weeks of active use), you'll start seeing Path A fire and inject relevant context.

**Milestones** - the included `conversation-compact.sh` companion script logs a note to your daily memory file when the corpus crosses 5, 20, 50, or 100 topic files, signalling when to re-tune thresholds.

Re-tuning checklist (run at each milestone):
- What's the Path A hit rate? (check gateway logs for "qmd returned nothing")
- Are Path B injections relevant? (look at `intent=` in logs)
- Is 800 chars enough context, or too much?
- Should you adjust `PAIR_CHECK_FREQ` up or down?

## Cerebras Model

Default: `qwen-3-235b-a22b-instruct-2507`, a fast, high-quality model for structured JSON extraction. Cerebras's hardware delivers responses in ~1-2 seconds even at 235B parameters.

You can swap to any Cerebras-hosted model:
```bash
CONV_MEMORY_CEREBRAS_MODEL=llama-4-scout-17b-16e-instruct
```

## Architecture Notes

- **State is cached in memory** - `hook-state.json` on disk is the source of truth, but an in-memory cache avoids read-modify-write races between hooks
- **All disk writes are atomic** - temp file + rename, unique per call
- **Fire-and-forget compaction** - `processCompaction` runs async after the hook returns and never blocks the main session
- **Fallback on Cerebras failure** - if the API call fails, raw pairs are still written to `conversation-log.jsonl`, so nothing is lost
- **qmd is optional** - if qmd isn't installed or returns nothing, the plugin silently skips injection (no errors)

## License

MIT