
Ai Longterm Wiki Memory OpenClaw

By giovannifrontera

Long-term semantic memory for AI agents (OpenClaw plugin): dual-representation wiki + LanceDB vectors, atomic writes, auto-synthesis, bge-m3 semantic search, PDF ingestion.


Install

npm install &&

Configuration Example

{
  "frontend": {
    "password": "your-password",
    "session_days": 7
  }
}

README

<div align="center">

# AI Longterm Wiki Memory: OpenClaw Plugin

**Semantic long-term memory for AI agents**

Your AI agent forgets everything between sessions. This gives it a structured, self-healing knowledge base that it maintains itself, where every page is simultaneously a readable document and a searchable vector.

[![Version](https://img.shields.io/badge/version-3.1.2-informational)](CHANGELOG.md)
[![Tests](https://img.shields.io/badge/tests-124%20passed-brightgreen)](tests/)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-AGPL--3.0-blue)](LICENSE)
[![OpenClaw](https://img.shields.io/badge/works%20with-OpenClaw-purple)](https://github.com/openclaw/openclaw)

[Quick Start](#quick-start) · [Features](#features) · [Architecture](#architecture) · [PDF Ingestion](#multi-source-pdf-ingestion-v20) · [Web Interface](#web-interface-v21) · [Dashboard](#dashboard-observability-v22) · [Integration](#integration) · [CLI Reference](#cli-reference)

---

</div>

> ### 🤖 If you are an AI agent reading this repo
>
> **Do not proceed without reading your platform's install file first.**
>
> | You are running on… | Read this file immediately |
> |---------------------|---------------------------|
> | **OpenClaw** | [`AGENTS.md`](AGENTS.md) |
>
> This file contains mandatory installation steps. Skipping them means the wiki context injection will not work and every subsequent action may silently fail.

## Interface

**Graph view**: the D3 force graph shows all wiki pages as nodes, coloured by category. When the agent retrieves pages to answer a prompt, the activated nodes pulse gold in real time via WebSocket.

![Graph view with query-hit animation](docs/images/graph-view.svg)

**Stats tab**: embedding coverage, most-queried pages, stale pages, semantic duplicate warnings, and auto-lint countdown.

![Stats dashboard](docs/images/stats-tab.svg)

---

## The problem

AI agents forget everything between sessions. Existing memory systems are flat: a pile of timestamped facts, not a knowledge base. When you work on recurring research (academic literature, competitive analysis, trading signals, legal cases), you need knowledge that is **organized, interconnected, and semantically searchable**, and that grows over time without human bookkeeping.

## What it does

AI Longterm Wiki Memory gives your agent a **three-layer brain** that it maintains autonomously, with all layers indexed together in a single LanceDB vector space:

| Layer | Directory | Contents | Who writes |
|-------|-----------|----------|------------|
| **Domain knowledge** | `wiki-works/<topic>/` | Deep knowledge per domain: concepts, research, entities | INGEST workflow |
| **Distilled knowledge** | `wiki/` | Cross-domain knowledge, promoted autonomously when useful across ≥2 topics | Agent (autonomous promotion) |
| **Identity** | `wiki/identity/` | Behavioral patterns, values, style; learned from corrections | Only `wiki.py self-reflect` |

The agent ingests web pages, papers, and PDFs; retrieves by semantic meaning (not keywords); promotes knowledge autonomously between layers; detects stale or contradictory knowledge; and synthesizes new pages automatically when multiple sources support a non-obvious inference. It does all of this without corrupting the knowledge base, even if a process crashes mid-operation.

```
User: "study this paper on RAG architectures"

Agent: [INTENT: INGEST | WORKSPACE: research | CONFIDENCE: high]
       → writes structured pages as .tmp files
       → wiki.py ingest: atomic staging → production commit
       → markdown + embeddings written in the same operation
       → "2 pages written. Mini-lint: ok."
       → checks promotion criteria: retrieved in ≥3 queries, cross-domain?
       → promotes to wiki/concepts/rag.md autonomously if criteria met

User: "what do you know about retrieval-augmented generation?"

Agent: [INTENT: QUERY | WORKSPACE: research | CONFIDENCE: high]
       → <wiki-context> already injected (pre-prompt hook)
       → reads relevant pages, synthesizes with citations
       → synthesis meets threshold → auto-saved as new wiki page

User: "stop adding a summary at the end of every response"

Agent: [INTENT: BEHAVIOR_FEEDBACK | CONFIDENCE: high]
       → wiki.py behavior-log --event "no trailing summary"
       → at session end: wiki.py self-reflect → wiki/identity/ updated
```

---

## The core idea: wiki and vector DB as one

> **Karpathy's wiki pattern** ([gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)) has the LLM navigate the wiki by *reading* markdown files. This breaks down at scale: the agent cannot scan dozens of pages on every query.

This project solves that with a **dual-representation architecture**: every page has two synchronized forms.

```
  Write a wiki page
        │
        ▼
┌───────────────────┐     ┌───────────────────────────┐
│  Markdown file    │     │  LanceDB vector store     │
│  wiki/concepts/   │◄────►  bge-m3 embeddings        │
│  rag.md           │     │  (1024-dim, HNSW index)   │
└───────────────────┘     └───────────────────────────┘
   humans browse               LLM retrieves
   LLM generates               semantically
```

Markdown and embeddings are **written atomically** and kept in sync at all times. The lint pass detects and repairs any drift.

A query about *"how LLMs handle long context"* retrieves pages about *"positional encoding"* and *"sliding window attention"*, with no keyword overlap, because the meaning is close in embedding space.

---

## Features

### Semantic vector search
[bge-m3](https://huggingface.co/BAAI/bge-m3) embeddings: multilingual (100+ languages), 1024-dim, HNSW index. Queries retrieve by meaning. No re-indexing step. The vector DB is the index, maintained continuously.

### Atomic writes: crash-safe
Every ingest follows a `.tmp → staging LanceDB → atomic promotion` pattern. A crash leaves the system in a detectable state (`in-progress` in `wiki-session.md`). The agent recovers at the next session with no data loss and no silent corruption.
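
A minimal sketch of the file half of this pattern, under stated assumptions: `atomic_write_text` is a hypothetical helper, not a function from `wiki.py`, and the LanceDB staging step is not shown. It illustrates only the general write-to-temp-then-rename technique that makes a crash leave either the old page or the new one on disk, never a half-written file.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text so readers see either the old file or the complete new one."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)   # atomic swap of temp file into place
    except BaseException:
        # Remove the leftover temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Calling `atomic_write_text(Path("wiki/concepts/rag.md"), page_text)` twice leaves only the second version and no stray `.tmp` file.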

### Pre-prompt context injection
`wiki_context.py` runs a vector search **before every user message** and prepends a `<wiki-context>` block with the most relevant pages. This eliminates the main failure mode of skill-based approaches, where the agent gets context only when it classifies a message as QUERY:

```
User types a message
        │
        ▼
wiki_context.py → vector search
        │
        ▼
<wiki-context> block prepended to the prompt
        │
        ▼
Agent has relevant context, regardless of intent classification
```
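
The flow above can be sketched roughly as follows. This is an illustration, not the code in `wiki_context.py`: the `Hit` type, the `build_prompt` function, the `max_pages` parameter, and the layout inside the `<wiki-context>` block are all assumptions, and the vector search itself is replaced by a ready-made list of hits.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    path: str      # wiki page path, e.g. "wiki/concepts/rag.md"
    score: float   # similarity from the vector search (higher is closer)
    excerpt: str   # short text taken from the page


def build_prompt(user_message: str, hits: list[Hit], max_pages: int = 3) -> str:
    """Prepend a <wiki-context> block with the best hits to the user message."""
    if not hits:
        return user_message  # nothing relevant found: leave the prompt untouched
    top = sorted(hits, key=lambda h: h.score, reverse=True)[:max_pages]
    lines = ["<wiki-context>"]
    for hit in top:
        lines.append(f"[{hit.path}] (score={hit.score:.2f})")
        lines.append(hit.excerpt.strip())
    lines.append("</wiki-context>")
    return "\n".join(lines) + "\n\n" + user_message
```

Because the block is built before the agent sees the message, the context is present whether or not the agent later classifies the message as a QUERY.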

Install for OpenClaw (one command):
```bash
py scripts/setup_openclaw.py --workspace /path/to/workspace
```

### Multi-project routing
Define multiple research domains in `wiki.config.json` with keyword lists. The agent auto-selects the right workspace from message content, so no manual specification is needed.
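
One way keyword routing of this kind could work is sketched below. The real schema of `wiki.config.json` is not shown here, so the `workspaces` mapping (workspace name to keyword list), the `route_workspace` function, and the `"general"` fallback are hypothetical.

```python
import re


def route_workspace(
    message: str,
    workspaces: dict[str, list[str]],
    default: str = "general",
) -> str:
    """Pick the workspace whose keywords match the most words in the message."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    best, best_count = default, 0
    for name, keywords in workspaces.items():
        count = sum(1 for kw in keywords if kw.lower() in words)
        if count > best_count:  # ties keep the earlier workspace
            best, best_count = name, count
    return best


# Hypothetical keyword lists, for illustration only.
example_workspaces = {
    "research": ["paper", "rag", "embedding"],
    "trading": ["signal", "ticker"],
}
```

With these lists, `route_workspace("study this paper on RAG architectures", example_workspaces)` selects `"research"`, and a message with no matching keyword falls back to the default.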

### Automatic synthesis
When a query response integrates ≥2 wiki sources, exceeds 300 tokens, and adds non-literal inference, the agent saves it as a new wiki page with embeddings. Knowledge compounds over time.

### Self-healing lint
`wiki.py lint --full` detects and repairs:
- **Broken wiki links** (`[[page]]` with no matching file)
- **Orphan LanceDB entries** (vectors for deleted files, auto-removed)
- **Renames** (file moved → updates DB path without re-embedding via `content_hash`)
- **Semantic duplicates** (cosine similarity > 0.95 across pages)
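
As an illustration of the first check, here is a minimal sketch of broken-link detection. It is not the logic of `wiki.py lint`: `find_broken_links` is a hypothetical function, and it assumes a link `[[page]]` resolves to any file named `page.md` under the wiki root.

```python
import re
from pathlib import Path

# Captures the link target, stopping at "]", "|" (label) or "#" (section).
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def find_broken_links(wiki_root: Path) -> dict[str, list[str]]:
    """Map each page (relative path) to the link targets that have no file."""
    pages = list(wiki_root.rglob("*.md"))
    known = {page.stem for page in pages}
    broken: dict[str, list[str]] = {}
    for page in pages:
        text = page.read_text(encoding="utf-8")
        targets = {match.strip() for match in WIKI_LINK.findall(text)}
        missing = sorted(t for t in targets if t not in known)
        if missing:
            broken[page.relative_to(wiki_root).as_posix()] = missing
    return broken
```

The function only reports; a repair step (removing or retargeting the link) would be a separate decision.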

### Token-budget index
`index.md` respects a configurable token budget (default 4000). When the budget is exceeded, it applies reduction strategies automatically, so the agent can navigate even on small context windows.
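
The reduction strategies themselves are not described here, so the following is only a sketch of what budget enforcement might look like. Both strategies (drop summaries, then drop trailing entries), the `fit_index` function, and the 4-characters-per-token estimate are assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token. A real tokenizer differs.
    return max(1, len(text) // 4)


def fit_index(entries: list[tuple[str, str]], budget: int = 4000) -> list[str]:
    """Render (title, summary) entries, most important first, within a budget."""
    full = [f"- {title}: {summary}" for title, summary in entries]
    if sum(estimate_tokens(line) for line in full) <= budget:
        return full
    # Assumed strategy 1: keep every entry but drop the summaries.
    lines = [f"- {title}" for title, _ in entries]
    # Assumed strategy 2: drop the least important entries from the end.
    while lines and sum(estimate_tokens(line) for line in lines) > budget:
        lines.pop()
    return lines
```

Ordering the entries by importance before calling `fit_index` matters, since the second strategy always cuts from the tail.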

### Observability dashboard
A `[Stats]` tab in the web frontend gives a live view of the wiki health: pages embedded vs unembedded, stale pages (configurable threshold), top-10 most queried pages, lint status with last-run timestamp and warning count, and the auto-lint schedule. Lint can also be triggered manually from the browser.

### Autonomous promotion
When a page from `wiki-works/<topic>/` is retrieved in ≥3 distinct queries and proves relevant across ≥2 topics, the agent promotes it to `wiki/` without user confirmation, so cross-domain knowledge compounds automatically.
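
The two criteria can be expressed as a small check. This is a sketch: `RetrievalStats` and `should_promote` are hypothetical names, and how the plugin actually records retrievals is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalStats:
    """Per-page record of which queries and topics retrieved the page."""
    query_ids: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)

    def record(self, query_id: str, topic: str) -> None:
        # Sets keep only distinct queries and distinct topics.
        self.query_ids.add(query_id)
        self.topics.add(topic)


def should_promote(
    stats: RetrievalStats, min_queries: int = 3, min_topics: int = 2
) -> bool:
    """True when the page meets both promotion criteria."""
    return len(stats.query_ids) >= min_queries and len(stats.topics) >= min_topics
```

Using sets means a page retrieved three times by the same query still counts as one distinct query.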

### Semantic deduplication
`wiki.py lint --full` detects semantically similar pages via cosine similarity. Similarity ≥ 0.90 → auto-merge candidate; 0.75–0.90 → user warning. Configurable via `thresholds.dedup_auto` and `thresholds.dedup_warn`.
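
A minimal sketch of how the two thresholds could partition page pairs. The parameter names mirror `thresholds.dedup_auto` and `thresholds.dedup_warn`, but `classify_pair` and its return labels are hypothetical, and the vectors here are plain Python lists, not rows read from LanceDB.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_pair(
    a: list[float],
    b: list[float],
    dedup_auto: float = 0.90,
    dedup_warn: float = 0.75,
) -> str:
    """Label a pair of page embeddings by similarity band."""
    similarity = cosine_similarity(a, b)
    if similarity >= dedup_auto:
        return "auto-merge"
    if similarity >= dedup_warn:
        return "warn"
    return "distinct"
```

Comparing every pair this way is quadratic in the number of pages; a vector index would normally be used to fetch only each page's nearest neighbours first.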

### Behavioral self-reflection (Identity layer)
When the user corrects the agent's behavior ("always", "never", "stop doing X"), the correction is logged with `wiki.py behavior-log`. At end of session, `wiki.py self-reflect` reads the log and autonomously updates `wiki/identity/` when a pattern reaches the threshold (default: 3 occurrences). The agent learns without human approval of each update.
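
The threshold rule can be sketched as a simple count over logged events. `patterns_to_adopt` is a hypothetical function, and the normalisation (trim and lowercase) is an assumption; the actual log format and matching used by `wiki.py self-reflect` are not shown here.

```python
from collections import Counter


def patterns_to_adopt(events: list[str], threshold: int = 3) -> list[str]:
    """Return the logged corrections that occurred at least `threshold` times."""
    counts = Counter(event.strip().lower() for event in events)
    return sorted(pattern for pattern, n in counts.items() if n >= threshold)
```

For example, three log entries reading "no trailing summary" would cross the default threshold, while a correction seen once would stay in the log without changing `wiki/identity/`.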

---

## Multi-source PDF ingestion *(v2.0)*

Any PDF from any source converges at `pdf-inbox/` and is processed automatically.

```
┌─────────────────┐   ┌──────────────────┐   ┌───────────────────┐
│  Telegram chat  │   │  CLI / URL       │   │  Manual file drop │
│  (attachment)   │   │  (ingest-pdf)    │   │  (filesystem)     │
└────────┬────────┘   └────────┬─────────┘   └────────┬──────────┘
         │                     │                      │
         └─────────────────────┼──────────────────────┘
                               ▼
                    workspace/pdf-inbox/
                       paper.pdf
                    .registry.json  ← SHA-256 hash per file
                               │
                    wiki.py scan-inbox
                               │
                    wiki_pdf_watcher.py
                       extract_text (pdfplumber)
                               │
                               ▼
              wiki-works/<project>/raw/paper.md
              (frontmatter: source: pdf, original, e
```

... (truncated)