# OpenClaw Local Memory v1.2
Local-first memory service for OpenClaw-style agents.
This repo is now organized to be **plugin-ready** for OpenClaw deployments while still preserving the current standalone Python service, API, demo, and CLI.
It does **not** claim private knowledge of OpenClaw plugin internals. Where integration details are unknown, this repo provides conservative scaffolding, documented hook flow, and example config snippets instead of pretending the wiring already exists.
## What this project is
Today, this project gives you a reusable local memory core with:
- automatic memory capture from conversation lines
- profile fact extraction
- working-summary compression
- token-budgeted recall context assembly
- local hybrid retrieval using lexical ranking + lightweight local vectors
- a FastAPI service
- a Python CLI
- a plugin-style adapter module for future OpenClaw integration
## What changed in v1.2
v1.2 keeps the current MVP behavior intact and makes the repo easier to adopt across other OpenClaw installs:
- adds a **plugin-oriented project structure**
- adds a small **Python adapter layer** in `local_memory/plugin.py`
- adds an `integration/` directory with:
- documented hook/invocation flow
- example OpenClaw-facing config snippets
- environment template guidance
- adds a small local-safe installer script in `scripts/install_local.sh`
- expands the README to explain what is production-ready vs scaffolded
- preserves the existing Python service/API/CLI entrypoints
## Current status: ready vs scaffolded
### Ready now
These parts are real and working today:
- `local_memory/` Python package
- `local_memory.api` FastAPI app
- `local_memory.cli` package CLI
- root compatibility wrappers `app.py` and `cli.py`
- local SQLite-backed memory store
- hybrid recall with lexical fallback
- demo and tests
- plugin-style Python adapter helpers in `local_memory.plugin`
### Scaffolded / integration placeholders
These parts are intentionally conservative scaffolding:
- `integration/openclaw-hook-flow.md`
- `integration/openclaw-config.example.toml`
- `integration/env.openclaw-local-memory.example`
- any OpenClaw-side event hook wiring described in docs
Those files document how an OpenClaw deployment could call this project, but they do **not** claim that a full native OpenClaw plugin runtime has already been implemented here.
## Why this approach
Instead of requiring cloud embeddings or a large model download, this project uses a lightweight local vectorizer that stays practical on small self-hosted machines:
- hashed word features
- character n-grams for fuzzy matching
- a small alias/expansion map for near-synonyms like `car` → `vehicle`, `fuel` → `petrol`, `compress` → `summary`
That gives practical semantic-ish retrieval while staying local-first and cheap.
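A minimal sketch of that idea (hypothetical code, not the actual `local_memory/vectorizer.py`): hash whole words and character trigrams into a fixed-size vector, then compare with cosine similarity.

```python
import hashlib
import math

def embed(text: str, dim: int = 256) -> list[float]:
    # Hash whole words plus character trigrams into a fixed-size vector.
    t = text.lower()
    feats = t.split() + [t[i:i + 3] for i in range(len(t) - 2)]
    vec = [0.0] * dim
    for feat in feats:
        idx = int(hashlib.md5(feat.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    # L2-normalize so dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```

Character trigrams provide the fuzzy matching (`car` still overlaps with `cars`); an alias map like the one described above would expand query tokens before hashing.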
## Architecture
```text
conversation or agent event
-> local memory adapter / service
-> auto-remember
-> low-value filter
-> episodic memory store
-> profile fact extractor
-> working-summary compressor
-> local vector embed + store in SQLite
-> recall(query)
-> profile facts
-> hybrid search memories
-> lexical overlap
-> local vector similarity
-> fallback to lexical-only if embeddings unavailable
-> token budgeter
-> compact context block
-> caller injects returned context into model prompt
```
## Plugin-oriented layout
```text
openclaw-local-memory/
├── local_memory/
│   ├── api.py                 # FastAPI service
│   ├── cli.py                 # package CLI
│   ├── plugin.py              # plugin-style adapter helpers
│   ├── service.py             # memory pipeline
│   ├── db.py                  # SQLite storage + retrieval
│   ├── vectorizer.py          # local vectorizer
│   └── ...
├── integration/
│   ├── README.md
│   ├── openclaw-hook-flow.md
│   ├── openclaw-config.example.toml
│   └── env.openclaw-local-memory.example
├── scripts/
│   └── install_local.sh
├── app.py                     # root FastAPI compatibility wrapper
├── cli.py                     # root CLI compatibility wrapper
├── demo.py
└── test_local_memory.py
```
## Memory layers
- **profile**: durable user facts and preferences
- **working**: compressed task summaries
- **episodic**: raw useful memory snippets
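For illustration, one way to model items across these three layers (a hypothetical shape, not the actual SQLite schema in `local_memory/db.py`):

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryItem:
    layer: str          # "profile" | "working" | "episodic"
    text: str
    score: float = 0.0  # retrieval relevance, filled in at recall time
    created: float = field(default_factory=time.time)
```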
## Token-saving strategy
- skip low-value lines like `ok` or `thanks`
- compress long text into short summaries
- keep profile / working / memory in separate budgets
- return only top relevant items
- estimate token usage before building final context
- use semantic recall so prompts can stay short instead of replaying large chat history
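The budgeting step can be sketched as a greedy fill over relevance-sorted items (illustrative only; `estimate_tokens` and `fit_to_budget` are hypothetical names, not the repo's API):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)

def fit_to_budget(items: list[str], budget: int) -> list[str]:
    # Keep items (assumed pre-sorted by relevance) until the budget is spent.
    kept, used = [], 0
    for item in items:
        cost = estimate_tokens(item)
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept
```

Separate budgets for profile, working, and episodic items would each run this fill independently before the final context block is composed.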
## Core files
- `local_memory/api.py` — FastAPI service
- `local_memory/cli.py` — package CLI
- `local_memory/plugin.py` — plugin-style adapter helpers
- `local_memory/db.py` — SQLite storage + hybrid recall
- `local_memory/service.py` — memory pipeline
- `local_memory/vectorizer.py` — pure-Python local embedding/vectorizer
- `local_memory/extractor.py` — low-value filter + profile extraction
- `local_memory/compression.py` — summaries + token budgeting
- `demo.py` — local self-test / demo
- `test_local_memory.py` — executable unit tests
- `integration/` — documented OpenClaw-facing integration examples
- `scripts/install_local.sh` — local setup helper
## Quick start
```bash
cd /root/.openclaw/workspace/openclaw-local-memory
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```
## Configuration
Environment variables:
- `LOCAL_MEMORY_DB` — SQLite path, default `memory.db`
- `LOCAL_MEMORY_TOP_K` — recall result count, default `5`
- `LOCAL_MEMORY_CONTEXT_BUDGET` — max context budget, default `600`
- `LOCAL_MEMORY_PROFILE_BUDGET` — profile budget, default `150`
- `LOCAL_MEMORY_WORKING_BUDGET` — working-summary budget, default `250`
- `LOCAL_MEMORY_MIN_SCORE` — recall threshold, default `0.35`
- `LOCAL_MEMORY_EMBEDDINGS` — `true`/`false`, default `true`
- `LOCAL_MEMORY_EMBED_DIM` — hash vector dimension, default `256`
See also:
- `.env.example`
- `integration/env.openclaw-local-memory.example`
### Fallback behavior
The system degrades gracefully:
- If `LOCAL_MEMORY_EMBEDDINGS=false`, recall runs in **lexical-only** mode.
- If a row has no stored vector, that row can still match lexically.
- If vector JSON is missing or corrupt, the row still participates lexically.
- If you already have an old v1 database, new memories start storing vectors automatically; older memories still work through lexical retrieval.
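The fallback amounts to scoring lexically whenever a vector is unavailable. A hedged sketch of that logic (not the actual `local_memory/db.py` code; the blend weight `alpha` is an assumption):

```python
def lexical_score(query: str, text: str) -> float:
    # Fraction of query words that appear in the candidate text.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query: str, text: str,
                 qvec=None, tvec=None, alpha: float = 0.5) -> float:
    lex = lexical_score(query, text)
    if qvec is None or tvec is None:
        # Missing or unreadable vector: the row still competes lexically.
        return lex
    cos = sum(a * b for a, b in zip(qvec, tvec))
    return alpha * lex + (1 - alpha) * cos
```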
## Run API
```bash
source .venv/bin/activate
uvicorn local_memory.api:app --host 127.0.0.1 --port 8012
```
Endpoints:
- `GET /health`
- `POST /capture`
- `POST /profile/upsert`
- `POST /auto-remember`
- `POST /recall`
`/health` reports whether embeddings are enabled.
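With the service running as above, a minimal stdlib client might look like this. The JSON field names (`query`, `user_id`) are assumptions; check the request models in `local_memory/api.py` for the actual schema.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8012"

def build_recall_request(query: str, user_id: str) -> urllib.request.Request:
    # Payload field names here are assumed, not taken from the API schema.
    body = json.dumps({"query": query, "user_id": user_id}).encode()
    return urllib.request.Request(
        f"{BASE}/recall",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def recall(query: str, user_id: str) -> dict:
    with urllib.request.urlopen(build_recall_request(query, user_id)) as resp:
        return json.loads(resp.read())
```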
## Run CLI demo
```bash
source .venv/bin/activate
python demo.py
```
Demo shows:
- profile/working capture
- normal hybrid recall
- semantic-ish recall (`fuel efficient vehicle`) matching stored text about `hybrid cars` and `petrol`
## CLI examples
### Package CLI
```bash
python -m local_memory.cli auto-remember \
"I prefer Traditional Chinese for replies." \
"We are building a reusable OpenClaw memory service." \
"The system should reduce token usage."
python -m local_memory.cli recall "traditional chinese token usage"
```
### Root compatibility CLI
```bash
python cli.py remember \
"I prefer Traditional Chinese for replies." \
"Token budgets matter for memory recall."
```
## Plugin-style Python usage
If you want to integrate this from another Python component, use the adapter layer rather than importing multiple internals directly.
```python
from local_memory.plugin import LocalMemoryPluginAdapter
adapter = LocalMemoryPluginAdapter()
adapter.auto_remember(
[
"I prefer Traditional Chinese for replies.",
"We are building a reusable OpenClaw memory service.",
],
user_id="roy",
conversation_id="chat-42",
)
result = adapter.recall(
query="traditional chinese reusable memory",
user_id="roy",
)
print(result.context)
adapter.close()
```
Or use the context-oriented helper:
```python
from local_memory.plugin import build_context_for_query
payload = build_context_for_query(
query="traditional chinese token budget",
user_id="roy",
)
print(payload["context"])
```
## Plugin architecture for OpenClaw adoption
This repo is best treated as two layers:
### 1) Reusable memory core
A standalone Python package/service that owns:
- storage
- auto-capture logic
- recall logic
- context assembly
- local retrieval behavior
### 2) OpenClaw integration layer
A thin adapter on the OpenClaw side that decides:
- when to call `auto_remember`
- when to call `recall`
- what user/session/conversation IDs to pass through
- how to inject returned `context` into prompts
- whether to run this in-process or over HTTP
This separation keeps the memory logic portable and avoids pretending there is only one OpenClaw runtime shape.
## Conservative OpenClaw integration model
Because OpenClaw plugin internals may vary, the recommended flow is:
1. OpenClaw receives a message or agent event.
2. The integration layer derives stable IDs:
- `user_id`
- `conversation_id`
- optional source metadata
3. Before or after model execution, the integration layer calls this project:
- `auto_remember(...)` for useful conversation lines
- `recall(query=...)` when memory context is needed
4. The returned `context` block is injected into the model prompt.
5. The integration layer remains responsible for hook timing and policy.
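Under those assumptions the integration layer stays very small. The sketch below is hypothetical (`MemoryIntegration` is not part of this repo) and shows only the in-process variant; the adapter argument would be something like `LocalMemoryPluginAdapter`:

```python
class MemoryIntegration:
    """Hypothetical thin integration layer for an OpenClaw-style host.

    It only decides *when* to call the memory core; the adapter owns
    storage, recall, and context assembly.
    """

    def __init__(self, adapter):
        self.adapter = adapter  # e.g. an in-process LocalMemoryPluginAdapter

    def on_message(self, user_id, conversation_id, lines, query):
        # Steps 3-4 of the flow: capture useful lines, then pull context.
        self.adapter.auto_remember(
            lines, user_id=user_id, conversation_id=conversation_id
        )
        result = self.adapter.recall(query=query, user_id=user_id)
        # The host injects the returned context into the model prompt.
        return result
```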
More detail and examples live under `integration/`.
## Recall output
The system returns:
- compact profile facts
- working summary bullets
- relevant episodic memories
- a single pre-composed `context` block
- token estimate / budget used
- `retrieval_mode` (`lexical` or `hybrid`)
- `embeddings_enabled`
- per-result `lexical_score`,
... (truncated)