Memoryclaw

By imjohnzakkam

A brain-inspired, hierarchical memory system as a plugin layer for OpenClaw


# MemoryClaw: A Deterministic Memory Layer for OpenClaw

## Complete Specification — Revised Draft

---

## 1. Executive Summary

MemoryClaw is a **brain-inspired, hierarchical memory system** designed as a plugin layer for the OpenClaw personal AI assistant. It addresses inefficiencies in current agent memory architectures by replacing opaque vector embeddings with **file-based storage and transparent retrieval**. MemoryClaw reduces token consumption, eliminates embedding costs, and makes every memory operation debuggable.

MemoryClaw integrates with OpenClaw as a suite of plugins, leveraging its existing skills, cron, and tooling ecosystems. It is designed for users who value privacy, transparency, and long-term learning in their personal AI.

**Key design goals:**

- All memories stored in human-readable markdown files, editable and version-controllable.
- Primary retrieval via keyword and tag matching on pre-summarized episodes, with a hybrid vector fallback available from day one for cases where keyword matching falls short.
- Only a compact working memory payload is injected into the LLM context, typically 400–800 tokens including system framing.
- A consolidation daemon continuously summarizes raw interactions, extracts facts, and detects reusable action patterns.
- No embedding API calls required for default operation; retrieval is pure file I/O, with local LLMs handling summarization.

---

## 2. The Memory Model: Inspired by Human Cognition

MemoryClaw implements a four-tier memory hierarchy, mirroring theories of human memory. Each tier serves a distinct purpose and has its own storage format, access pattern, and known limitations.

| Tier | Name | Function | Storage | Access |
|------|------|----------|---------|--------|
| 1 | **Working Memory** | Active context for current task | In-memory JSON (agent session) | Direct |
| 2 | **Episodic Memory** | Compressed records of past interactions | Markdown files with YAML frontmatter | Keyword/tag + optional vector |
| 3 | **Semantic Memory** | Persistent facts and relationships | Markdown files (structured lists/tables) | Entity-based lookup |
| 4 | **Procedural Memory** | Reusable skills compiled from experience | OpenClaw skill files (JSON/JavaScript) | Intent matching |

### 2.1 Working Memory

**Purpose:** Holds everything the LLM needs for the current reasoning step. It is the active context window of the agent.

**Structure:** A JSON object with configurable size limits (default ~2k tokens). Example:

```json
{
  "goal": "send email to John about project update",
  "plan": ["get_recipient", "compose", "send"],
  "facts": {
    "recipient_email": "[email protected]",
    "project_deadline": "2025-04-10"
  },
  "recent_observations": ["draft created"],
  "active_skill": "send_email"
}
```

**Lifecycle:** Created at the start of a task, updated after each LLM response or tool result, and cleared upon task completion.
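As a rough illustration, this lifecycle and the size limit could be enforced by a small holder class. The class name, the 4-characters-per-token estimate, and the oldest-first eviction policy below are assumptions for the sketch, not part of the spec:

```javascript
// Sketch of a working-memory holder with a rough token budget.
// The chars/4 token estimate and the eviction policy are illustrative only.
class WorkingMemory {
  constructor(maxTokens = 2000) {
    this.maxTokens = maxTokens;
    this.state = { goal: null, plan: [], facts: {}, recent_observations: [], active_skill: null };
  }

  estimateTokens() {
    // Crude heuristic: ~4 characters per token of serialized state.
    return Math.ceil(JSON.stringify(this.state).length / 4);
  }

  addObservation(obs) {
    this.state.recent_observations.push(obs);
    // Evict the oldest observations first when over budget.
    while (this.estimateTokens() > this.maxTokens && this.state.recent_observations.length > 1) {
      this.state.recent_observations.shift();
    }
  }

  clear() {
    // Task complete: reset to an empty payload.
    this.state = { goal: null, plan: [], facts: {}, recent_observations: [], active_skill: null };
  }
}
```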

### 2.2 Episodic Memory

**Purpose:** Stores compressed, tagged summaries of past interactions for future reference.

**Storage:** Markdown files in `memoryclaw/episodes/` with naming convention `YYYY-MM-DD_HH-MM-SS_summary.md`. Each file contains YAML frontmatter (timestamp, tags, summary, participants) followed by structured detail.

**File format:**

```markdown
---
timestamp: 2025-04-08T14:32:10Z
tags: [email, projectX, deadline]
summary: "User asked to send project update email to John. Used send_email skill. Email contained deadline info."
participants: [user, assistant]
confidence: high
---
**Details:**
- Recipient: John <[email protected]>
- Subject: Project update
- Body: "Hi John, just a reminder that the deadline is April 10th..."
```

**Retrieval:** Primary retrieval is via keyword search on the `summary` and `tags` fields. An inverted index (SQLite) is recommended for performance once episode count exceeds a few hundred. A hybrid vector fallback is available as a configurable option (see Section 3.3).
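A minimal sketch of the keyword/tag scoring this implies, operating on already-parsed frontmatter. The scoring weights (tags count double) and the short-token filter are illustrative assumptions:

```javascript
// Score an episode against a query by counting keyword hits in tags and summary.
// Tag hits weigh more than summary hits; both weights are assumptions.
function keywordScore(query, episode) {
  const words = (query.toLowerCase().match(/[a-z0-9]+/g) || [])
    .filter(w => w.length > 2); // skip stopword-sized tokens
  let score = 0;
  for (const w of words) {
    if (episode.tags.some(t => t.toLowerCase().includes(w))) score += 2;
    if (episode.summary.toLowerCase().includes(w)) score += 1;
  }
  return score;
}

// Return the top-scoring episodes, dropping non-matches entirely.
function searchEpisodes(query, episodes, maxResults = 5) {
  return episodes
    .map(e => ({ episode: e, score: keywordScore(query, e) }))
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults)
    .map(r => r.episode);
}
```

At a few hundred episodes this linear scan is still fast; the SQLite inverted index mentioned above replaces the scan, not the scoring.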

### 2.3 Semantic Memory

**Purpose:** Persistent, decontextualized facts such as contacts, preferences, and project metadata.

**Storage:** One or more markdown files in `memoryclaw/semantic/` (e.g., `contacts.md`, `projects.md`, `preferences.md`). Format is simple key-value lists, YAML, or markdown tables.

```markdown
# Contacts
- John: [email protected]
- Sarah: [email protected]

# Projects
- projectX:
    deadline: 2025-04-10
    stakeholders: [John, Sarah]
```

**Retrieval:** Direct entity lookup via regex or simple parsing. Results cached in memory after first read.
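For the list format shown above, entity lookup can be a single regular expression. This sketch assumes the exact `- Name: value` layout and a regex-safe entity name:

```javascript
// Look up "- Name: value" entries in a flat markdown list.
// Assumes the entity name contains no regex metacharacters.
function lookupContact(markdown, name) {
  const re = new RegExp(`^- ${name}:\\s*(\\S+)`, "mi");
  const m = markdown.match(re);
  return m ? m[1] : null; // null when the entity is unknown
}
```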

**Scaling note:** This flat-file approach works well for personal-scale data (tens to low hundreds of entities). For users who accumulate hundreds of entities with complex relationships, the semantic layer will need to migrate toward a lightweight structured store (e.g., SQLite with a markdown export layer). The roadmap accounts for this as a Phase 7 enhancement, but implementers should be aware that flat markdown does not scale indefinitely.

### 2.4 Procedural Memory

**Purpose:** Reusable routines for common tasks, potentially executed without LLM invocation.

**Storage:** OpenClaw skill files in `memoryclaw/skills/`, in either JSON declarative format (for simple sequences) or JavaScript format (for complex logic using OpenClaw's skill API).

**JSON declarative skill example:**

```json
{
  "name": "send_email",
  "triggers": ["send email", "email"],
  "preconditions": {
    "recipient": {"type": "email", "source": "semantic or user"},
    "subject": {"type": "string", "optional": false},
    "body": {"type": "string", "optional": false}
  },
  "steps": [
    {"action": "call_tool", "tool": "email_api", "params": {
      "to": "$recipient",
      "subject": "$subject",
      "body": "$body"
    }}
  ]
}
```
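A hypothetical interpreter for this declarative format would substitute `$name` placeholders from the collected preconditions and hand each step to a host-supplied tool callback. This is a sketch, not OpenClaw's actual skill runner:

```javascript
// Execute a declarative skill: resolve "$param" placeholders, then dispatch
// each call_tool step to the host's tool-calling callback.
function runDeclarativeSkill(skill, params, callTool) {
  for (const step of skill.steps) {
    if (step.action !== "call_tool") continue;
    const resolved = {};
    for (const [key, value] of Object.entries(step.params)) {
      resolved[key] =
        typeof value === "string" && value.startsWith("$")
          ? params[value.slice(1)] // "$recipient" -> params.recipient
          : value;                 // literal values pass through unchanged
    }
    callTool(step.tool, resolved);
  }
}
```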

**JavaScript skill example:**

```javascript
export default {
  name: 'send_email',
  triggers: ['send email', 'email'],
  async run(params, context) {
    // implementation
  }
};
```

**Compilation:** Skills can be automatically suggested by the consolidation daemon when repeated action patterns are detected. They can also be written manually. See Section 3.5 for an honest discussion of the challenges involved in automatic skill compilation.

---

## 3. How MemoryClaw Works: End-to-End Flow

### 3.1 User Interaction Cycle

1. **User sends a message** via any OpenClaw-supported channel (WhatsApp, Telegram, CLI, etc.).
2. **OpenClaw routes the message** to the appropriate agent and invokes its configured pipeline.
3. **MemoryClaw retrieval plugin** is triggered before the LLM call. It extracts keywords from the user query and current goal, searches episodic memory for relevant summaries, looks up semantic facts for mentioned entities, and (optionally) falls back to vector search if keyword results are insufficient. The top results are injected into the agent's working memory.
4. **LLM call** occurs with the working memory as context plus the system prompt. Realistic total prompt size is typically 400–800 tokens depending on how many episodes and facts are retrieved.
5. **LLM response** may include tool calls, plan updates, or a final answer.
6. **Post-response hook** logs the raw interaction (full transcript) to `memoryclaw/logs/`.
7. **Consolidation daemon** (running as a cron job) later processes raw logs: summarizes them, writes episode files, updates semantic memory, and detects action patterns.
8. **User receives answer** (or sees a skill executed).
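Step 3 can be sketched as a single pre-LLM hook that gathers memories and injects them into working memory. `searchEpisodes` and `lookupFacts` are assumed helper functions here, not real OpenClaw APIs:

```javascript
// Pre-LLM retrieval hook (step 3): gather episodic and semantic hits for the
// query and merge them into the working-memory payload the LLM will see.
function retrievalHook(query, workingMemory, memory) {
  const episodes = memory.searchEpisodes(query); // keyword/tag match on episode files
  const facts = memory.lookupFacts(query);       // entity lookup in semantic files
  workingMemory.facts = { ...workingMemory.facts, ...facts };
  workingMemory.recent_observations.push(...episodes.map(e => e.summary));
  return workingMemory; // becomes the LLM context in step 4
}
```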

### 3.2 Retrieval Strategy: Strengths and Limitations

The default retrieval mechanism is keyword and tag matching on episode summaries and semantic files. This approach is deterministic, transparent, and fast. When a memory is recalled, the user can see exactly which keywords matched, making the system fully debuggable.

**Where keyword matching excels:**

- Queries that use the same vocabulary as the stored episodes (e.g., "what did I email John about?" when the episode is tagged with "email" and "John").
- Lookups involving proper nouns, project names, dates, and other concrete terms.
- Cases where the user has built up a consistent tagging vocabulary over time.

**Where keyword matching struggles:**

- Semantic paraphrasing: if the user asks about "that budget conversation with marketing" but the episode was tagged with "finance, campaign-team, Q3-review," keyword matching will miss it entirely.
- Vague or abstract queries: "that thing we discussed last week about the restructuring" may not match any stored keywords.
- Cross-domain connections: linking a "flight booking" episode to a "travel reimbursement" query requires understanding that these are related concepts, not just matching words.

Because these failure modes are common in real usage, MemoryClaw includes a hybrid retrieval option from Phase 1. When keyword matching returns fewer than the configured minimum results, the system can fall back to a lightweight vector similarity search using a local embedding model (e.g., via Ollama). This fallback is off by default to preserve the deterministic-first philosophy but can be enabled in configuration. The intent is that keyword matching handles the common case cheaply and transparently, while vector search catches the long tail of semantically related but lexically different queries.

### 3.3 Hybrid Retrieval Configuration

The retrieval pipeline supports a configurable fallback chain:

```yaml
retrieval:
  primary: keyword              # Always runs first
  minPrimaryResults: 2          # Minimum results before fallback triggers
  fallback: vector              # Options: none, vector
  vectorModel: nomic-embed      # Local model via Ollama
  maxResults: 5
  blendStrategy: primary_first  # Keyword results ranked above vector
```
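The fallback chain this configuration describes could be implemented roughly as follows; the two search functions are stand-ins for the real episodic and vector backends:

```javascript
// Fallback chain: keyword search always runs; vector search runs only when
// keyword results fall below the configured minimum. With primary_first
// blending, keyword matches always outrank vector matches.
function retrieve(query, config, keywordSearch, vectorSearch) {
  const primary = keywordSearch(query);
  let secondary = [];
  if (config.fallback === "vector" && primary.length < config.minPrimaryResults) {
    secondary = vectorSearch(query).filter(r => !primary.includes(r)); // dedupe
  }
  return [...primary, ...secondary].slice(0, config.maxResults);
}
```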

When the fallback is set to `none`, the system operates in pure deterministic mode. When set to `vector`, the system runs vector search only when keyword matching produces fewer than `minPrimaryResults`. Results from both sources are blended, with keyword matches ranked first to preserve the deterministic-first ordering.

... (truncated)