# Hierarchical Memory Plugin for OpenClaw
A memory system that gives AI agents autobiographical recall across conversations of unlimited length. Instead of losing context when conversations grow beyond the model's window, the plugin continuously compresses older conversation into layered summaries, much like human memory: distant events are remembered in broad strokes while recent ones stay vivid.
## How it works
The plugin runs a background worker every 5 minutes that reads the agent's conversation history and builds a three-level memory hierarchy:
```
Raw conversation   →   L1 summaries      →   L2 summaries    →   L3 summaries
(~6k token chunks)     (~2k tokens each)     (6 L1s merged)      (6 L2s merged)
```
**L1 (recent memory):** The worker takes chunks of ~6,000 tokens from the raw conversation and asks the model to write a ~2,000 token autobiographical summary. The model sees its full memory hierarchy as context, so each summary is written with awareness of the bigger picture.
**L2 (earlier context):** When 6 L1 summaries accumulate, they're consolidated into a single L2 summary. The model reads all six as its own memories and merges them into a cohesive narrative.
**L3 (long-term memory):** Same pattern: six L2s merge into an L3. These capture the arc of the entire relationship.
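The merge rule above can be sketched as a pure function; a minimal sketch, assuming summaries are tracked as unmerged counts per level (names and shapes are illustrative, not the plugin's actual API):

```typescript
const MERGE_THRESHOLD = 6; // summaries at one level before they merge upward
const MAX_LEVELS = 3;      // L1, L2, L3

/**
 * Given counts of unmerged summaries per level (index 0 = L1),
 * returns which levels are due to receive a consolidation this pass.
 */
function levelsDueForMerge(unmergedCounts: number[]): number[] {
  const due: number[] = [];
  // The top level (L3) never merges upward, hence MAX_LEVELS - 1.
  const mergeableLevels = Math.min(unmergedCounts.length, MAX_LEVELS - 1);
  for (let level = 0; level < mergeableLevels; level++) {
    if (unmergedCounts[level] >= MERGE_THRESHOLD) {
      due.push(level + 1); // e.g. six L1s produce one L2
    }
  }
  return due;
}

levelsDueForMerge([6, 2]); // six L1s accumulated → an L2 consolidation is due
```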
### The compression model IS the agent
A key design choice: compression doesn't use a separate summarizer prompt. Instead, the model receives a multi-turn conversation where its own prior memories appear as assistant messages (things it "said"), raw conversation keeps its original user/assistant structure, and a Context Manager marks boundaries. There's no system prompt; the conversation structure itself establishes the model's voice. This means memories sound like the agent remembering, not like a third party's notes.
### Non-redundant context
Each compression instance sees the full hierarchy without duplication. If an L2 summary was made from L1s 0001–0006 and all six L1s are visible in context, the L2 is excluded (its information is already there at higher detail). This anti-redundancy rule prevents the model from re-narrating the same events at different compression levels.
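A minimal sketch of that exclusion rule, assuming each merged summary records the IDs of its sources (the field names here are hypothetical):

```typescript
interface Summary {
  id: string;
  sourceIds: string[]; // IDs of the lower-level summaries it was merged from
}

/**
 * Drops any higher-level summary whose sources are ALL already visible
 * in context, since its information is present at higher detail.
 */
function excludeRedundant(higher: Summary[], visibleIds: Set<string>): Summary[] {
  return higher.filter(
    (s) => !s.sourceIds.every((id) => visibleIds.has(id)),
  );
}

// An L2 built from L1s 0001-0002 is dropped when both L1s are in context,
// but kept when any source L1 is missing.
const sampleL2: Summary = { id: "L2/0001", sourceIds: ["0001", "0002"] };
excludeRedundant([sampleL2], new Set(["0001", "0002"])); // → []
excludeRedundant([sampleL2], new Set(["0001"]));         // → [sampleL2]
```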
### Prompt cache optimization
The message array puts stable context (higher-level summaries) first and changing content (the chunk being compressed) last. Since the worker processes chronologically, upper layers only grow during a run; they never shrink. This means consecutive compression calls share a large cacheable prefix, dramatically reducing costs for backfill processing.
## Memory injection
Before each agent run, the plugin injects memories as an assistant-role message at the start of the conversation. The injection uses per-layer token budgets with waterfall carryover:
| Layer | Default budget | Content |
|-------|---------------|---------|
| L3 | 30,000 tokens | Long-term memory (oldest first) |
| L2 | 30,000 tokens | Earlier context (oldest first) |
| L1 | 30,000 tokens | Recent memory (oldest first) |
If a layer doesn't use its full budget, the remainder cascades to the next layer down. For example, if L3 summaries only use 12k tokens, the extra 18k flows to L2's budget (making it 48k effective). This means the agent always gets as much memory as possible within the total budget.
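The waterfall carryover can be sketched as a small pure function; this is illustrative only (the layer record and function name are assumptions):

```typescript
interface LayerBudget {
  layer: string;    // "L3", "L2", "L1", processed top-down
  budget: number;   // configured token budget for this layer
  available: number; // tokens of summaries available at this layer
}

/** Returns tokens granted per layer, cascading unused budget downward. */
function waterfall(layers: LayerBudget[]): Record<string, number> {
  const granted: Record<string, number> = {};
  let carry = 0;
  for (const { layer, budget, available } of layers) {
    const effective = budget + carry;          // budget plus cascaded remainder
    const used = Math.min(available, effective);
    granted[layer] = used;
    carry = effective - used;                  // unused tokens flow down
  }
  return granted;
}

// The example from the text: L3 summaries use only 12k of their 30k budget,
// so the extra 18k flows to L2, making its effective budget 48k.
const granted = waterfall([
  { layer: "L3", budget: 30000, available: 12000 },
  { layer: "L2", budget: 30000, available: 60000 },
  { layer: "L1", budget: 30000, available: 60000 },
]);
// granted.L3 === 12000, granted.L2 === 48000, granted.L1 === 30000
```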
All summaries (including merged ones) are eligible for injection; the budget system replaces the old behavior of only injecting unmerged summaries.
## Installation
The plugin requires the `injectMessages` hook feature ([PR #11732](https://github.com/antra-tess/openclaw/pull/11732)) which adds the ability for `before_agent_start` hooks to inject messages into the conversation history.
```bash
openclaw plugins install openclaw-memory-hierarchical
```
Add to `openclaw.json`:
```json
{
"plugins": {
"openclaw-memory-hierarchical": {
"kind": "memory"
}
}
}
```
> **Note:** OpenClaw validates config before plugin install. If validation fails, remove the plugin config entry, install, then restore it.
## Configuration
All settings are optional โ defaults work well for most setups.
```json
{
"plugins": {
"openclaw-memory-hierarchical": {
"kind": "memory",
"chunkTokens": 6000,
"summaryTargetTokens": 2000,
"mergeThreshold": 6,
"pruningBoundaryTokens": 30000,
"workerInterval": "5m",
"maxLevels": 3,
"model": "claude-sonnet-4-5",
"injectionBudget": {
"L3": 30000,
"L2": 30000,
"L1": 30000
}
}
}
}
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `chunkTokens` | 6000 | Minimum tokens in a raw chunk before L1 summarization |
| `summaryTargetTokens` | 2000 | Target token count for each summary |
| `mergeThreshold` | 6 | Summaries at a level before merging to the next |
| `pruningBoundaryTokens` | 30000 | Tokens behind conversation head required for eligibility (protects recent messages from summarization) |
| `workerInterval` | `"5m"` | How often the background worker runs |
| `maxLevels` | 3 | Maximum summary levels (L1, L2, L3) |
| `model` | *(agent's model)* | Model for summarization (can use a cheaper model) |
| `injectionBudget.L3` | 30000 | Token budget for long-term memory injection |
| `injectionBudget.L2` | 30000 | Token budget for earlier context injection |
| `injectionBudget.L1` | 30000 | Token budget for recent memory injection |
## CLI commands
```bash
# Show memory status
openclaw memory-hierarchical status
# View summaries
openclaw memory-hierarchical inspect
openclaw memory-hierarchical inspect --level L1 --limit 10
openclaw memory-hierarchical inspect --json
```
## Multi-session support
The worker processes all session JSONL files chronologically. Completed sessions (not the active one) have their pruning boundary set to 0, meaning all messages are eligible for summarization. The active session respects the `pruningBoundaryTokens` setting to protect recent conversation.
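In code form, the eligibility rule above might look like this (a sketch under assumed names; the real worker's internals may differ):

```typescript
/**
 * Completed sessions get a boundary of 0, so every message is eligible.
 * The active session protects its most recent `pruningBoundaryTokens`
 * of conversation from summarization.
 */
function isEligibleForSummarization(
  tokensBehindHead: number,   // how far behind the conversation head the message sits
  isActiveSession: boolean,
  pruningBoundaryTokens = 30000,
): boolean {
  const boundary = isActiveSession ? pruningBoundaryTokens : 0;
  return tokensBehindHead >= boundary;
}

isEligibleForSummarization(10000, true);  // false: inside the protected tail
isEligibleForSummarization(10000, false); // true: completed sessions are fully eligible
```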
## Storage
Summaries are stored as markdown files alongside a JSON index:
```
~/.openclaw/state/agents/<agentId>/memory/summaries/
├── index.json
├── L1/
│   ├── 0001.md
│   └── ...
├── L2/
│   └── ...
└── L3/
    └── ...
```
Each `.md` file contains an HTML comment with metadata (source IDs, timestamps, token estimates) followed by the summary text.
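For illustration, a summary file might look like the following (the metadata keys shown are an assumption; inspect the files on disk for the actual schema):

```
<!-- {"sources": ["L1/0001", "L1/0002"], "created": "2025-01-01T00:00:00Z", "tokenEstimate": 1987} -->
Over these conversations we set up the project scaffolding, then spent most
of our time debugging the worker's chunking logic together...
```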
## Architecture
The plugin integrates with OpenClaw through three extension points:
- **`before_agent_start` hook**: Reads summaries from disk, applies budget-aware selection, and injects a formatted memory section as an assistant message
- **`registerService`**: Runs the background compression worker on a timer
- **`registerCli`**: Provides `status` and `inspect` commands
Zero changes to OpenClaw core are required beyond PR #11732 (17 lines across 3 files).