Tools

Reflection Guard

Name: Reflection Guard
Rating: 3.5 (1 reviews)
Author: bhattman-dev

By bhattman-dev 👁 46 views ▲ 0 votes

OpenClaw plugin: auto-detects corrections, triages into mechanical fixes or behavioral lessons, progressively hardens agent behavior

GitHub

README

# reflection-guard

An OpenClaw plugin that detects user corrections, triages them into mechanical fixes or behavioral lessons, and progressively hardens the agent's behavior.

## How it works

When a user corrects the assistant, the plugin:

1. **Classifies** the exchange (CORRECTION vs OK) using an async LLM call
2. **Stores** the correction in SQLite (with JSONL fallback) — category, confidence, reasoning, context
3. **Triages** — asks "can this be mechanically prevented by a plugin hook?"
4. If **yes**: writes a fix plan, posts a proposal to a Discord alerts channel for human approval, then spawns an ACP agent to build the gate
5. If **no**: adds the pattern to a hot list, injects the top 3 most frequent corrections as reminders at session start

The end state: every correction either becomes a wall (plugin gate) or a prioritized reminder. The system hardens itself over time.

## Architecture

```
User corrects assistant
  → Classify (Flash, async)
  → Store to SQLite
  → Triage: mechanically preventable?
    → YES: write plan → post to #alerts → approve → ACP builds the gate
    → NO: upsert hot list → inject [REFLECTION] at session start
```

### Correction categories
- `PROCESS_VIOLATION` — skipped a required step or gate
- `FACTUAL_ERROR` — stated something incorrect
- `STYLE_VIOLATION` — formatting, tone, or writing rule broken
- `ASSUMPTION_WRONG` — acted on a wrong assumption
- `RULE_REPEATED` — violated a rule that was already established
- `SCOPE_DRIFT` — did more or different work than asked

### Hot list injection
Top 3 corrections by frequency are injected at the start of each turn:
```
[REFLECTION — Active patterns to avoid]
1. STYLE_VIOLATION (5x): No em dashes in any output
2. PROCESS_VIOLATION (3x): Always emit [DISCUSS COMPLETE] before spawning
3. FACTUAL_ERROR (2x): Verify infrastructure claims before asserting
```
Entries decay after 14 days of no recurrence.

## Install

Copy `index.js` and `openclaw.plugin.json` to `~/.openclaw/extensions/reflection-guard/`:

```bash
mkdir -p ~/.openclaw/extensions/reflection-guard
cp index.js openclaw.plugin.json ~/.openclaw/extensions/reflection-guard/
```

Restart the gateway. The plugin auto-loads.

### Optional files
- `triage-prompt.md` — custom triage classifier prompt. Place at `data/reflection-guard-v2/triage-prompt.md` in your workspace.
- `classifier-prompt.md` — custom correction classifier prompt. Place at `data/reflection-plugin/classifier-prompt-v1.md` in your workspace.

If these files aren't present, the plugin uses built-in fallback prompts.

## Storage

SQLite database at `data/reflection-guard.db` in your workspace. Two tables:

- **corrections** — every detected correction with category, confidence, triage result
- **hot_list** — active behavioral patterns with frequency tracking and 14-day decay

Falls back to JSONL append at `data/reflection-plugin/classification-queue.jsonl` if SQLite isn't available.

## Configuration

The plugin works out of the box with no config. It uses:
- `gemini-2.5-flash` for classification and triage (cheap, fast)
- Claude Code (default ACP agent) for building mechanical fixes
- Discord channel for alerts (configurable in the plugin code)

### Session filtering
Automatically skips: ACP sessions, sub-agents, voice sessions, and smart glasses sessions. Only watches main interactive sessions.

## Eval dataset

`eval-dataset.jsonl` contains 70 synthetic labeled examples for benchmarking the classifier:
- 44 CORRECTION examples across all 6 categories
- 26 NOT_CORRECTION examples (acknowledgments, new tasks, follow-ups, preferences)

All examples are synthetic. No real conversation data.

## Requirements

- OpenClaw with plugin support
- Node.js 22+ (uses `node:sqlite` for `DatabaseSync`)
- A configured LLM provider for classification (defaults to Gemini Flash)

## License

MIT

tools