
Oh My Claw

Name: Oh My Claw
Rating: 3.5 (1 review)
Author: winehouse8


OpenClaw plugin: agentic LLM critic that hooks agent_end, judges completions with an embedded critic turn (runEmbeddedAgent), suppresses failing replies via the message_sending hook, and re-tasks on FAIL via subagent.run.


Install

openclaw plugins install oh-my-claw

README

# oh-my-claw

> First version of an **OpenClaw plugin** that hooks `agent_end`, runs an
> agentic LLM critic, and re-tasks the original session via
> `subagent.run` when the completion fails the judge checklist.

This is the **plugin** answer to the complaint "openclaw-bot forwards replies
as-is, without any review." It is NOT a layer of if-statements inside the
bridge daemon: it registers as a normal OpenClaw plugin via
`definePluginEntry` and calls only documented plugin-SDK surfaces.

---

## Why this exists

The previous iteration's `OpenclawCritic` lived inside our `bridge-daemon`
process and judged completions with regex/keyword rules. That was fast to
build but had three structural gaps that this plugin closes:

1. **No LLM call** — regex can't read intent, only text shape.
2. **Out of OpenClaw's orchestration graph** — bridge-daemon is
   adjacent to OpenClaw, not part of it. Plugin code runs inside the
   gateway process and gets first-class access to `runEmbeddedAgent` and
   `subagent.run`.
3. **Bridge-only scope** — the plugin works for any agent run that
   triggers `agent_end`, not just the Discord channel our bridge owns.

---

## Architecture (v0.2 — actually suppresses the user-facing reply on FAIL)

In OpenClaw 2026.4.12, `agent_end` is **fire-and-forget**
(`dist/pi-embedded-runner-CefZK1Pt.js:6234`: the call is never awaited and
only has a `.catch(...)` attached), but `message_sending` IS awaited
(`dist/deliver-CClC7J0O.js:780-825`: when the plugin returns
`{cancel: true}`, `applyMessageSendingHook` returns `{cancelled: true}` to
the deliver loop at line 802).
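
To make the difference concrete, here is a toy model of the two call sites. It is not OpenClaw source: `fireAgentEnd` and `deliverPayloads` are hypothetical names, and the real deliver loop does much more. It only shows that a handler whose promise is never awaited cannot influence delivery, while an awaited handler's `{cancel: true}` can drop a payload.

```javascript
// Toy model of the two hook call sites (hypothetical, not OpenClaw code).

// agent_end-style dispatch: the handler is started, errors are logged,
// but the caller never awaits it, so its return value is simply discarded.
function fireAgentEnd(handler, event) {
  Promise.resolve()
    .then(() => handler(event))
    .catch((err) => console.error("agent_end handler failed:", err));
}

// message_sending-style dispatch: each payload waits for the handler,
// and a { cancel: true } result drops that payload.
async function deliverPayloads(payloads, onMessageSending) {
  const delivered = [];
  for (const payload of payloads) {
    const result = await onMessageSending(payload);
    if (result && result.cancel === true) continue; // suppressed
    delivered.push(payload);
  }
  return delivered;
}
```

In this model nothing ever reads the `agent_end` handler's return value, which is why the suppression decision has to live in `message_sending`.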

So the plugin uses **two hooks in tandem**:

```
agent (Claude Code via ACP) emits stop
        │
        ▼
gateway fires agent_end (fire-and-forget) AND continues to delivery
        │                                            │
        ▼                                            ▼
oh-my-claw                                  gateway: deliverOutboundPayloads
  api.on("agent_end", ...)                    awaits applyMessageSendingHook
   capture-only:                                    │
   contexts.set(channelId, {                        ▼
     sessionKey, messages, attempt, ...      api.on("message_sending", ...)
   })                                          critic gate:
   NO awaits before set —                       lookup contexts.get(channelId)
   sync portion lands first                     if absent → return undefined (let through)
                                                 if present →
                                                  lazy-await runCritic(entry)
                                                  ┌──────────────────────────┐
                                                  │ runEmbeddedAgent({       │
                                                  │   extraSystemPrompt:     │
                                                  │     CRITIC_SYSTEM_PROMPT │
                                                  │   prompt: buildCritic..  │
                                                  │   disableTools: true     │
                                                  │ })                       │
                                                  └──────────────────────────┘
                                                  parseVerdict(...)
                                                    │
                                                    ├─ PASS → return undefined
                                                    │         (user sees original reply)
                                                    │         retryState.clear()
                                                    │         contexts.delete()
                                                    │
                                                    ├─ SKIP → return undefined
                                                    │         (parse error / critic LLM error)
                                                    │
                                                    └─ FAIL →
                                                        if !entry.retasked:
                                                          retryState.bump()
                                                          fire-and-forget subagent.run({
                                                            sessionKey,
                                                            message: fixupPrompt,
                                                            deliver: false,
                                                            idempotencyKey: ...,
                                                          })
                                                        return { cancel: true }
                                                        // ↑ THIS suppresses the original
                                                        //   outbound — user sees nothing.
                                                        //   Next agent_end re-enters
                                                        //   with attempt+1.
```
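
The verdict branch at the bottom of the diagram can be sketched as a plain function. This is an illustration, not the plugin's `index.js`: `handleVerdict`, the `entry` fields, the `retryState` object and the injected `retask` callback (a stand-in for `subagent.run`) are hypothetical, and the sketch leaves out the `maxRejects` give-up path and the `idempotencyKey`.

```javascript
// Hypothetical sketch of the message_sending verdict branch; not the shipped index.js.
// `entry`, `retryState`, `contexts` and `retask` are injected stand-ins.
function handleVerdict({ verdict, fixupPrompt }, entry, retryState, contexts, retask) {
  if (verdict === "PASS") {
    // Reply is fine: reset the session's retry count and drop the cached context.
    retryState.clear(entry.sessionKey);
    contexts.delete(entry.channelId);
    return undefined; // let the original reply through
  }
  if (verdict !== "FAIL") {
    return undefined; // SKIP (critic error / unparseable output): fail open
  }
  if (!entry.retasked) {
    // Re-task at most once per run, even if the run produced several payloads.
    entry.retasked = true;
    retryState.bump(entry.sessionKey);
    // Fire-and-forget: delivery must not wait for the re-run to finish.
    Promise.resolve()
      .then(() => retask({ sessionKey: entry.sessionKey, message: fixupPrompt, deliver: false }))
      .catch((err) => console.error("[sketch] re-task failed:", err));
  }
  return { cancel: true }; // suppress this outbound payload
}
```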

**Why two hooks**: `agent_end` is the only place we get the raw `messages`,
`sessionKey`, and `runId`, but it can't suppress anything because the
gateway doesn't wait on it. `message_sending` IS awaited and CAN suppress,
but its ctx doesn't carry `sessionKey`. We bridge the two with a per-channel
`contexts` map that `agent_end` populates synchronously.
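
A minimal sketch of that handoff, assuming the field names the README mentions (`channelId`, `sessionKey`, `runId`, `messages`). Which of them arrive on the event and which on the ctx is an assumption here, and `onAgentEnd` / `lookupForSend` are hypothetical helper names.

```javascript
// Hypothetical agent_end -> message_sending handoff keyed by channel.
// Assumed shapes: event = { runId, messages }, ctx = { channelId, sessionKey }.
const contexts = new Map(); // channelId -> { sessionKey, runId, messages }

// agent_end side: record everything synchronously, before any await,
// so the entry already exists when delivery reaches message_sending.
function onAgentEnd(event, ctx) {
  contexts.set(ctx.channelId, {
    sessionKey: ctx.sessionKey,
    runId: event.runId,
    messages: event.messages,
  });
}

// message_sending side: its ctx has no sessionKey, so look it up by channel.
// undefined means "not reviewed": the caller lets the payload through.
function lookupForSend(ctx) {
  return contexts.get(ctx.channelId);
}
```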

**Why the lazy critic**: a single agent reply can produce multiple outbound
payloads (e.g. text plus media). The critic starts on the first
`message_sending` of a run, runs at most once per `agent_end`, and every
later `message_sending` of the same run reuses its verdict through a cached
Promise, so the LLM call never fires twice.
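
One way to get that behavior is to cache the Promise on the context entry. In this sketch `getVerdict` and the `verdictPromise` field are hypothetical, and `runCritic` stands in for the `runEmbeddedAgent`-based judge; it is assumed to return a Promise (a synchronous throw would bypass the `.catch`).

```javascript
// Hypothetical: memoize one critic run per context entry.
// `runCritic(entry)` is assumed to return a Promise of { verdict, ... }.
function getVerdict(entry, runCritic) {
  if (!entry.verdictPromise) {
    // Store the Promise itself so concurrent callers share one LLM call.
    entry.verdictPromise = runCritic(entry).catch((err) => {
      console.error("[sketch] critic failed, treating as SKIP:", err);
      return { verdict: "SKIP", reasons: [], fixupPrompt: null };
    });
  }
  return entry.verdictPromise;
}
```

Caching the Promise rather than the resolved verdict matters: a second payload that arrives while the critic is still running awaits the same call instead of starting another. Mapping a critic failure to SKIP mirrors the fail-open SKIP branch in the diagram.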

---

## Files

| Path | Role |
|---|---|
| `package.json` | npm package, `openclaw.extensions: ["./index.js"]`, `compat.pluginApi: ">=2026.4.12"`, `type: "module"` |
| `openclaw.plugin.json` | OpenClaw plugin manifest — `id`, `name`, `description`, `configSchema`. **Required for loader discovery** (parallel to `package.json`; modeled on `dist/extensions/llm-task/openclaw.plugin.json`). |
| `index.js` | `definePluginEntry(...)`, **`api.on("agent_end", (event, ctx) => ...)`** (the typed plugin hook surface — NOT `api.registerHook`, which routes through internal `<type>:<action>` keys and has no `agent_end` event). Calls `runEmbeddedAgent` + `subagent.run`. |
| `src/critic-prompt.mjs` | Pure: `CRITIC_SYSTEM_PROMPT` + `buildCriticUserMessage()` — testable without OpenClaw |
| `src/parse-verdict.mjs` | Pure: `parseVerdict(rawText)` → `{verdict, reasons, fixupPrompt}`. Lenient parser, fail-safe on garbage input |
| `src/retry-state.mjs` | Pure: `RetryState` class — per-session counter, `decide()` returns `"retry"` or `"give-up"` |
| `test/run.mjs` | 51 unit tests for the three pure modules |
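
For orientation, a per-session counter with the `bump`/`get`/`clear`/`decide` surface listed for `src/retry-state.mjs` might look like the sketch below. It is a guess at the shape, not the shipped module: the constructor signature and the exact comparison against `maxRejects` are assumptions.

```javascript
// Hypothetical per-session retry counter; not the shipped retry-state.mjs.
class RetryState {
  constructor(maxRejects = 3) {
    this.maxRejects = maxRejects;
    this.counts = new Map(); // sessionKey -> number of re-tasks so far
  }

  get(key) {
    return this.counts.get(key) ?? 0;
  }

  bump(key) {
    const next = this.get(key) + 1;
    this.counts.set(key, next);
    return next;
  }

  clear(key) {
    this.counts.delete(key);
  }

  // "retry" while under the ceiling, "give-up" once it is reached.
  decide(key) {
    return this.get(key) < this.maxRejects ? "retry" : "give-up";
  }
}
```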

---

## Install (when ClawHub publishing is set up)

```bash
openclaw plugins install oh-my-claw
```

Until then, link locally:

```bash
# Homebrew (Apple Silicon) global prefix; adjust if openclaw lives elsewhere
cd /opt/homebrew/lib/node_modules/openclaw/dist/extensions
# Point the link at your local checkout of this repo
ln -s /path/to/oh-my-claw oh-my-claw
# Restart the gateway. The loader discovers the plugin through
# openclaw.plugin.json and openclaw.extensions in package.json.
```

---

## Configuration

Plugin config schema (set in your `openclaw.toml` under
`plugin.oh-my-claw.*`):

| Key | Default | Description |
|---|---|---|
| `enabled` | `true` | Master toggle. Set to `false` to turn both hooks into no-ops. |
| `judgeProvider` | runtime default | Provider for the critic LLM turn. |
| `judgeModel` | runtime default | Model for the critic LLM turn. |
| `maxRejects` | `3` | Re-task ceiling per session before giving up. |
| `criticTimeoutMs` | `60000` | Wall-clock timeout for one critic LLM turn. |
| `retaskTimeoutMs` | `300000` | Wall-clock timeout for waiting on the re-tasked subagent run. **Decoupled from the critic timeout** so slow re-runs don't double-fire. |
| `sessionKeyAllowlistRegex` | unset | If set, only sessions whose `sessionKey` matches this regex are reviewed (e.g. `^acp:claudecode:.*`). |
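
A minimal sketch of how these keys could be resolved and applied, using the defaults from the table. `resolveConfig` and `shouldReview` are hypothetical helpers, and how OpenClaw actually hands plugin config to the plugin is not shown; `judgeProvider` and `judgeModel` are omitted because their defaults come from the runtime.

```javascript
// Hypothetical config resolution; key names and defaults follow the table.
const DEFAULTS = {
  enabled: true,
  maxRejects: 3,
  criticTimeoutMs: 60000,
  retaskTimeoutMs: 300000,
  sessionKeyAllowlistRegex: undefined,
};

function resolveConfig(userConfig = {}) {
  const cfg = { ...DEFAULTS, ...userConfig };
  // Compile once: an invalid pattern throws here, at load time.
  // No "g" flag, so RegExp.prototype.test() keeps no state between calls.
  cfg.allowlist = cfg.sessionKeyAllowlistRegex
    ? new RegExp(cfg.sessionKeyAllowlistRegex)
    : null;
  return cfg;
}

function shouldReview(cfg, sessionKey) {
  if (!cfg.enabled) return false;
  if (!cfg.allowlist) return true; // unset: review every session
  return cfg.allowlist.test(sessionKey);
}
```

Compiling the pattern once means a bad regex surfaces at startup rather than on every hook call.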

---

## Verification

### Unit tests (no OpenClaw runtime needed)

```bash
npm test
# 51 passed · 0 failed
```

Tests cover:
- **critic-prompt**: system prompt shape, user message assembly with weird
  message shapes (string, typed text, array content, null, number),
  truncation safety, acceptance criteria rendering.
- **parse-verdict**: PASS/FAIL/SKIP classification, lenient FAIL parsing
  (markdown bullets, `*` markers, the Korean `재지시:` ("re-instruct:")
  header, fallback C-line scan, default fixup synthesis, unparseable → SKIP
  fail-safe).
- **retry-state**: `bump`/`get`/`clear`/`decide` lifecycle, custom
  `maxRejects`, multi-key independence.
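
To illustrate the lenient-parsing and fail-safe behavior those tests describe, here is a sketch of a parser with the same `{verdict, reasons, fixupPrompt}` result shape. The critic output format it expects (a `VERDICT:` line, `-`/`*` bullets, an optional `FIXUP:` or `재지시:` section) is an assumption, the default fix-up wording is invented, and the C-line fallback scan is not reproduced.

```javascript
// Hypothetical lenient verdict parser. Assumed critic output format:
//   VERDICT: FAIL
//   - reason one
//   * reason two
//   FIXUP: instructions for the agent   (or a Korean "재지시:" header)
// Input without a recognizable PASS/FAIL verdict fails safe to SKIP.
function skip() {
  return { verdict: "SKIP", reasons: [], fixupPrompt: null };
}

function parseVerdict(rawText) {
  if (typeof rawText !== "string" || rawText.trim() === "") return skip();
  const lines = rawText.split(/\r?\n/).map((l) => l.trim());

  // Accept "VERDICT: FAIL", "**VERDICT**: FAIL" and "**Verdict:** FAIL".
  const verdictLine = lines.find((l) => /^(\*\*)?verdict(\*\*)?\s*:/i.test(l));
  const match = verdictLine ? verdictLine.match(/\b(PASS|FAIL)\b/i) : null;
  if (!match) return skip();
  const verdict = match[1].toUpperCase();
  if (verdict === "PASS") return { verdict, reasons: [], fixupPrompt: null };

  // The fix-up section starts at a "FIXUP:" or "재지시:" header, if any.
  const headerRe = /^(fixup|재지시)\s*:\s*/i;
  const headerIdx = lines.findIndex((l) => headerRe.test(l));
  const end = headerIdx === -1 ? lines.length : headerIdx;

  // Reasons are "-" or "*" bullets that appear before the fix-up section.
  const reasons = lines
    .slice(0, end)
    .filter((l) => /^[-*]\s+/.test(l))
    .map((l) => l.replace(/^[-*]\s+/, ""));

  let fixupPrompt = null;
  if (headerIdx !== -1) {
    const body = [lines[headerIdx].replace(headerRe, ""), ...lines.slice(headerIdx + 1)];
    fixupPrompt = body.join("\n").trim() || null;
  }
  // Synthesize a default fix-up prompt when the critic did not write one.
  if (!fixupPrompt) {
    const bullets = reasons.length ? reasons.map((r) => `- ${r}`) : ["- (no reasons given)"];
    fixupPrompt = ["Your previous reply did not pass review:", ...bullets,
      "Do the work the task asked for, then report exactly what you changed."].join("\n");
  }
  return { verdict, reasons, fixupPrompt };
}
```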

### Static check

```bash
npm run check
# node --check on all .js / .mjs sources, including index.js
```

`index.js` imports `openclaw/plugin-sdk`, so it can only be **loaded**
inside an OpenClaw process; `node --check` still validates its syntax
without executing the import. The pure-logic modules are exercised directly
by the unit suite against their real implementations.

### Runtime verification (manual, requires live gateway)

1. Symlink the plugin into OpenClaw's extensions directory (see Install).
2. Restart the gateway with `OPENCLAW_LOG_LEVEL=info`.
3. Trigger any agent run via Claude Code that should fail the checklist,
   e.g. ask Claude to "modify foo.mjs" and have it respond with just
   "완료" ("done") without touching the file.
4. Watch the gateway logs:
   - `[oh-my-claw] registering agent_end hook (...)` on startup
   - `[oh-my-claw] critic turn START sessionKey=... attempt=1/3`
   - `[oh-my-claw] critic verdict=FAIL ... reasons=[...]`
   - `[oh-my-claw] FAIL → re-task ... (1/3)`
5. Verify the session receives the fixup prompt and Claude does the
   actual work this time around.

---

## Honest limitations (read before deploying)

1. **Critic runs without tools** (`disableTools: true`). The critic
   judges from the agent's reply text only. Future: enable the
   file-read tool so the critic can verify claimed file paths.

2. **In-memory retry counter and contexts map.** Restarting the gateway
   resets all state. In-flight r

... (truncated)
