Oh My Claw
OpenClaw plugin: agentic LLM critic that hooks agent_end, judges completions via runEmbeddedAgent, suppresses failing replies through the message_sending hook, and re-tasks on FAIL via subagent.run.
# oh-my-claw
> First version of an **OpenClaw plugin** that hooks `agent_end`, runs an
> agentic LLM critic, and re-tasks the original session via
> `subagent.run` when the completion fails the judge checklist.
This is the **plugin** answer to the complaint that completions get
"forwarded as-is, without openclaw-bot reviewing them". It is NOT a
bridge-daemon if-statement layer. It registers as a normal OpenClaw plugin
via `definePluginEntry` and calls only documented plugin-SDK surfaces.
---
## Why this exists
The previous iteration's `OpenclawCritic` lived inside our `bridge-daemon`
process and judged completions with regex/keyword rules. That was fast to
build but had three structural gaps that this plugin closes:
1. **No LLM call**: regex can't read intent, only text shape.
2. **Out of OpenClaw's orchestration graph**: bridge-daemon is
   adjacent to OpenClaw, not part of it. Plugin code runs inside the
   gateway process and gets first-class access to `runEmbeddedAgent` and
   `subagent.run`.
3. **Bridge-only scope**: the plugin works for any agent run that
   triggers `agent_end`, not just the Discord channel our bridge owns.
---
## Architecture (v0.2: actually suppresses the user-facing reply on FAIL)
The lifecycle in OpenClaw 2026.4.12: `agent_end` is **fire-and-forget**
(`dist/pi-embedded-runner-CefZK1Pt.js:6234`: `.catch(...)` only, no await),
but `message_sending` IS awaited (`dist/deliver-CClC7J0O.js:780-825`;
`applyMessageSendingHook` returns `{cancelled: true}` to the deliver loop
at line 802 when the plugin returns `{cancel: true}`).
So the plugin uses **two hooks in tandem**:
```
agent (Claude Code via ACP) emits stop
        │
        ▼
gateway fires agent_end (fire-and-forget) AND continues to delivery
        │                                 │
        ▼                                 ▼
oh-my-claw                              gateway: deliverOutboundPayloads
api.on("agent_end", ...)                awaits applyMessageSendingHook
capture-only:                             │
  contexts.set(channelId, {               ▼
    sessionKey, messages, attempt, ...  api.on("message_sending", ...)
  })                                    critic gate:
  NO awaits before set →                  lookup contexts.get(channelId)
  sync portion lands first                if absent → return undefined (let through)
                                          if not absent →
                                            lazy-await runCritic(entry)
                                            ┌──────────────────────────┐
                                            │ runEmbeddedAgent({       │
                                            │   extraSystemPrompt:     │
                                            │     CRITIC_SYSTEM_PROMPT │
                                            │   prompt: buildCritic..  │
                                            │   disableTools: true     │
                                            │ })                       │
                                            └──────────────────────────┘
                                            parseVerdict(...)
                                              │
                                              ├─ PASS → return undefined
                                              │         (user sees original reply)
                                              │         retryState.clear()
                                              │         contexts.delete()
                                              │
                                              ├─ SKIP → return undefined
                                              │         (parse error / critic LLM error)
                                              │
                                              └─ FAIL →
                                                   if !entry.retasked:
                                                     retryState.bump()
                                                     fire-and-forget subagent.run({
                                                       sessionKey,
                                                       message: fixupPrompt,
                                                       deliver: false,
                                                       idempotencyKey: ...,
                                                     })
                                                   return { cancel: true }
                                                   // ↑ THIS suppresses the original
                                                   //   outbound; user sees nothing.
                                                   //   Next agent_end re-enters
                                                   //   with attempt+1.
```
**Why two hooks**: `agent_end` is the only place we get the raw `messages`
+ `sessionKey` + `runId`, but it can't suppress because the gateway doesn't
wait on it. `message_sending` IS awaited and CAN suppress, but its ctx
doesn't carry sessionKey. We bridge the two via a per-channel `contexts`
map populated synchronously in `agent_end`.
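
The bridge between the two hooks can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual `index.js`: the `makeFakeApi` stand-in, the `judge` callback, and the exact location of `channelId`, `sessionKey`, and `messages` on `event`/`ctx` are all assumptions made so the sketch runs on plain Node.

```javascript
// Hypothetical sketch: `api`, the event/ctx field names, and `judge` are
// assumptions for illustration, not the real OpenClaw plugin SDK types.
const contexts = new Map(); // channelId -> captured run context

function registerHooks(api, judge) {
  // agent_end is fire-and-forget, so capture synchronously: no await may
  // run before contexts.set, or message_sending could win the race.
  api.on("agent_end", (event, ctx) => {
    contexts.set(ctx.channelId, {
      sessionKey: ctx.sessionKey,
      messages: event.messages,
    });
  });

  // message_sending is awaited by the gateway, so it can veto delivery.
  api.on("message_sending", async (event, ctx) => {
    const entry = contexts.get(ctx.channelId);
    if (!entry) return undefined; // nothing captured: let it through
    const verdict = await judge(entry);
    if (verdict === "FAIL") return { cancel: true };
    contexts.delete(ctx.channelId);
    return undefined;
  });
}

// Tiny stand-in for the gateway so the sketch runs without OpenClaw.
function makeFakeApi() {
  const handlers = {};
  return {
    on: (name, fn) => { handlers[name] = fn; },
    emit: (name, event, ctx) => handlers[name](event, ctx),
  };
}
```

The key design point is that the `agent_end` handler does all of its work before its first `await`, so the map entry exists by the time the awaited `message_sending` handler looks it up.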
**Why the lazy critic**: a single agent reply can produce multiple outbound
payloads (text + media). We run the critic ONCE per agent_end and reuse the
verdict for every subsequent message_sending of the same run via a cached
Promise, so the LLM call never double-fires.
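
One way to get this once-per-run behavior is to cache the Promise itself on the context entry. The sketch below assumes a hypothetical `getVerdict` helper and a `verdictPromise` field; neither name comes from the plugin source. Mapping a rejected critic call to `"SKIP"` mirrors the SKIP branch described for critic LLM errors.

```javascript
// Hypothetical helper: `entry` is the per-channel context object and
// `runCritic` is whatever async function performs the critic LLM turn.
function getVerdict(entry, runCritic) {
  if (!entry.verdictPromise) {
    // Store the Promise, not the resolved value, so callers that arrive
    // while the critic is still running share the same in-flight call.
    entry.verdictPromise = runCritic(entry).catch(() => "SKIP");
  }
  return entry.verdictPromise;
}
```

Caching the resolved value instead would leave a window in which a second payload arrives before the first critic turn finishes and triggers a duplicate call.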
---
## Files
| Path | Role |
|---|---|
| `package.json` | npm package, `openclaw.extensions: ["./index.js"]`, `compat.pluginApi: ">=2026.4.12"`, `type: "module"` |
| `openclaw.plugin.json` | OpenClaw plugin manifest: `id`, `name`, `description`, `configSchema`. **Required for loader discovery** (parallel to `package.json`; modeled on `dist/extensions/llm-task/openclaw.plugin.json`). |
| `index.js` | `definePluginEntry(...)`, **`api.on("agent_end", (event, ctx) => ...)`** (the typed plugin hook surface, NOT `api.registerHook`, which routes through internal `<type>:<action>` keys and has no `agent_end` event). Calls `runEmbeddedAgent` + `subagent.run`. |
| `src/critic-prompt.mjs` | Pure: `CRITIC_SYSTEM_PROMPT` + `buildCriticUserMessage()`; testable without OpenClaw |
| `src/parse-verdict.mjs` | Pure: `parseVerdict(rawText)` → `{verdict, reasons, fixupPrompt}`. Lenient parser, fail-safe on garbage input |
| `src/retry-state.mjs` | Pure: `RetryState` class with a per-session counter; `decide()` returns `"retry"` or `"give-up"` |
| `test/run.mjs` | 51 unit tests for the three pure modules |
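
To make the role of `src/retry-state.mjs` concrete, here is a minimal re-implementation sketch. The method names (`bump`, `get`, `clear`, `decide`) and the `maxRejects` option come from this README; the constructor signature and the exact rule inside `decide()` are assumptions, and the real module may differ.

```javascript
// Hypothetical re-implementation for illustration; the real
// src/retry-state.mjs may differ in signatures and edge cases.
class RetryState {
  constructor({ maxRejects = 3 } = {}) {
    this.maxRejects = maxRejects;
    this.counts = new Map(); // sessionKey -> number of rejects so far
  }

  get(key) {
    return this.counts.get(key) ?? 0;
  }

  bump(key) {
    const next = this.get(key) + 1;
    this.counts.set(key, next);
    return next;
  }

  clear(key) {
    this.counts.delete(key);
  }

  // Assumed rule: retry while fewer than maxRejects rejects are recorded.
  decide(key) {
    return this.get(key) < this.maxRejects ? "retry" : "give-up";
  }
}
```

Because the counter lives in a plain in-process `Map`, each session key is tracked independently and all counts vanish when the process restarts.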
---
## Install (when ClawHub publishing is set up)
```bash
openclaw plugins install oh-my-claw
```
Until then, link locally:
```bash
cd /opt/homebrew/lib/node_modules/openclaw/dist/extensions
ln -s /Users/jaewoo/Desktop/Project/research/260413_openclaw_research/oh-my-claw oh-my-claw
# Restart the gateway. The plugin will be discovered via openclaw.extensions
# in package.json.
```
---
## Configuration
Plugin config schema (set in your `openclaw.toml` under
`plugin.oh-my-claw.*`):
| Key | Default | Description |
|---|---|---|
| `enabled` | `true` | Master toggle. Set false to no-op the hook entirely. |
| `judgeProvider` | runtime default | Provider for the critic LLM turn. |
| `judgeModel` | runtime default | Model for the critic LLM turn. |
| `maxRejects` | `3` | Re-task ceiling per session before giving up. |
| `criticTimeoutMs` | `60000` | Wall-clock timeout for one critic LLM turn. |
| `retaskTimeoutMs` | `300000` | Wall-clock timeout for waiting on the re-tasked subagent run. **Decoupled from the critic timeout** so slow re-runs don't double-fire. |
| `sessionKeyAllowlistRegex` | unset | If set, only `sessionKey` matching this regex is reviewed (e.g. `^acp:claudecode:.*`). |
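
As a rough illustration of how these keys could be applied at runtime, the sketch below merges user config over the defaults from the table and applies the allowlist check. `resolveConfig`, `shouldReview`, and the shape of `userConfig` are hypothetical names for this example, not part of the plugin or the OpenClaw SDK.

```javascript
// Defaults copied from the table above; the function names and the shape
// of `userConfig` are hypothetical, not part of the OpenClaw plugin SDK.
const DEFAULTS = {
  enabled: true,
  maxRejects: 3,
  criticTimeoutMs: 60000,
  retaskTimeoutMs: 300000,
};

function resolveConfig(userConfig = {}) {
  // User-supplied keys override defaults; unset keys keep the default.
  return { ...DEFAULTS, ...userConfig };
}

function shouldReview(sessionKey, config) {
  if (!config.enabled) return false; // master toggle off: no-op
  if (!config.sessionKeyAllowlistRegex) return true; // unset: review all
  return new RegExp(config.sessionKeyAllowlistRegex).test(String(sessionKey));
}
```

With `sessionKeyAllowlistRegex` set to `^acp:claudecode:.*`, a session key such as `acp:claudecode:abc` would be reviewed, while keys with any other prefix would pass through untouched.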
---
## Verification
### Unit tests (no OpenClaw runtime needed)
```bash
npm test
# 51 passed · 0 failed
```
Tests cover:
- **critic-prompt**: system prompt shape, user message assembly with weird
  message shapes (string, typed text, array content, null, number),
  truncation safety, acceptance criteria rendering.
- **parse-verdict**: PASS/FAIL/SKIP classification, lenient FAIL parsing
  (markdown bullets, `*` markers, Korean `재지시:` ("re-instruction") header,
  fallback C-line scan, default fixup synthesis, unparseable → SKIP fail-safe).
- **retry-state**: `bump`/`get`/`clear`/`decide` lifecycle, custom
  `maxRejects`, multi-key independence.
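
The lenient, fail-safe parsing covered by the parse-verdict tests can be sketched as below. The critic's real output format is not documented here, so this example assumes one: a line containing `PASS` or `FAIL`, bullet lines for reasons, and an optional `재지시:` header introducing the fixup prompt. The real `src/parse-verdict.mjs` handles more cases (such as the fallback C-line scan) and may behave differently.

```javascript
// Hypothetical sketch. Assumed critic output format (not confirmed by the
// source): a line containing PASS or FAIL, bullet lines for reasons, and an
// optional "재지시:" (re-instruction) header introducing the fixup prompt.
function parseVerdict(rawText) {
  const skip = { verdict: "SKIP", reasons: [], fixupPrompt: null };
  if (typeof rawText !== "string") return skip;

  const lines = rawText.split(/\r?\n/).map((l) => l.trim()).filter(Boolean);
  const head = lines.find((l) => /\b(PASS|FAIL)\b/.test(l));
  if (!head) return skip; // unparseable input never blocks delivery

  if (/\bPASS\b/.test(head) && !/\bFAIL\b/.test(head)) {
    return { verdict: "PASS", reasons: [], fixupPrompt: null };
  }

  // Reasons are bullet lines ("-", "*", or "•") before the fixup header.
  const idx = lines.findIndex((l) => /^재지시\s*:/.test(l));
  const body = idx >= 0 ? lines.slice(0, idx) : lines;
  const reasons = body
    .filter((l) => /^[-*•]\s+/.test(l))
    .map((l) => l.replace(/^[-*•]\s+/, "").replace(/\*\*/g, ""));

  // Use the explicit fixup text if present, else synthesize a default.
  const fixupPrompt = idx >= 0
    ? [lines[idx].replace(/^재지시\s*:/, "").trim(), ...lines.slice(idx + 1)]
        .filter(Boolean)
        .join("\n")
    : `Address these issues and retry:\n${reasons.join("\n")}`;

  return { verdict: "FAIL", reasons, fixupPrompt };
}
```

Returning `SKIP` for anything unrecognized means a misbehaving critic can only let a reply through, never suppress one by accident.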
### Static check
```bash
npm run check
# node --check on all .js / .mjs sources, including index.js
```
`index.js` imports `openclaw/plugin-sdk` so it can only be **loaded**
inside an OpenClaw process, but `node --check` validates the syntax
without executing the import. The pure-logic modules are tested
end-to-end with the real implementation via the unit suite.
### Runtime verification (manual, requires live gateway)
1. Symlink the plugin into OpenClaw's extensions directory (see Install).
2. Restart the gateway with `OPENCLAW_LOG_LEVEL=info`.
3. Trigger any agent run via Claude Code that should fail the checklist,
   e.g. ask Claude to "modify foo.mjs" and have it respond with just
   "완료" ("done") without touching the file.
4. Watch the gateway logs:
- `[oh-my-claw] registering agent_end hook (...)` on startup
- `[oh-my-claw] critic turn START sessionKey=... attempt=1/3`
- `[oh-my-claw] critic verdict=FAIL ... reasons=[...]`
- `[oh-my-claw] FAIL → re-task ... (1/3)`
5. Verify the session receives the fixup prompt and Claude does the
actual work this time around.
---
## Honest limitations (read before deploying)
1. **Critic runs without tools** (`disableTools: true`). The critic
judges from the agent's reply text only. Future: enable the
file-read tool so the critic can verify claimed file paths.
2. **In-memory retry counter and contexts map.** Restarting the gateway
resets all state. In-flight r
... (truncated)