← Back to Plugins
Channels

Monitor

p-matrix By p-matrix ⭐ 1 stars 👁 40 views ▲ 0 votes

Real-time safety monitoring plugin for OpenClaw agents

Homepage GitHub

Install

openclaw plugins install @pmatrix/openclaw-monitor

Configuration Example

{
  "serverUrl": "https://api.pmatrix.io",
  "agentId": "oc_YOUR_AGENT_ID",
  "apiKey": "pm_live_xxxxxxxxxxxx",

  "safetyGate": {
    "enabled": true,
    "confirmTimeoutMs": 20000,
    "serverTimeoutMs": 2500
  },

  "credentialProtection": {
    "enabled": true,
    "blockTranscript": true,
    "customPatterns": []
  },

  "killSwitch": {
    "autoHaltOnRt": 0.75,
    "safeModel": "claude-opus-4-6"
  },

  "dataSharing": false,

  "batch": {
    "maxSize": 10,
    "flushIntervalMs": 2000,
    "retryMax": 3
  },

  "debug": false
}

README

# @pmatrix/openclaw-monitor

Runtime safety governance plugin for OpenClaw — **active intervention, not just logging.**

Blocks dangerous tool calls before execution, detects credential leaks in outgoing messages, and continuously measures agent risk with live Trust Grade (A–E).

> Early Access — Requires a P-MATRIX account and API key.

---

## What it does

### Core Protection

- **Safety Gate** — Intercepts high-risk tool calls before execution.
  Blocks or prompts for confirmation based on current risk level R(t).
- **Credential Protection** — Detects and blocks 11 types of API keys
  and secrets before they leave the agent.
- **Kill Switch** — Automatically halts the agent when R(t) ≥ 0.75.
  Manually via `/pmatrix halt`.

### Behavioral Intelligence

- **Autonomy Escalation** — Detects consecutive tool calls without
  user intervention. Applies norm penalty (-0.10).
- **Chain-Risk Detection** — Identifies dangerous tool call sequences
  (e.g., READ → WRITE → EXEC). Applies norm penalty (-0.05).
- **Drift Detection** — Compares per-turn behavioral snapshots and
  flags significant deviations. Adjusts stability axis accordingly.
- **Subagent Explosion Detection** — Blocks runaway subagent spawning
  (5+ in 5 minutes).
- **Live Grade** — Streams 4-axis safety signals and displays Trust
  Grade (A–E) in real time.

---

## Requirements

| Requirement | Version |
|-------------|---------|
| Node.js | ≥ 22 |
| P-MATRIX server | v1.0.0+ |

---

## Installation

```bash
openclaw plugins install @pmatrix/openclaw-monitor
```

Set up your API key:

```bash
npx @pmatrix/openclaw-monitor setup
```

Start the gateway:

```bash
openclaw gateway
```

---

### Privacy

**🔒 Content-Agnostic:** P-MATRIX never collects, parses, or stores
your LLM prompts or responses.

When data sharing is enabled, we only transmit numerical metadata —
token counts, timing, tool names, and safety events. Your agent's
prompt and response content stays local.

Pattern-based instant blocks (sudo, rm -rf, curl|sh) and credential
scanning run entirely on-device with no network dependency.

---

### Advanced Configuration

For full control, edit `~/.pmatrix/config.json` (created by the setup command):

```json
{
  "serverUrl": "https://api.pmatrix.io",
  "agentId": "oc_YOUR_AGENT_ID",
  "apiKey": "pm_live_xxxxxxxxxxxx",

  "safetyGate": {
    "enabled": true,
    "confirmTimeoutMs": 20000,
    "serverTimeoutMs": 2500
  },

  "credentialProtection": {
    "enabled": true,
    "blockTranscript": true,
    "customPatterns": []
  },

  "killSwitch": {
    "autoHaltOnRt": 0.75,
    "safeModel": "claude-opus-4-6"
  },

  "dataSharing": false,

  "batch": {
    "maxSize": 10,
    "flushIntervalMs": 2000,
    "retryMax": 3
  },

  "debug": false
}
```

Or set your API key as an environment variable:

```bash
export PMATRIX_API_KEY=pm_live_xxxxxxxxxxxx
```

---

## Chat Commands

| Command | Description |
|---------|-------------|
| `/pmatrix status` | Show current Grade, R(t), and 4-axis breakdown |
| `/pmatrix halt` | Manually trigger Kill Switch |
| `/pmatrix resume` | Resume from Halt state |
| `/pmatrix allow once` | Allow a pending Safety Gate confirmation |
| `/pmatrix help` | Show command list |

---

## Safety Gate

The Safety Gate intercepts tool calls before execution and evaluates them against the current risk level R(t):

| Risk Level | Mode | HIGH-risk tool | MEDIUM-risk tool | LOW-risk tool |
|-----------|------|---------------|-----------------|--------------|
| < 0.15 | Normal | Allow | Allow | Allow |
| 0.15–0.30 | Caution | **Confirm** | Allow | Allow |
| 0.30–0.50 | Alert | **Confirm** | Allow | Allow |
| 0.50–0.75 | Critical | **Block** | **Confirm** | Allow |
| ≥ 0.75 | Halt | **Block** | **Block** | **Block** |

**HIGH-risk tools** (auto-classified): `exec`, `bash`, `shell`, `run`, `apply_patch`, `browser`, `computer`, `terminal`, `code_interpreter`

**MEDIUM-risk tools**: `web_fetch`, `http_request`, `fetch`, `request`, `curl`, `wget`

**LOW-risk tools** (prefix match): `file_read`, `list_files`, `search`, `read`, `glob`, `grep`, and common read-only commands (`ls`, `find`, `cat`, `head`, `tail`)

> Unknown tools default to **MEDIUM** risk as a conservative baseline.
> Use `safetyGate.customToolRisk` to override.

**Instant block rules** (regardless of R(t)):
- `sudo` / `chmod 777` — privilege escalation
- `rm -rf /` — destructive deletion
- `curl ... | sh` — remote code execution

> **Note:** Instant block rules (sudo, rm -rf, curl|sh) are enforced
> independently of the Safety Gate setting. Even with
> `safetyGate.enabled: false`, these rules remain active.

When a tool requires confirmation, the agent pauses and prompts:
```
⚠️ [P-MATRIX] Safety Gate — Confirmation Required
Tool: bash | Risk: HIGH | Mode: Caution (R(t)=0.22)
👉 To allow: /pmatrix allow once
Timeout: auto-block after 20s
```

---

## Credential Protection

Detects and blocks 11 credential types before they are sent:

- OpenAI API keys (`sk-proj-...`)
- Anthropic API keys (`sk-ant-...`)
- AWS Access Keys (`AKIA...`)
- GitHub tokens (`ghp_...`, `github_pat_...`)
- Private keys (`-----BEGIN PRIVATE KEY-----`)
- Database URLs (`postgresql://user:pass@...`)
- Passwords (`password: "..."`)
- Bearer tokens
- Google AI keys (`AIza...`)
- Stripe keys (`sk_live_...`, `sk_test_...`)

Code blocks in messages are excluded from scanning to prevent false positives.

---

## Autonomy Escalation

Tracks consecutive tool calls within a single turn without user interaction. When the count exceeds the threshold (default: 5), the plugin:

1. Applies a `norm` penalty (-0.10 per event)
2. Emits an `autonomy_escalation` signal to the server
3. Logs a warning in the agent console

This prevents agents from executing long chains of actions autonomously without human oversight.

---

## Chain-Risk Detection

Monitors tool call sequences for dangerous patterns:

| Pattern | Example | Risk |
|---------|---------|------|
| `READ_WRITE_EXEC` | read file → write file → bash | Data exfiltration |
| `READ_EXEC` | read file → bash | Direct execution after read |
| `WRITE_EXEC` | write file → bash | Execute after file modification |

When a pattern is matched:
- `norm` axis receives a penalty (-0.05)
- A `chain_risk_detected` signal is emitted with the matched pattern name
- The agent console displays a warning

---

## Drift Detection

At each turn boundary (`agent_end`), the plugin captures a behavioral snapshot (axes values, tool usage, duration) and compares it against the previous turn:

| Drift Level | Score Threshold | Stability Delta |
|-------------|-----------------|-----------------|
| Moderate | ≥ 0.5 | +0.05 |
| Significant | ≥ 1.0 | +0.10 |

Higher `stability` values indicate more drift (lower safety). When drift is detected, a `drift_detected` signal is emitted with the drift score and level.

---

## R(t) Formula

```
R(t) = 1 - (BASELINE + NORM + (1 - STABILITY) + META_CONTROL) / 4
```

| Axis | Field | Meaning |
|------|-------|---------|
| BASELINE | `baseline` | Initial config integrity — higher = safer |
| NORM | `norm` | Behavioral normalcy — higher = safer |
| STABILITY | `stability` | Trajectory stability — lower = more drift |
| META_CONTROL | `meta_control` | Self-control capacity — higher = safer |

P-Score = `round(100 × (1 - R(t)), 2)`
Trust Grade: A (≥80) · B (≥60) · C (≥40) · D (≥20) · E (<20)

---

## Server-side Setup

The plugin sends signals to `POST /v1/inspect/stream` on your P-MATRIX server.

Production server: `https://api.pmatrix.io`

---

## Configuration Reference

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `serverUrl` | string | — | P-MATRIX server URL |
| `agentId` | string | — | Agent ID from P-MATRIX dashboard |
| `apiKey` | string | — | API key (`pm_live_...`). Use env var. |
| `safetyGate.enabled` | boolean | `true` | Enable Safety Gate |
| `safetyGate.confirmTimeoutMs` | number | `20000` | User confirm timeout (10,000–30,000ms) |
| `safetyGate.serverTimeoutMs` | number | `2500` | Server query timeout (fail-open) |
| `safetyGate.customToolRisk` | object | `{}` | Override tool risk tier |
| `credentialProtection.enabled` | boolean | `true` | Enable credential scanning |
| `credentialProtection.blockTranscript` | boolean | `true` | Also block transcript writes |
| `credentialProtection.customPatterns` | string[] | `[]` | Additional regex patterns |
| `killSwitch.autoHaltOnRt` | number | `0.75` | Auto-halt R(t) threshold |
| `killSwitch.safeModel` | string | `"claude-opus-4-6"` | Model to switch to on Halt |
| `dataSharing` | boolean | `false` | Send safety signals to P-MATRIX server (opt-in). When false, instant block rules (sudo, rm -rf, curl\|sh) and credential scanning still work fully locally. However, R(t)-based Safety Gate decisions use the last known server value (or 0.0 if never connected) — grade tracking and dashboard are disabled. Run `npx @pmatrix/openclaw-monitor setup` to enable live grading and adaptive Safety Gate. Prompts and responses are never transmitted regardless of this setting. |
| `outboundAllowlist.enabled` | boolean | `false` | Enable outbound message allowlist |
| `outboundAllowlist.allowedChannels` | string[] | `[]` | Permitted outbound channels (e.g. `"slack:#ops"`) |
| `batch.maxSize` | number | `10` | Buffer flush threshold |
| `batch.flushIntervalMs` | number | `2000` | Periodic flush interval (ms) |
| `batch.retryMax` | number | `3` | Send retry count |
| `debug` | boolean | `false` | Verbose logging |

---

## Offline / Server-Down Behavior

- **No cache (initial)**: R(t) = 0.0 (fail-open, no blocking before first connection)
- **Cache exists + server down**: Last known R(t) is kept indefinitely — Safety Gate continues using it
- Failed signals are saved to `~/.pmatrix/unsent/` and can be replayed later

---

## License

Apache-2.0 © 2026 P-MATRIX
channels

Comments

Sign in to leave a comment

Loading comments...