
Identity Plane

By intrinsec-ai

Identity Plane for OpenClaw agents that surfaces off-identity behavior and compacts identity over time.

Homepage GitHub

Install

npm install @intrinsec-ai/openclaw-identity-plane

Configuration Example

{
  "plugins": {
    "load": { "paths": ["~/.openclaw/extensions/identity"] },
    "entries": { "identity": { "enabled": true } }
  }
}

README



# 🧠 `@intrinsec-ai/openclaw-identity-plane`: Identity Plane Plugin

**An identity watchdog for your AI agents.**  
Track whether your agent stays true to its declared identity, across every message, not just the first.

[Plugin](https://docs.openclaw.ai/)
[Node](https://nodejs.org/)
[License](./LICENSE)



---

> **Demo notice.** This is a lightweight reference implementation, intentionally observational and read-only. It logs, visualises, and annotates drift but does not block or intercept anything. Features like active intervention, blocking writes, or fleet correlation can be added based on community feedback.

---

## The core insight

> *Every token entering your agent's context window is a vote on its identity.*  
> *This plugin watches for votes that shouldn't be there.*

Prompt injection, jailbreaks, and malicious skill installations (like [ClawHavoc](https://www.koi.ai/blog/clawhavoc-341-malicious-clawedbot-skills-found-by-the-bot-they-were-targeting)) all work the same way: they deflect the agent's behavioural trajectory away from its declared identity. The identity plane makes that deflection visible.

---

## What it does


|                 |                                                                                                                                                              |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 🔍 **Detects**  | Prompt injection · jailbreaks · off-topic drift · malicious skill installs                                                                                   |
| ⚙️ **How**      | Scores each message against your agent's compliance model; off-identity signals accumulate in bounded memory without cancelling each other out               |
| 📊 **Signals**  | `maxDrift` · `avgDrift` · drift themes · **cognitive file change annotations** (so you know whether a spike came *before* or *after* a file changed on disk) |
| ☁️ **Stack**    | **OpenAI-only:** `text-embedding-3-small` on every inbound message, plus the same LLM you configure for calibration, compaction, and novel-theme labels      |
| 👁️ **Posture** | Read-only and observational. Nothing is blocked.                                                                                                             |


---

## How it works

### 1 · Calibration *(once, then cached)*

When you first run an agent (or run `/identity recalibrate`):

1. **Reads your cognitive files**: `SOUL.md`, `AGENTS.md`, `IDENTITY.md`, `USER.md`, `TOOLS.md`
2. **Extracts identity anchors**: the LLM distills the raw files into 5–10 named, categorised anchors (`purpose`, `value`, `boundary`, `persona`, `constraint`). These are specific, traceable aspects of identity (e.g. *[BOUNDARY] No credential access: The agent must never read or transmit API keys or secrets.*) rather than a raw policy dump.
3. **Generates labelled examples grounded in the anchors**: the same LLM produces `max(20, 3 × numAnchors)` compliant and violation sentences. The count scales with identity complexity (e.g. 8 anchors → 24 examples each) so every anchor is represented multiple times, giving the manifold better covariance coverage. The prompt asks for at least one compliant and one violation sentence per anchor.
4. **Embeds both sets** with `text-embedding-3-small` (the same model as at runtime, which keeps the geometry consistent)
5. **Fits a Gaussian** to the compliant cluster: mean and covariance via PCA, producing the **identity manifold**
6. **Sets the surprise threshold**: the 90th-percentile Mahalanobis distance of the compliant examples. With ≥20 examples the empirical distribution is stable, but the 95th percentile tends to over-fit to compliant outliers; the 90th is a tighter, more reliable gate.
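The sizing and threshold rules in steps 3 and 6 can be sketched as follows. The function names are illustrative, not the plugin's actual API, and the Mahalanobis distances of the compliant examples are assumed to be precomputed.

```typescript
// Step 3: number of labelled examples per class scales with identity
// complexity, floored at 20. (Illustrative name, not the plugin's API.)
function exampleCount(numAnchors: number): number {
  return Math.max(20, 3 * numAnchors);
}

// Step 6: the surprise threshold is the 90th-percentile Mahalanobis
// distance of the compliant examples (distances assumed precomputed).
function surpriseThreshold(compliantDistances: number[], percentile = 0.9): number {
  const sorted = [...compliantDistances].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(percentile * sorted.length));
  return sorted[idx];
}

console.log(exampleCount(8)); // 24 examples per class for 8 anchors
```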

The identity anchors are the canonical representation of the agent's identity from this point on. Every downstream LLM call (compaction audits, novel-theme labels) references the named anchors rather than the raw file contents.

Everything is cached to `.openclaw/identity-cache.json`. Nothing is recomputed until you explicitly recalibrate.

> **Upgrading from an older build:** run `/identity recalibrate`. The cache schema now includes identity anchors, and old caches without them will trigger automatic re-calibration.

---

### 2 · Runtime *(every inbound message)*

Every piece of text entering the LLM (user messages, tool results, subagent replies) flows through:

```
  message ──► embed (OpenAI) ──► surprise = ½·Mahalanobis²(x, compliance)
                                        │
                          ┌─────────────▼─────────────┐
                          │  weight = max(0, s − τ)   │
                          └─────────────┬─────────────┘
                                        │
              ┌─────────────────────────▼────────────────────────┐
              │  weight > 0?                                     │
              │   → ingest into coreset                          │
              │   → novelty check → new mode? → LLM theme label  │
              │                                                  │
              │  weight = 0? → skip (identity-neutral)           │
              └─────────────────────────┬────────────────────────┘
                                        ▼
              drift scores from coreset ──► maxDrift / avgDrift ──► level
```

**Drift levels:** `nominal` · `warning` · `alert`

Without `OPENAI_API_KEY`, embedding fails and the hook no-ops with a logged error. Novel-theme labeling falls back to **geometric** labels if the key is missing at runtime; compaction is skipped without a key.
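A minimal sketch of the scoring path above: the ½·d² surprise and the max(0, s − τ) gate come from the diagram, while the numeric cut-offs for the drift levels are illustrative assumptions, not the plugin's actual defaults.

```typescript
type DriftLevel = "nominal" | "warning" | "alert";

// Surprise is half the squared Mahalanobis distance from the compliance manifold.
function surprise(mahalanobisDistance: number): number {
  return 0.5 * mahalanobisDistance ** 2;
}

// Only surprise above the calibrated threshold τ carries weight;
// weight 0 means the message is identity-neutral and is skipped.
function surpriseWeight(s: number, tau: number): number {
  return Math.max(0, s - tau);
}

// Hypothetical mapping from aggregate drift to a level (cut-offs assumed).
function driftLevel(maxDrift: number, warn = 1, alert = 3): DriftLevel {
  if (maxDrift >= alert) return "alert";
  if (maxDrift >= warn) return "warning";
  return "nominal";
}
```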

---

### 3 · Identity compaction *(LLM audit)*

The coreset records *where* drift lives geometrically. **Identity compaction** turns that into a human-readable story grounded in the named identity anchors.

Every `compactionInterval` off-identity messages (after ingestion, i.e. `surpriseWeight > 0`), when `OPENAI_API_KEY` is set:

1. **Clusters** coreset points via connected-component BFS
2. **Describes** each cluster using the actual stored message texts
3. **Asks the LLM** for theme labels, severities, and a 1–2 sentence narrative, with the identity anchors list in the prompt so every theme and the narrative explicitly reference named aspects of the declared identity
4. **Replaces** the theme list with the consolidated analysis

The compaction is **non-destructive**: the coreset (the geometric ground truth) is never rewritten by the LLM.

**Novel drift modes** get a one-shot LLM label (triggering message + nearest calibration violations) whenever the key is present; otherwise the geometric fallback is used.
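Step 1 of the audit, connected-component clustering via BFS, can be sketched as follows; the `eps` linking radius and the plain-array point representation are assumptions for illustration.

```typescript
// Group points into connected components: two points are linked when their
// Euclidean distance is at most `eps`, and a cluster is everything reachable
// through such links. Returns index lists, one per cluster.
function clusterByBFS(points: number[][], eps: number): number[][] {
  const dist = (a: number[], b: number[]) =>
    Math.hypot(...a.map((v, i) => v - b[i]));
  const visited = new Array(points.length).fill(false);
  const clusters: number[][] = [];
  for (let i = 0; i < points.length; i++) {
    if (visited[i]) continue;
    const component: number[] = [];
    const queue = [i];
    visited[i] = true;
    while (queue.length > 0) {
      const cur = queue.shift()!;
      component.push(cur);
      for (let j = 0; j < points.length; j++) {
        if (!visited[j] && dist(points[cur], points[j]) <= eps) {
          visited[j] = true;
          queue.push(j);
        }
      }
    }
    clusters.push(component); // indices of coreset points in one drift mode
  }
  return clusters;
}
```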

---

### 4 · The coreset: why signals don't cancel

Unlike a running mean, the **bounded weighted coreset** (128 points by default) keeps distinct drift directions separate:

- Opposing drifts cannot cancel each other out
- Old signals aren't lost to exponential forgetting
- When full, the *closest pair* merges, preserving distinct drift modes while bounding memory

Each coreset point carries the **original message texts** that contributed to it, so drift themes and audit narratives are grounded in real behaviour, not just geometry.
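A sketch of the closest-pair merge, assuming a simple point shape. The field names and capacity default mirror the description above, not the actual source: when the coreset overflows, the two nearest points collapse into their weighted mean, with summed weight and concatenated message texts.

```typescript
interface CorePoint { vec: number[]; weight: number; texts: string[]; }

function ingest(coreset: CorePoint[], p: CorePoint, capacity = 128): CorePoint[] {
  const out = [...coreset, p];
  if (out.length <= capacity) return out;
  // Find the closest pair; merging it loses the least geometric information,
  // so well-separated drift modes stay distinct.
  let bi = 0, bj = 1, best = Infinity;
  for (let i = 0; i < out.length; i++)
    for (let j = i + 1; j < out.length; j++) {
      const d = Math.hypot(...out[i].vec.map((v, k) => v - out[j].vec[k]));
      if (d < best) { best = d; bi = i; bj = j; }
    }
  const a = out[bi], b = out[bj], w = a.weight + b.weight;
  const merged: CorePoint = {
    vec: a.vec.map((v, k) => (v * a.weight + b.vec[k] * b.weight) / w),
    weight: w,
    texts: [...a.texts, ...b.texts], // keep the grounding messages
  };
  return [...out.filter((_, k) => k !== bi && k !== bj), merged];
}
```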

---

### 5 · Why distance-from-compliance works

When an agent has a well-defined goal (*"help with code"*, *"answer HR questions"*), its compliant responses form a **tight cluster** in embedding space. Violations live in the open-ended complement.

```
     Embedding space (conceptual 2D projection)
     ┌──────────────────────────────────────────┐
     │  ✗          ✗                 ✗          │
     │       ✗          ┌─────────┐      ✗      │
     │  ✗               │ ● ● ● ● │             │
     │                  │ ● ● ● ● │  ✗          │
     │  ✗               └────┬────┘             │
     │                       │                  │
     │   violations          │  compliance      │
     │   (open-ended)        │  (tight cluster) │
     └──────────────────────────────────────────┘
```

We never need to enumerate violations. We only ask: *how far is this message from the known compliance region?* That is a one-class classification problem with a clean solution: **Mahalanobis distance**.
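The one-class score can be sketched as follows; for brevity this assumes a diagonal covariance, whereas the plugin fits the full covariance via PCA.

```typescript
// Mahalanobis distance of an embedding from a Gaussian with the given mean
// and per-dimension variances: distance in units of standard deviation,
// so a large value means "far from the known compliance region".
function mahalanobis(x: number[], mean: number[], variances: number[]): number {
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    const d = x[i] - mean[i];
    sum += (d * d) / variances[i];
  }
  return Math.sqrt(sum);
}

mahalanobis([1, 2], [1, 2], [1, 1]); // 0: a point on the mean
mahalanobis([3, 2], [1, 2], [4, 1]); // 1: two raw units = one std dev
```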

---


## Installation

### Prerequisites

- [OpenClaw](https://docs.openclaw.ai/) installed and running
- Node.js 22.16+ (Node 24 recommended)
- `OPENAI_API_KEY`, for embeddings and LLM calls (calibration is cached; embeddings and audit calls are ongoing)

---

### Option A โ€” Install from npm *(recommended)*

```bash
npm install @intrinsec-ai/openclaw-identity-plane
npx openclaw plugins install -l node_modules/@intrinsec-ai/openclaw-identity-plane
```

### Option B โ€” Clone directly

```bash
git clone https://github.com/intrinsec-ai/openclaw-identity-plane \
  ~/.openclaw/extensions/identity
cd ~/.openclaw/extensions/identity
npm install
```

### Option C โ€” Link a local checkout *(for development)*

```bash
cd /path/to/openclaw-identity-plane
npm install
openclaw plugins install --link /path/to/openclaw-identity-plane
```

---

### Enable the plugin

```bash
openclaw plugins enable identity
```

Or manually in `~/.openclaw/openclaw.json`:

```json
{
  "plugins": {
    "load": { "paths": ["~/.openclaw/extensions/identity"] },
    "entries": { "identity": { "enabled": true } }
  }
}
```

Remove deprecated config keys if present: `embedModel`, `agenticAudit` (no longer used).

---

### Set your API key

```bash
export OPENAI_API_KEY=sk-...
```

Optional: route both chat and embeddings through a compatible gateway:

```bash
export OPENAI_BASE_URL=http://localhost:11434/v1
```

Set `calibrationModel` to a model your endpoint serves. The embedding client uses the **same** `OPENAI_BASE_URL`; the server must implement an OpenAI-style **`POST /v1/embeddings`** for `text-embedding-3-small` (or map that model id to a local embedder), or calibration and runtime embedding will fail.
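The embedding traffic behind that requirement follows the standard OpenAI embeddings contract. A sketch of the request shape (the helper name is hypothetical; the path and JSON body follow the `/v1/embeddings` API):

```typescript
// Build an OpenAI-style embeddings request for a given base URL.
// Only constructs the request; sending it is left to the caller.
function buildEmbeddingRequest(baseUrl: string, apiKey: string, text: string) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/embeddings`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Same model id at calibration and runtime keeps the geometry consistent.
      body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
    },
  };
}

const req = buildEmbeddingRequest("http://localhost:11434/v1", "sk-local", "hello");
// req.url === "http://localhost:11434/v1/embeddings"
```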

... (truncated)