# `@intrinsec-ai/openclaw-identity-plane`: Identity Plane Plugin
**An identity watchdog for your AI agents.**
Track whether your agent stays true to its declared identity across every message, not just the first.
[Plugin](https://docs.openclaw.ai/)
[Node](https://nodejs.org/)
[License](./LICENSE)
---
> **Demo notice.** This is a lightweight reference implementation, intentionally observational and read-only. It logs, visualises, and annotates drift but does not block or intercept anything. Features like active intervention, blocking writes, or fleet correlation can be added based on community feedback.
---
## The core insight
> *Every token entering your agent's context window is a vote on its identity.*
> *This plugin watches for votes that shouldn't be there.*
Prompt injection, jailbreaks, and malicious skill installations (like [ClawHavoc](https://www.koi.ai/blog/clawhavoc-341-malicious-clawedbot-skills-found-by-the-bot-they-were-targeting)) all work the same way: they deflect the agent's behavioural trajectory away from its declared identity. The identity plane makes that deflection visible.
---
## What it does
| | |
| --------------- | --- |
| **Detects** | Prompt injection · jailbreaks · off-topic drift · malicious skill installs |
| **How** | Scores each message against your agent's compliance model; off-identity signals accumulate in bounded memory without cancelling each other out |
| **Signals** | `maxDrift` · `avgDrift` · drift themes · **cognitive file change annotations** (so you know whether a spike came *before* or *after* a file changed on disk) |
| **Stack** | **OpenAI-only:** `text-embedding-3-small` on every inbound message, plus the same LLM you configure for calibration, compaction, and novel-theme labels |
| **Posture** | Read-only and observational. Nothing is blocked. |
---
## How it works
### 1 · Calibration *(once, then cached)*
When you first run an agent (or run `/identity recalibrate`):
1. **Reads your cognitive files**: `SOUL.md`, `AGENTS.md`, `IDENTITY.md`, `USER.md`, `TOOLS.md`
2. **Extracts identity anchors**: the LLM distills the raw files into 5–10 named, categorised anchors: `purpose`, `value`, `boundary`, `persona`, `constraint`. These are specific, traceable aspects of identity (e.g. *[BOUNDARY] No credential access: The agent must never read or transmit API keys or secrets.*) rather than a raw policy dump.
3. **Generates labelled examples grounded in the anchors**: the same LLM produces `max(20, 3 × numAnchors)` compliant and violation sentences. The count scales with identity complexity (e.g. 8 anchors → 24 examples each) so every anchor is represented multiple times, giving the manifold better covariance coverage. The prompt asks for at least one compliant and one violation per anchor.
4. **Both sets are embedded** with `text-embedding-3-small` (the same model as runtime, which keeps the geometry consistent)
5. **Fits a Gaussian** to the compliant cluster: mean and covariance via PCA, producing the **identity manifold**
6. **Sets the surprise threshold**: the 90th-percentile Mahalanobis distance of compliant examples. With ≥ 20 examples the empirical distribution is reliable enough that the 95th percentile over-fits to compliant outliers; 90th is a tighter, more stable gate.
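As a rough illustration of steps 5–6, here is a simplified fit-and-threshold sketch. It uses a diagonal covariance for brevity (the plugin fits mean and full covariance via PCA), and every name in it is hypothetical, not the plugin's actual code:

```typescript
// Illustrative only: fit a diagonal-covariance Gaussian to the compliant
// embeddings, then set the threshold at the 90th percentile of their own
// Mahalanobis distances.
type Vec = number[];

interface Manifold { mean: Vec; variance: Vec }

function fitManifold(compliant: Vec[]): Manifold {
  const d = compliant[0].length;
  const n = compliant.length;
  const mean: Vec = Array(d).fill(0);
  for (const v of compliant) for (let i = 0; i < d; i++) mean[i] += v[i] / n;
  const variance: Vec = Array(d).fill(1e-6); // small floor avoids division by zero
  for (const v of compliant)
    for (let i = 0; i < d; i++) variance[i] += (v[i] - mean[i]) ** 2 / n;
  return { mean, variance };
}

function mahalanobis(x: Vec, m: Manifold): number {
  let s = 0;
  for (let i = 0; i < x.length; i++) s += (x[i] - m.mean[i]) ** 2 / m.variance[i];
  return Math.sqrt(s);
}

function calibrate(compliant: Vec[]): { manifold: Manifold; threshold: number } {
  const manifold = fitManifold(compliant);
  // 90th-percentile distance of the compliant examples themselves
  const dists = compliant.map((v) => mahalanobis(v, manifold)).sort((a, b) => a - b);
  const idx = Math.min(dists.length - 1, Math.floor(0.9 * dists.length));
  return { manifold, threshold: dists[idx] };
}
```

The key property is that the threshold is set from the compliant distribution alone; violations never need to be enumerated at calibration time.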
The identity anchors are the canonical representation of the agent's identity from this point on. Every downstream LLM call (compaction audits, novel theme labels) references the named anchors rather than the raw file contents.
Everything is cached to `.openclaw/identity-cache.json`. Nothing is recomputed until you explicitly recalibrate.
> **Upgrading from an older build:** run `/identity recalibrate`. The cache schema now includes identity anchors, and old caches without them will trigger automatic re-calibration.
---
### 2 · Runtime *(every inbound message)*
Every piece of text entering the LLM (user messages, tool results, subagent replies) flows through:
```
message ──► embed (OpenAI) ──► surprise = ½ · Mahalanobis²(x, compliance)
                                        │
                         ┌──────────────┴──────────────┐
                         │   weight = max(0, s − τ)    │
                         └──────────────┬──────────────┘
                                        │
        ┌───────────────────────────────┴───────────────────────────────┐
        │ weight > 0?                                                   │
        │   → ingest into coreset                                       │
        │   → novelty check → new mode? → LLM theme label               │
        │                                                               │
        │ weight = 0? → skip (identity-neutral)                         │
        └───────────────────────────────┬───────────────────────────────┘
                                        ▼
     drift scores from coreset ──► maxDrift / avgDrift ──► level
```
**Drift levels:** `nominal` · `warning` · `alert`
Without `OPENAI_API_KEY`, embedding fails and the hook no-ops with a logged error. Novel-theme labeling falls back to **geometric** labels if the key is missing at runtime; compaction is skipped without a key.
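The gate above can be sketched as follows; the function names and level cut-offs here are assumptions for illustration, not the plugin's documented values:

```typescript
// Hypothetical sketch: messages whose surprise does not exceed the
// calibrated threshold carry zero weight and are skipped as identity-neutral.
type Level = "nominal" | "warning" | "alert";

// weight = max(0, s − τ): only the excess over the threshold is ingested
function surpriseWeight(surprise: number, threshold: number): number {
  return Math.max(0, surprise - threshold);
}

// Illustrative mapping from aggregate drift to a level; the cut-off
// values are assumptions, not the plugin's actual configuration.
function driftLevel(maxDrift: number, warn = 1.0, alert = 2.0): Level {
  if (maxDrift >= alert) return "alert";
  if (maxDrift >= warn) return "warning";
  return "nominal";
}
```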
---
### 3 · Identity compaction *(LLM audit)*
The coreset records *where* drift lives geometrically. **Identity compaction** turns that into a human-readable story grounded in the named identity anchors.
Every `compactionInterval` off-identity messages (after ingestion, i.e. `surpriseWeight > 0`), when `OPENAI_API_KEY` is set:
1. **Clusters** coreset points via connected-component BFS
2. **Describes** each cluster using the actual stored message texts
3. **Asks the LLM** for theme labels, severities, and a 1–2 sentence narrative, with the identity anchors list in the prompt so every theme and the narrative explicitly reference named aspects of the declared identity
4. **Replaces** the theme list with the consolidated analysis
The compaction is **non-destructive**: the coreset (geometric ground truth) is never rewritten by the LLM.
**Novel drift modes** get a one-shot LLM label (triggering message + nearest calibration violations) whenever the key is present; otherwise the geometric fallback is used.
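Step 1's connected-component clustering can be sketched like this; the `eps` linking radius and the point shape are assumptions for illustration:

```typescript
// Hypothetical sketch: points closer than `eps` are linked, and a BFS
// collects each connected component into one cluster.
interface Point { vec: number[]; texts: string[] }

function euclid(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
}

function clusterComponents(points: Point[], eps: number): Point[][] {
  const seen = new Array(points.length).fill(false);
  const clusters: Point[][] = [];
  for (let start = 0; start < points.length; start++) {
    if (seen[start]) continue;
    const queue = [start];
    seen[start] = true;
    const component: Point[] = [];
    while (queue.length > 0) {
      const i = queue.shift()!;
      component.push(points[i]);
      for (let j = 0; j < points.length; j++) {
        if (!seen[j] && euclid(points[i].vec, points[j].vec) < eps) {
          seen[j] = true;
          queue.push(j);
        }
      }
    }
    clusters.push(component);
  }
  return clusters;
}
```

Because each point carries its stored message texts, every cluster handed to the LLM comes with concrete examples of the behaviour it represents.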
---
### 4 · The coreset: why signals don't cancel
Unlike a running mean, the **bounded weighted coreset** (128 points by default) keeps distinct drift directions separate:
- Opposing drifts cannot cancel each other out
- Old signals aren't lost to exponential forgetting
- When full, the *closest pair* merges, preserving distinct drift modes while bounding memory
Each coreset point carries the **original message texts** that contributed to it, so drift themes and audit narratives are grounded in real behaviour, not just geometry.
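A minimal sketch of that merge rule, with assumed names and shapes (not the plugin's actual implementation):

```typescript
// Hypothetical sketch of the bounded coreset: at capacity, the closest
// pair collapses into one weighted point, so distinct drift modes survive
// while memory stays bounded.
interface CorePoint { vec: number[]; weight: number; texts: string[] }

function ingest(coreset: CorePoint[], p: CorePoint, capacity = 128): CorePoint[] {
  const next = [...coreset, p];
  if (next.length <= capacity) return next;
  // locate the closest pair (squared distance suffices for comparison)
  let bi = 0, bj = 1, best = Infinity;
  for (let i = 0; i < next.length; i++) {
    for (let j = i + 1; j < next.length; j++) {
      const d = next[i].vec.reduce((s, v, k) => s + (v - next[j].vec[k]) ** 2, 0);
      if (d < best) { best = d; bi = i; bj = j; }
    }
  }
  const a = next[bi], b = next[bj], w = a.weight + b.weight;
  const merged: CorePoint = {
    vec: a.vec.map((v, k) => (v * a.weight + b.vec[k] * b.weight) / w), // weighted centroid
    weight: w,
    texts: [...a.texts, ...b.texts], // keep the contributing messages
  };
  return next.filter((_, i) => i !== bi && i !== bj).concat(merged);
}
```

Because only near-duplicate points ever merge, two drift directions far apart in embedding space stay as separate points instead of averaging toward zero, which is exactly why opposing drifts cannot cancel.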
---
### 5 · Why distance-from-compliance works
When an agent has a well-defined goal (*"help with code"*, *"answer HR questions"*), its compliant responses form a **tight cluster** in embedding space. Violations live in the open-ended complement.
```
Embedding space (conceptual 2D projection)
┌──────────────────────────────────────────┐
│   ✕        ✕                             │
│        ✕             ┌───────────┐       │
│   ✕        ✕         │  ● ● ● ●  │       │
│        ✕             │  ● ● ● ●  │       │
│                      └───────────┘       │
│                                          │
│   violations           compliance        │
│   (open-ended)         (tight cluster)   │
└──────────────────────────────────────────┘
```
We never need to enumerate violations. We only ask: *how far is this message from the known compliance region?* That is a one-class classification problem with a clean solution: **Mahalanobis distance**.
---
## Installation
### Prerequisites
- [OpenClaw](https://docs.openclaw.ai/) installed and running
- Node.js 22.16+ (Node 24 recommended)
- `OPENAI_API_KEY` for embeddings and LLM calls (calibration is cached; embeddings and audit calls are ongoing)
---
### Option A: Install from npm *(recommended)*
```bash
npm install @intrinsec-ai/openclaw-identity-plane
npx openclaw plugins install -l node_modules/@intrinsec-ai/openclaw-identity-plane
```
### Option B: Clone directly
```bash
git clone https://github.com/intrinsec-ai/openclaw-identity-plane \
~/.openclaw/extensions/identity
cd ~/.openclaw/extensions/identity
npm install
```
### Option C: Link a local checkout *(for development)*
```bash
cd /path/to/openclaw-identity-plane
npm install
openclaw plugins install --link /path/to/openclaw-identity-plane
```
---
### Enable the plugin
```bash
openclaw plugins enable identity
```
Or manually in `~/.openclaw/openclaw.json`:
```json
{
"plugins": {
"load": { "paths": ["~/.openclaw/extensions/identity"] },
"entries": { "identity": { "enabled": true } }
}
}
```
Remove deprecated config keys if present: `embedModel`, `agenticAudit` (no longer used).
---
### Set your API key
```bash
export OPENAI_API_KEY=sk-...
```
Optional: route both chat and embeddings through a compatible gateway:
```bash
export OPENAI_BASE_URL=http://localhost:11434/v1
```
Set `calibrationModel` to a model your endpoint serves. The embedding client uses the **same** `OPENAI_BASE_URL`; the server must implement an OpenAI-style `POST /v1/embeddings` endpoint for `text-embedding-3-small` (or map that model id to a local embedder), or calibration and runtime embedding will fail.
... (truncated)