A COPY FROM https://github.com/henrikrexed/openclaw-observability-plugin
Configuration Example
{
"diagnostics": {
"enabled": true,
"otel": {
"enabled": true,
"endpoint": "http://localhost:4318",
"serviceName": "openclaw-gateway",
"traces": true,
"metrics": true,
"logs": true
}
}
}
README
# OpenClaw Observability
OpenTelemetry observability for [OpenClaw](https://github.com/openclaw/openclaw) AI agents.
📖 **[Full Documentation](https://henrikrexed.github.io/openclaw-observability-plugin/)**: setup guides, configuration reference, and backend examples.
## Two Approaches to Observability
This repository documents **two complementary approaches** to monitoring OpenClaw:
| Approach | Best For | Setup Complexity |
|----------|----------|------------------|
| **Official Plugin** | Operational metrics, Gateway health, cost tracking | Simple config |
| **Custom Plugin** | Deep tracing, tool call visibility, request lifecycle | Plugin installation |
**Recommendation:** Use both for complete observability.
---
## Approach 1: Official Diagnostics Plugin (Built-in)
OpenClaw v2026.2+ includes **built-in OpenTelemetry support**. Just add to `openclaw.json`:
```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```
Then restart:
```bash
openclaw gateway restart
```
### What It Captures
**Metrics:**
- `openclaw.tokens`: Token usage by type (input/output/cache)
- `openclaw.cost.usd`: Estimated model cost
- `openclaw.run.duration_ms`: Agent run duration
- `openclaw.context.tokens`: Context window usage
- `openclaw.webhook.*`: Webhook processing stats
- `openclaw.message.*`: Message processing stats
- `openclaw.queue.*`: Queue depth and wait times
- `openclaw.session.*`: Session state transitions
**Traces:** Model usage, webhook processing, message processing, stuck sessions
**Logs:** All Gateway logs via OTLP with severity, subsystem, and code location
---
## Approach 2: Custom Hook-Based Plugin (This Repo)
For **deeper observability**, install the custom plugin from this repo. It uses OpenClaw's typed plugin hooks to capture the full agent lifecycle.
### What It Adds
**Connected Traces:**
```
openclaw.request (root span)
├── openclaw.agent.turn
│   ├── tool.Read (file read)
│   ├── tool.exec (shell command)
│   ├── tool.Write (file write)
│   └── tool.web_search
└── (child spans connected via trace context)
```
Plus standalone spans on session commands (`openclaw.command.new|reset|stop`) and gateway startup (`openclaw.gateway.startup`).
**Per-Tool Visibility:**
- Individual spans for each tool call
- Tool execution time
- Result size (characters)
- Error tracking per tool
**Request Lifecycle:**
- Full message → response tracing
- Session context propagation
- Agent turn duration with token breakdown
### Plugin Lifecycle
OpenClaw has two hook registration moments, and the plugin uses both at the right phase:
| Phase | Runs | What the plugin does |
|---|---|---|
| `register()` | Synchronous, before the gateway accepts traffic | Registers **all 15 typed hooks** via `api.on()`, event-stream hooks (`command:*`, `gateway:startup`), the `otel-observability.status` RPC, the `otel` CLI command, the background service, and the optional `otel_status` agent tool. Hooks receive a **lazy telemetry getter** (`() => telemetry`) so they can be wired before the OTel runtime exists. |
| `start()` | Async, after the gateway is ready | Calls `initTelemetry()` to build the `TracerProvider`/`MeterProvider` and register them globally, conditionally initializes OpenLLMetry wraps when `traces` is on, and subscribes to OpenClaw diagnostic events for cost/token data. |
| `stop()` | Async, on gateway reload/shutdown | Clears the 60 s stale-session sweeper `setInterval` ([ISI-522](https://github.com/henrikrexed/openclaw-observability-plugin/commit/b668a4f)), unsubscribes from diagnostics, and calls `telemetry.shutdown()` to flush exporters. |
**Why this matters:** OpenClaw snapshots typed hooks at registration time. If hooks are registered from `start()` instead of `register()`, the gateway never sees them and **hooks register but never fire**. PR #6 (see [ISI-515](https://github.com/henrikrexed/openclaw-observability-plugin/pull/6)) moved them back to `register()` and introduced the lazy getter so handlers no-op cleanly during the brief `register()` → `start()` window.
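The lazy-getter idea can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `FakeGateway`, `SketchPlugin`, and the `Telemetry` interface are stand-ins invented for illustration and are not the OpenClaw or OpenTelemetry APIs. The real `start()` is asynchronous; it is synchronous here to keep the example short.

```typescript
// Hypothetical shapes: none of these types come from OpenClaw or OpenTelemetry.
interface Telemetry {
  recordEvent(name: string): void;
  shutdown(): void;
}

type Handler = (event: string) => void;
type TelemetryGetter = () => Telemetry | undefined;

// Stand-in for the gateway: it stops accepting hooks once sealed,
// mimicking a snapshot taken at registration time.
class FakeGateway {
  private hooks = new Map<string, Handler>();
  private sealed = false;

  on(name: string, handler: Handler): void {
    if (this.sealed) return; // late registrations are silently dropped
    this.hooks.set(name, handler);
  }

  seal(): void {
    this.sealed = true;
  }

  emit(name: string): void {
    this.hooks.get(name)?.(name);
  }
}

class SketchPlugin {
  private telemetry: Telemetry | undefined;
  readonly recorded: string[] = [];

  // Wire hooks immediately; hand them a getter, not a telemetry instance.
  register(gateway: FakeGateway): void {
    const getTelemetry: TelemetryGetter = () => this.telemetry;
    gateway.on("message_received", (event) => {
      const t = getTelemetry();
      if (!t) return; // register() -> start() window: no-op cleanly
      t.recordEvent(event);
    });
  }

  // Build the telemetry runtime later; existing hooks see it via the getter.
  start(): void {
    const recorded = this.recorded;
    this.telemetry = {
      recordEvent: (name) => { recorded.push(name); },
      shutdown: () => { recorded.push("shutdown"); },
    };
  }

  // Flush and drop the runtime so handlers go back to being no-ops.
  stop(): void {
    this.telemetry?.shutdown();
    this.telemetry = undefined;
  }
}

const gateway = new FakeGateway();
const plugin = new SketchPlugin();
plugin.register(gateway);
gateway.seal();
gateway.emit("message_received"); // before start(): handler runs, records nothing
plugin.start();
gateway.emit("message_received"); // after start(): recorded
plugin.stop();
console.log(plugin.recorded);
```

The handler is registered before any telemetry exists, yet it starts recording as soon as `start()` assigns the runtime, because it resolves telemetry at call time rather than at registration time.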
### Installation
1. Clone this repository:
```bash
git clone https://github.com/henrikrexed/openclaw-observability-plugin.git
```
2. Add to your `openclaw.json`:
```json
{
  "plugins": {
    "load": {
      "paths": ["/path/to/openclaw-observability-plugin"]
    },
    "entries": {
      "otel-observability": {
        "enabled": true
      }
    }
  }
}
```
3. Clear cache and restart:
```bash
rm -rf /tmp/jiti
systemctl --user restart openclaw-gateway
```
### Validate your first trace
Send a message that triggers at least one tool call and check Gateway logs for the lifecycle markers:
```bash
journalctl --user -u openclaw-gateway -f | grep -E '\[otel\]'
```
You should see, in this order:
```
[otel] Registered message_received hook (via api.on)
[otel] Registered before_agent_start hook (via api.on)
[otel] Registered tool_result_persist hook (via api.on)
[otel] Registered agent_end hook (via api.on)
[otel] Registered command event hooks (via api.registerHook)
[otel] Registered gateway:startup hook (via api.registerHook)
[otel] Starting OpenTelemetry observability...
[otel] ✅ Observability pipeline active
[otel] Traces=true Metrics=true Logs=true
[otel] Endpoint=http://localhost:4318 (http)
```
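If you would rather check the ordering automatically than by eye, a minimal sketch along these lines may help. The function name `markersInOrder` and the sample log are illustrative assumptions; only the marker strings are taken from the output above.

```typescript
// Returns true when every marker occurs in the log, each one after the previous.
function markersInOrder(log: string, markers: string[]): boolean {
  let from = 0;
  for (const marker of markers) {
    const idx = log.indexOf(marker, from);
    if (idx === -1) return false;
    from = idx + marker.length;
  }
  return true;
}

// A subset of the lifecycle markers listed above, in the expected order.
const expectedMarkers = [
  "Registered message_received hook",
  "Registered agent_end hook",
  "Starting OpenTelemetry observability",
  "Observability pipeline active",
];

// Illustrative captured output (for example, saved from journalctl).
const sampleLog = [
  "[otel] Registered message_received hook (via api.on)",
  "[otel] Registered agent_end hook (via api.on)",
  "[otel] Starting OpenTelemetry observability...",
  "[otel] Observability pipeline active",
].join("\n");

console.log(markersInOrder(sampleLog, expectedMarkers));
```

A `false` result where "Starting OpenTelemetry observability" appears before the "Registered ... hook" lines would point at hooks being wired too late.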
Then, on the next inbound message, the debug log confirms hooks are live:
```
[otel] Root span started for session=<sessionKey>
[otel] Agent turn span started: agent=<agentId>, session=<sessionKey>
```
In your backend, look for an `openclaw.request` span with at least one `openclaw.agent.turn` child. A healthy trace has `openclaw.request` → `openclaw.agent.turn` → one or more `tool.*` children.
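The "healthy trace" shape can be expressed as a small check over exported spans. This is a sketch under assumptions: `SpanRecord` is a hypothetical, simplified record (real backends expose richer span objects), and only the span names come from this README.

```typescript
// Hypothetical, simplified span record for illustration only.
interface SpanRecord {
  spanId: string;
  parentId?: string;
  name: string;
}

// True when some openclaw.request root has an openclaw.agent.turn child
// that in turn has at least one tool.* child.
function isHealthyTrace(spans: SpanRecord[]): boolean {
  const roots = spans.filter(
    (s) => s.name === "openclaw.request" && s.parentId === undefined,
  );
  return roots.some((root) => {
    const turns = spans.filter(
      (s) => s.name === "openclaw.agent.turn" && s.parentId === root.spanId,
    );
    return turns.some((turn) =>
      spans.some(
        (s) => s.name.startsWith("tool.") && s.parentId === turn.spanId,
      ),
    );
  });
}

const exampleTrace: SpanRecord[] = [
  { spanId: "a", name: "openclaw.request" },
  { spanId: "b", parentId: "a", name: "openclaw.agent.turn" },
  { spanId: "c", parentId: "b", name: "tool.Read" },
  { spanId: "d", parentId: "b", name: "tool.exec" },
];

console.log(isHealthyTrace(exampleTrace));
```

A trace whose `tool.*` spans exist but hang off no `openclaw.agent.turn` parent fails this check, which is the symptom to look for when trace context is not being propagated.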
---
## Comparing the Two Approaches
| Feature | Official Plugin | Custom Plugin |
|---------|-----------------|---------------|
| Token metrics | ✅ Per model | ✅ Per session + model |
| Cost tracking | ✅ Yes | ✅ Yes (from diagnostics) |
| Gateway health | ✅ Webhooks, queues, sessions | ❌ Not focused |
| Session state | ✅ State transitions | ❌ Not tracked |
| **Tool call tracing** | ❌ No | ✅ Individual tool spans |
| **Request lifecycle** | ❌ No | ✅ Full request → response |
| **Connected traces** | ❌ Separate spans | ✅ Parent-child hierarchy |
| Setup complexity | 🟢 Config only | 🟡 Plugin installation |
---
## Backend Examples
### Dynatrace (Direct)
```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://{env-id}.live.dynatrace.com/api/v2/otlp",
      "headers": {
        "Authorization": "Api-Token {your-token}"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```
### Grafana Cloud
```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://otlp-gateway-{region}.grafana.net/otlp",
      "headers": {
        "Authorization": "Basic {base64-credentials}"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true
    }
  }
}
```
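The `{base64-credentials}` placeholder is a standard HTTP Basic value: the Base64 encoding of `username:password`. The sketch below shows one way to produce it. It assumes, as is typical for Grafana Cloud's OTLP gateway, that the username is your instance ID and the password is an access token; check your Grafana Cloud stack details to confirm. `basicAuthHeader` is a hypothetical helper name, and the credentials shown are dummies.

```typescript
// Builds an HTTP Basic Authorization header value.
// Note: btoa only handles Latin-1 input, which is fine for typical IDs and tokens.
function basicAuthHeader(username: string, password: string): string {
  return `Basic ${btoa(`${username}:${password}`)}`;
}

// Dummy credentials for illustration; substitute your own instance ID and token.
const otelHeaders = {
  Authorization: basicAuthHeader("user", "pass"),
};

console.log(JSON.stringify(otelHeaders));
```

The resulting string goes into `diagnostics.otel.headers.Authorization`. Treat it as a secret, since Base64 is an encoding and not encryption.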
### Local OTel Collector
```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```
---
## Configuration Reference
### Official Plugin Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `diagnostics.enabled` | boolean | false | Enable diagnostics system |
| `diagnostics.otel.enabled` | boolean | false | Enable OTel export |
| `diagnostics.otel.endpoint` | string | (none) | OTLP endpoint URL |
| `diagnostics.otel.protocol` | string | "http/protobuf" | Protocol |
| `diagnostics.otel.headers` | object | (none) | Custom headers |
| `diagnostics.otel.serviceName` | string | "openclaw" | Service name |
| `diagnostics.otel.traces` | boolean | true | Enable traces |
| `diagnostics.otel.metrics` | boolean | true | Enable metrics |
| `diagnostics.otel.logs` | boolean | false | Enable logs |
| `diagnostics.otel.sampleRate` | number | 1.0 | Trace sampling (0-1) |
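To see how these defaults combine with a partial config, here is a minimal sketch that merges user settings over the defaults in the table. The `OtelOptions` type and `resolveOtelOptions` function are hypothetical and are not part of OpenClaw. Rejecting an out-of-range `sampleRate` is an assumption of this sketch; OpenClaw may handle it differently.

```typescript
// Hypothetical model of the diagnostics.otel block, based on the table above.
interface OtelOptions {
  enabled: boolean;
  endpoint?: string;
  protocol: string;
  headers?: Record<string, string>;
  serviceName: string;
  traces: boolean;
  metrics: boolean;
  logs: boolean;
  sampleRate: number;
}

// Defaults copied from the table; endpoint and headers have none.
const OTEL_DEFAULTS: OtelOptions = {
  enabled: false,
  protocol: "http/protobuf",
  serviceName: "openclaw",
  traces: true,
  metrics: true,
  logs: false,
  sampleRate: 1.0,
};

// Merges user settings over the defaults and validates the sampling range.
function resolveOtelOptions(user: Partial<OtelOptions>): OtelOptions {
  const merged: OtelOptions = { ...OTEL_DEFAULTS, ...user };
  if (merged.sampleRate < 0 || merged.sampleRate > 1) {
    // Assumption: this sketch rejects out-of-range values instead of clamping.
    throw new RangeError("sampleRate must be between 0 and 1");
  }
  return merged;
}

const resolved = resolveOtelOptions({
  enabled: true,
  endpoint: "http://localhost:4318",
});

console.log(resolved);
```

With only `enabled` and `endpoint` set, logs stay off and the service name is `openclaw`, which is why the earlier examples set `"logs": true` and `"serviceName": "openclaw-gateway"` explicitly.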
### Custom Plugin Options
> **Important:** Do NOT add a `config` block inside the plugin entry: OpenClaw's plugin framework rejects unknown properties. The plugin reads its configuration from the `diagnostics.otel` section instead.
The following settings are controlled via the `diagnostics.otel` config block:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `endpoint` | string | `http://localhost:4318` | OTLP endpoint URL |
| `serviceName` | string | `openclaw-gateway` | Service name |
| `protocol` | string | `http/protobuf` | OTLP protocol |
| `traces` | boolean | true | Enable traces |
| `metrics` | boolean | true | Enable metrics |
| `logs` | boolean | true | Enable logs |
---
## Documentation
- [Getting Started](./docs/getting-started.md): Setup guide
- [Configuration](./docs/configuration.md): All options
- [Architecture](./docs/architecture.md): How it works
- [Limitations](./docs/limitations.md): Known constraints
- [Backends](./docs/backends/): Backend-specific guides
---
## Optional: Kernel-Level Security with Tetragon
For **defense in depth**, add [Tetragon](https://tetragon.io) eBPF-based monitoring. While the plugins above capture application-level telemetry, Tetragon sees what happens at the kernel level: file access, process execution, netw
... (truncated)