Observability

By xring

A COPY FROM https://github.com/henrikrexed/openclaw-observability-plugin

Configuration Example

{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}

README

# OpenClaw Observability

[![Documentation](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://henrikrexed.github.io/openclaw-observability-plugin/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

OpenTelemetry observability for [OpenClaw](https://github.com/openclaw/openclaw) AI agents.

📖 **[Full Documentation](https://henrikrexed.github.io/openclaw-observability-plugin/)**: setup guides, configuration reference, and backend examples.

## Two Approaches to Observability

This repository documents **two complementary approaches** to monitoring OpenClaw:

| Approach | Best For | Setup Complexity |
|----------|----------|------------------|
| **Official Plugin** | Operational metrics, Gateway health, cost tracking | Simple config |
| **Custom Plugin** | Deep tracing, tool call visibility, request lifecycle | Plugin installation |

**Recommendation:** Use both for complete observability.

---

## Approach 1: Official Diagnostics Plugin (Built-in)

OpenClaw v2026.2+ includes **built-in OpenTelemetry support**. To enable it, add the following to `openclaw.json`:

```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```

Then restart:

```bash
openclaw gateway restart
```

### What It Captures

**Metrics:**
- `openclaw.tokens`: Token usage by type (input/output/cache)
- `openclaw.cost.usd`: Estimated model cost
- `openclaw.run.duration_ms`: Agent run duration
- `openclaw.context.tokens`: Context window usage
- `openclaw.webhook.*`: Webhook processing stats
- `openclaw.message.*`: Message processing stats
- `openclaw.queue.*`: Queue depth and wait times
- `openclaw.session.*`: Session state transitions

**Traces:** Model usage, webhook processing, message processing, stuck sessions

**Logs:** All Gateway logs via OTLP with severity, subsystem, and code location

---

## Approach 2: Custom Hook-Based Plugin (This Repo)

For **deeper observability**, install the custom plugin from this repo. It uses OpenClaw's typed plugin hooks to capture the full agent lifecycle.

### What It Adds

**Connected Traces:**
```
openclaw.request (root span)
├── openclaw.agent.turn
│   ├── tool.Read (file read)
│   ├── tool.exec (shell command)
│   ├── tool.Write (file write)
│   └── tool.web_search
└── (child spans connected via trace context)
```

The plugin also emits standalone spans for session commands (`openclaw.command.new|reset|stop`) and for gateway startup (`openclaw.gateway.startup`).

**Per-Tool Visibility:**
- Individual spans for each tool call
- Tool execution time
- Result size (characters)
- Error tracking per tool
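
As a rough illustration of what a per-tool span carries, the sketch below times a tool call and records its result size in characters and any error. It is not the plugin's code: `ToolSpanRecord`, `recordToolCall`, and the injectable `now` clock are hypothetical names, and only the `tool.<name>` span naming and the recorded fields follow the list above.

```typescript
// Hypothetical record shape; the real plugin emits OpenTelemetry spans instead.
interface ToolSpanRecord {
  name: string; // e.g. "tool.Read"
  durationMs: number; // tool execution time
  resultChars: number; // result size in characters
  error: string | undefined; // set only when the tool call failed
}

// Runs a tool, measuring duration and capturing result size or the error message.
// `now` is injectable so the timing can be made deterministic in tests.
function recordToolCall(
  tool: string,
  run: () => string,
  now: () => number = Date.now,
): ToolSpanRecord {
  const startedAt = now();
  try {
    const result = run();
    return {
      name: `tool.${tool}`,
      durationMs: now() - startedAt,
      resultChars: result.length,
      error: undefined,
    };
  } catch (err) {
    return {
      name: `tool.${tool}`,
      durationMs: now() - startedAt,
      resultChars: 0,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

A failing tool still produces a record, which is what makes per-tool error tracking possible alongside the timing data.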

**Request Lifecycle:**
- Full message → response tracing
- Session context propagation
- Agent turn duration with token breakdown

### Plugin Lifecycle

OpenClaw has two hook registration moments, and the plugin uses both at the right phase:

| Phase | Runs | What the plugin does |
|---|---|---|
| `register()` | Synchronous, before the gateway accepts traffic | Registers **all 15 typed hooks** via `api.on()`, event-stream hooks (`command:*`, `gateway:startup`), the `otel-observability.status` RPC, the `otel` CLI command, the background service, and the optional `otel_status` agent tool. Hooks receive a **lazy telemetry getter** (`() => telemetry`) so they can be wired before the OTel runtime exists. |
| `start()` | Async, after the gateway is ready | Calls `initTelemetry()` to build the `TracerProvider`/`MeterProvider` and register them globally, conditionally initializes OpenLLMetry wraps when `traces` is on, and subscribes to OpenClaw diagnostic events for cost/token data. |
| `stop()` | Async, on gateway reload/shutdown | Clears the 60 s stale-session sweeper `setInterval` ([ISI-522](https://github.com/henrikrexed/openclaw-observability-plugin/commit/b668a4f)), unsubscribes from diagnostics, and calls `telemetry.shutdown()` to flush exporters. |

**Why this matters:** OpenClaw snapshots typed hooks at registration time. If hooks are registered from `start()` instead of `register()`, the gateway never sees them: the registration calls succeed, but the **hooks never fire**. PR #6 (see [ISI-515](https://github.com/henrikrexed/openclaw-observability-plugin/pull/6)) moved them back to `register()` and introduced the lazy getter so handlers no-op cleanly during the brief `register()` → `start()` window.
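
The lazy-getter idea can be sketched in a few lines of self-contained TypeScript. This is an illustration under assumptions, not the plugin's source: `FakeApi`, the `Telemetry` interface, and the handler body are hypothetical stand-ins. Only the `register()`/`start()` split, the `api.on()` call shape, and the `() => telemetry` getter come from the description above.

```typescript
// Hypothetical stand-ins for illustration; not the real OpenClaw plugin API.
type HookEvent = { sessionKey: string };
type Handler = (event: HookEvent) => void;

interface Telemetry {
  startSpan(name: string): void;
}

class FakeApi {
  private handlers = new Map<string, Handler[]>();

  // Mirrors the idea of api.on(): hooks are collected at registration time.
  on(hook: string, handler: Handler): void {
    const list = this.handlers.get(hook) ?? [];
    list.push(handler);
    this.handlers.set(hook, list);
  }

  emit(hook: string, event: HookEvent): void {
    for (const handler of this.handlers.get(hook) ?? []) handler(event);
  }
}

const spans: string[] = [];
let telemetry: Telemetry | undefined; // stays undefined until start() runs

// register(): synchronous; wires hooks with a lazy getter, not a live object.
function register(api: FakeApi, getTelemetry: () => Telemetry | undefined): void {
  api.on("message_received", (event) => {
    const t = getTelemetry();
    if (!t) return; // no-op during the register() -> start() window
    t.startSpan(`openclaw.request session=${event.sessionKey}`);
  });
}

// start(): asynchronous; builds the telemetry runtime once the gateway is ready.
async function start(): Promise<void> {
  telemetry = {
    startSpan: (name) => {
      spans.push(name);
    },
  };
}

const api = new FakeApi();
register(api, () => telemetry);
api.emit("message_received", { sessionKey: "early" }); // dropped: telemetry not ready yet
```

Because the handler resolves `telemetry` on every call rather than capturing it at registration, the same hook starts producing spans as soon as `start()` has assigned the runtime, with no re-registration.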

### Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/henrikrexed/openclaw-observability-plugin.git
   ```

2. Add to your `openclaw.json`:
   ```json
   {
     "plugins": {
       "load": {
         "paths": ["/path/to/openclaw-observability-plugin"]
       },
       "entries": {
         "otel-observability": {
           "enabled": true
         }
       }
     }
   }
   ```

3. Clear cache and restart:
   ```bash
   rm -rf /tmp/jiti
   systemctl --user restart openclaw-gateway
   ```

### Validate your first trace

Send a message that triggers at least one tool call and check Gateway logs for the lifecycle markers:

```bash
journalctl --user -u openclaw-gateway -f | grep -E '\[otel\]'
```

You should see, in this order:

```
[otel] Registered message_received hook (via api.on)
[otel] Registered before_agent_start hook (via api.on)
[otel] Registered tool_result_persist hook (via api.on)
[otel] Registered agent_end hook (via api.on)
[otel] Registered command event hooks (via api.registerHook)
[otel] Registered gateway:startup hook (via api.registerHook)
[otel] Starting OpenTelemetry observability...
[otel] ✅ Observability pipeline active
[otel]   Traces=true Metrics=true Logs=true
[otel]   Endpoint=http://localhost:4318 (http)
```

Then, on the next inbound message, the debug log confirms hooks are live:

```
[otel] Root span started for session=<sessionKey>
[otel] Agent turn span started: agent=<agentId>, session=<sessionKey>
```

In your backend, look for an `openclaw.request` span with at least one `openclaw.agent.turn` child. A healthy trace has `openclaw.request` → `openclaw.agent.turn` → one or more `tool.*` children.
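
If you export spans from your backend for a scripted check, the healthy shape described above could be tested with something like the following sketch. `SpanRecord` is a deliberately simplified, hypothetical shape (real backends expose many more fields and use their own field names), and `isHealthyTrace` is an illustrative helper, not part of the plugin.

```typescript
// Simplified, hypothetical span shape; adapt the field names to your backend's export.
interface SpanRecord {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

// True when some openclaw.request span has an openclaw.agent.turn child
// that in turn has at least one tool.* child.
function isHealthyTrace(spans: SpanRecord[]): boolean {
  const childrenOf = (parentId: string): SpanRecord[] =>
    spans.filter((s) => s.parentSpanId === parentId);

  return spans
    .filter((s) => s.name === "openclaw.request")
    .some((request) =>
      childrenOf(request.spanId)
        .filter((s) => s.name === "openclaw.agent.turn")
        .some((turn) =>
          childrenOf(turn.spanId).some((s) => s.name.startsWith("tool.")),
        ),
    );
}
```

A trace that contains only `openclaw.request` and `openclaw.agent.turn` fails this check, which is expected when the test message did not trigger a tool call.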

---

## Comparing the Two Approaches

| Feature | Official Plugin | Custom Plugin |
|---------|-----------------|---------------|
| Token metrics | ✅ Per model | ✅ Per session + model |
| Cost tracking | ✅ Yes | ✅ Yes (from diagnostics) |
| Gateway health | ✅ Webhooks, queues, sessions | ❌ Not focused |
| Session state | ✅ State transitions | ❌ Not tracked |
| **Tool call tracing** | ❌ No | ✅ Individual tool spans |
| **Request lifecycle** | ❌ No | ✅ Full request → response |
| **Connected traces** | ❌ Separate spans | ✅ Parent-child hierarchy |
| Setup complexity | 🟢 Config only | 🟡 Plugin installation |

---

## Backend Examples

### Dynatrace (Direct)

```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://{env-id}.live.dynatrace.com/api/v2/otlp",
      "headers": {
        "Authorization": "Api-Token {your-token}"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```

### Grafana Cloud

```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "https://otlp-gateway-{region}.grafana.net/otlp",
      "headers": {
        "Authorization": "Basic {base64-credentials}"
      },
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true
    }
  }
}
```
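
The `{base64-credentials}` placeholder is a Base64-encoded credential pair. The sketch below shows one way to build the header value. It assumes the pair has the form `<instance-id>:<api-token>`, which you should confirm against your Grafana Cloud stack's OTLP settings; `basicAuthHeader` is a hypothetical helper and the sample values are made up.

```typescript
// Builds an HTTP Basic Authorization header from a user/secret pair.
// Assumption: Grafana Cloud expects "<instance-id>:<api-token>" as the pair.
function basicAuthHeader(user: string, secret: string): { Authorization: string } {
  // btoa is a global in browsers and in recent Node.js versions.
  return { Authorization: `Basic ${btoa(`${user}:${secret}`)}` };
}

// Made-up values for illustration only.
const exampleHeaders = basicAuthHeader("123456", "example-token");
```

The resulting object can be pasted as the value of `diagnostics.otel.headers`.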

### Local OTel Collector

```json
{
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://localhost:4318",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}
```

---

## Configuration Reference

### Official Plugin Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `diagnostics.enabled` | boolean | false | Enable diagnostics system |
| `diagnostics.otel.enabled` | boolean | false | Enable OTel export |
| `diagnostics.otel.endpoint` | string | (none) | OTLP endpoint URL |
| `diagnostics.otel.protocol` | string | "http/protobuf" | Protocol |
| `diagnostics.otel.headers` | object | (none) | Custom headers |
| `diagnostics.otel.serviceName` | string | "openclaw" | Service name |
| `diagnostics.otel.traces` | boolean | true | Enable traces |
| `diagnostics.otel.metrics` | boolean | true | Enable metrics |
| `diagnostics.otel.logs` | boolean | false | Enable logs |
| `diagnostics.otel.sampleRate` | number | 1.0 | Trace sampling (0-1) |
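
To make the defaults in the table concrete, here is a minimal sketch that fills in missing options and checks the sampling range. The field names and default values are taken from the table; the `OtelConfig` types and the `resolveOtelConfig` helper are hypothetical and do not describe how OpenClaw itself parses the file.

```typescript
// Shape of the `diagnostics.otel` block as listed in the table above.
interface OtelConfig {
  enabled?: boolean;
  endpoint?: string;
  protocol?: string;
  headers?: Record<string, string>;
  serviceName?: string;
  traces?: boolean;
  metrics?: boolean;
  logs?: boolean;
  sampleRate?: number;
}

interface ResolvedOtelConfig {
  enabled: boolean;
  endpoint: string | undefined; // no default listed in the table
  protocol: string;
  headers: Record<string, string>;
  serviceName: string;
  traces: boolean;
  metrics: boolean;
  logs: boolean;
  sampleRate: number;
}

// Hypothetical helper: applies the documented defaults and validates sampleRate.
function resolveOtelConfig(input: OtelConfig): ResolvedOtelConfig {
  const sampleRate = input.sampleRate ?? 1.0;
  if (!(sampleRate >= 0 && sampleRate <= 1)) {
    throw new RangeError(`sampleRate must be between 0 and 1, got ${sampleRate}`);
  }
  return {
    enabled: input.enabled ?? false,
    endpoint: input.endpoint,
    protocol: input.protocol ?? "http/protobuf",
    headers: input.headers ?? {},
    serviceName: input.serviceName ?? "openclaw",
    traces: input.traces ?? true,
    metrics: input.metrics ?? true,
    logs: input.logs ?? false,
    sampleRate,
  };
}
```

Note that `logs` defaults to `false` for the official plugin, so a config that omits it exports traces and metrics only.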

### Custom Plugin Options

> **Important:** Do NOT add a `config` block inside the plugin entry, because OpenClaw's plugin framework rejects unknown properties. The plugin reads its configuration from the `diagnostics.otel` section instead.

The following settings are controlled via the `diagnostics.otel` config block:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `endpoint` | string | `http://localhost:4318` | OTLP endpoint URL |
| `serviceName` | string | `openclaw-gateway` | Service name |
| `protocol` | string | `http/protobuf` | OTLP protocol |
| `traces` | boolean | true | Enable traces |
| `metrics` | boolean | true | Enable metrics |
| `logs` | boolean | true | Enable logs |

---

## Documentation

- [Getting Started](./docs/getting-started.md): Setup guide
- [Configuration](./docs/configuration.md): All options
- [Architecture](./docs/architecture.md): How it works
- [Limitations](./docs/limitations.md): Known constraints
- [Backends](./docs/backends/): Backend-specific guides

---

## Optional: Kernel-Level Security with Tetragon

For **defense in depth**, add [Tetragon](https://tetragon.io) eBPF-based monitoring. While the plugins above capture application-level telemetry, Tetragon sees what happens at the kernel level: file access, process execution, netw

... (truncated)