
Reachy

By suharvest

OpenClaw plugin for Reachy robots — WebSocket bridge with emotion, voice, and hardware control

GitHub


README

<p align="center">
  <img src="banner.png" alt="openclaw-reachy banner" width="800" />
</p>

# openclaw-reachy

[![npm](https://img.shields.io/npm/v/@seeed-studio/openclaw-reachy)](https://www.npmjs.com/package/@seeed-studio/openclaw-reachy)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![OpenClaw Plugin](https://img.shields.io/badge/OpenClaw-plugin-blue)](https://github.com/openclaw/openclaw)

OpenClaw plugin that turns [Reachy Mini](https://www.pollen-robotics.com/reachy-mini/) into a conversational AI companion.

<!-- TODO: Add demo GIF here — 15s clip showing: user speaks → robot thinks (emotion) → robot replies + moves -->

## What is this?

openclaw-reachy bridges any LLM (via [OpenClaw](https://github.com/openclaw/openclaw)) to a Reachy Mini robot over WebSocket. The robot listens (STT is client-side), the AI thinks and responds with streaming text for TTS, and the plugin adds an **emotion channel** — the AI's emotional state (happy, thinking, curious...) drives the robot's facial expression in real time, independently of speech. Hardware commands like head movement and dances are forwarded as tool calls.

## Key Features

- **Emotion channel** — AI emotions (`[emotion:happy]`) are extracted from the text stream and sent as separate messages, so the robot reacts before TTS finishes
- **Streaming text for TTS** — chunked delivery with configurable min-chars and flush interval, optimized for real-time speech synthesis
- **Robot command forwarding** — `reachy_*` tool calls (move head, dance, capture image) are sent as `robot_command` messages for client-side SDK execution
- **Background task delegation** — complex work is handed off to sub-agents; the robot acknowledges immediately and delivers results when ready
- **Auto-loaded tool schemas** — define robot tools in a single `SKILL.md` file; the plugin parses it into JSON schemas at startup

## Table of Contents

- [Install](#install)
- [Quickstart](#quickstart)
- [Configuration](#configuration)
- [WebSocket Protocol](#websocket-protocol)
- [Emotion Channel](#emotion-channel)
- [Robot Commands](#robot-commands)
- [Typical Flow](#typical-flow)
- [Authentication](#authentication)
- [State Machine](#state-machine)
- [Auto-loading Tools from SKILL.md](#auto-loading-tools-from-skillmd)
- [Source Structure](#source-structure)
- [Contributing](#contributing)
- [Acknowledgements](#acknowledgements)
- [License](#license)

## Install

```bash
openclaw plugins install @seeed-studio/openclaw-reachy
```

Then use the onboarding wizard:

```bash
openclaw setup
# select Reachy and follow the prompts
```

## Quickstart

Start the OpenClaw gateway, then connect via WebSocket:

```javascript
// Node.js / browser WebSocket client example
const ws = new WebSocket("ws://127.0.0.1:18790/desktop-robot");

ws.onopen = () => {
  // 1. Start a session
  ws.send(JSON.stringify({ type: "hello" }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case "welcome":
      // 2. Session established β€” send a message
      ws.send(JSON.stringify({ type: "message", text: "Hello!" }));
      break;
    case "emotion":
      console.log("Robot emotion:", msg.emotion); // e.g. "happy"
      break;
    case "stream_delta":
      console.log("TTS chunk:", msg.text); // feed to TTS engine
      break;
    case "robot_command":
      console.log("Execute:", msg.action, msg.params); // e.g. "move_head" {yaw: 20}
      break;
  }
};
```

## Configuration

All fields are optional. Shown with defaults:

```jsonc
{
  "channels": {
    "desktop-robot": {
      "enabled": true,
      "serve": {
        "port": 18790,
        "bind": "127.0.0.1",
        "path": "/desktop-robot",
      },
      "auth": {
        "token": "", // if set, clients must provide this token
        "allowAnonymous": false,
      },
      "session": {
        "idleTimeoutMs": 1800000, // 30 min — sessions exceeding this are closed
        "maxSessions": 5,
      },
      "streaming": {
        "minChunkChars": 10, // min chars per stream_delta
        "flushIntervalMs": 100,
      },
      "responseModel": "", // e.g. "dashscope/kimi-k2.5"
      "responseSystemPrompt": "", // override default voice prompt
      "agentId": "desktop-robot",
      "tools": [], // allowed tools, e.g. ["sessions_spawn", "cron"]
      // empty = use core defaults
      "dmPolicy": "open",
      "allowFrom": [],
    },
  },
}
```
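Since every field is optional, a config only needs to list the values that differ from the defaults above. A minimal sketch (the port and token values are placeholders):

```jsonc
{
  "channels": {
    "desktop-robot": {
      "serve": { "port": 19000 },
      "auth": { "token": "replace-me" }
    }
  }
}
```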

## WebSocket Protocol

All messages are JSON. Each has a `type` field.

### Inbound (Client β†’ Server)

| Type           | Fields                     | Description                                                       |
| -------------- | -------------------------- | ----------------------------------------------------------------- |
| `hello`        | `sessionId?`, `authToken?` | Start session. Optional `sessionId` to resume.                    |
| `message`      | `text`                     | Send user message (from STT).                                     |
| `interrupt`    | —                          | Barge-in: abort current response.                                 |
| `state_change` | `state`                    | Client reports state: `"listening"`, `"idle"`, `"speaking_done"`. |
| `ping`         | `ts?`                      | Keepalive. Server replies with `pong`.                            |
| `robot_result` | `commandId`, `result`      | Client reports result of a robot command execution.               |
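As a concrete illustration of the inbound shapes above, the sketch below builds a `hello` payload and implements barge-in. The helper names (`helloMsg`, `bargeIn`) are this example's own, not part of the plugin:

```javascript
// Illustrative builders for the inbound messages above.
// Function names are this example's own, not part of the plugin API.

function helloMsg(sessionId, authToken) {
  const msg = { type: "hello" };
  if (sessionId) msg.sessionId = sessionId; // resume an existing session
  if (authToken) msg.authToken = authToken; // required when auth.token is set
  return JSON.stringify(msg);
}

function bargeIn(ws) {
  // Abort the in-flight response, then report that the mic is open again.
  ws.send(JSON.stringify({ type: "interrupt" }));
  ws.send(JSON.stringify({ type: "state_change", state: "listening" }));
}
```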

### Outbound (Server β†’ Client)

| Type             | Fields                                   | Description                                                                     |
| ---------------- | ---------------------------------------- | ------------------------------------------------------------------------------- |
| `welcome`        | `sessionId`                              | Session established.                                                            |
| `state`          | `state`                                  | Server state: `"idle"` or `"processing"`.                                       |
| `stream_start`   | `runId`                                  | Response stream begins.                                                         |
| `stream_delta`   | `text`, `runId`                          | Incremental text chunk (feed to TTS).                                           |
| `stream_end`     | `runId`, `fullText`                      | Response complete.                                                              |
| `stream_abort`   | `runId`, `reason`                        | Response aborted (interrupt or error).                                          |
| `tool_start`     | `toolName`, `runId`                      | Agent started a tool call.                                                      |
| `tool_end`       | `toolName`, `runId`                      | Agent finished a tool call.                                                     |
| `emotion`        | `emotion`                                | AI emotion to display (e.g. `"happy"`, `"thinking"`).                           |
| `robot_command`  | `action`, `params`, `commandId`          | Hardware command for client-side SDK execution.                                 |
| `task_spawned`   | `taskLabel`, `taskRunId`                 | Background task started via `sessions_spawn`.                                   |
| `task_completed` | `taskRunId`, `summary`, `resultPreview?` | Background task finished. `resultPreview` is first ~200 chars for TTS briefing. |
| `error`          | `message`, `code?`                       | Error.                                                                          |
| `pong`           | `ts?`                                    | Reply to `ping`.                                                                |
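One way a client might consume the stream lifecycle messages is to accumulate `stream_delta` chunks per `runId` and discard partials on `stream_abort`. A minimal sketch (the `StreamCollector` class is hypothetical, not part of the plugin):

```javascript
// Collects stream_delta chunks per runId; returns the full reply on
// stream_end and drops partial text on stream_abort.
class StreamCollector {
  constructor() {
    this.runs = new Map(); // runId -> accumulated text
  }
  handle(msg) {
    switch (msg.type) {
      case "stream_start":
        this.runs.set(msg.runId, "");
        return null;
      case "stream_delta":
        this.runs.set(msg.runId, (this.runs.get(msg.runId) ?? "") + msg.text);
        return null;
      case "stream_end": {
        const accumulated = this.runs.get(msg.runId) ?? "";
        this.runs.delete(msg.runId);
        return msg.fullText ?? accumulated; // prefer the server's full text
      }
      case "stream_abort":
        this.runs.delete(msg.runId); // interrupted: discard partial text
        return null;
      default:
        return null;
    }
  }
}
```

In practice each `stream_delta` would also be fed to the TTS engine as it arrives; the accumulated text is mainly useful for logging or display.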

## Emotion Channel

The AI includes an emotion tag at the start of each reply (e.g. `[emotion:happy]`). The plugin strips the tag from the text stream and sends a separate `emotion` message so the client can update the robot's facial expression independently from TTS.

Available emotions: `happy`, `sad`, `angry`, `surprised`, `thinking`, `confused`, `curious`, `excited`, `laugh`, `fear`, `neutral`, `listening`, `agreeing`, `disagreeing`.

```text
Client                          Server
  │                               │
  │──── message {text} ─────────▶│
  │◀─── emotion {happy} ─────────│  ← update face immediately
  │◀─── stream_delta (×N) ──────│  ← TTS text (emotion tag stripped)
  │◀─── stream_end ─────────────│
```
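The tag stripping can be approximated client-side, e.g. for testing without a gateway. This sketch assumes a single `[emotion:<name>]` tag at the start of the reply; the plugin's exact parsing rules may differ:

```javascript
// Approximation of the server-side tag stripping described above.
// Assumes one [emotion:<name>] tag at the start of the text; the plugin's
// actual parsing rules are not documented here.
function splitEmotionTag(text) {
  const m = /^\s*\[emotion:([a-z]+)\]\s*/.exec(text);
  if (!m) return { emotion: null, text };
  return { emotion: m[1], text: text.slice(m[0].length) };
}
```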

## Robot Commands

When the AI invokes a tool prefixed with `reachy_` (e.g. `reachy_move_head`, `reachy_dance`), the plugin forwards it as a `robot_command` message for client-side SDK execution. The client executes the command via its local robot SDK and optionally reports the result via `robot_result`.

Tool schemas are auto-registered from `SKILL.md` at startup — see [Auto-loading tools from SKILL.md](#auto-loading-tools-from-skillmd).

```text
Client                          Server
  │                               │
  │◀─── robot_command ───────────│  action="move_head", params={yaw:20}, commandId="abc"
  │  (execute via robot SDK)      │
  │──── robot_result ────────────▶│  commandId="abc", result={ok: true}
```
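A client-side dispatcher for this exchange might look like the sketch below, with a plain handler map standing in for the robot SDK (all names here are illustrative):

```javascript
// Executes a robot_command via a handler map and reports robot_result back.
// `handlers` stands in for the local robot SDK; names are illustrative.
function onRobotCommand(ws, msg, handlers) {
  const fn = handlers[msg.action];
  let result;
  try {
    result = fn
      ? { ok: true, value: fn(msg.params) }
      : { ok: false, error: "unknown action: " + msg.action };
  } catch (err) {
    result = { ok: false, error: String(err) }; // report SDK failures, don't throw
  }
  ws.send(JSON.stringify({ type: "robot_result", commandId: msg.commandId, result }));
}
```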

To enable robot tools, add the tool names to the `tools` allowlist:

```jsonc
{
  "channels": {
    "desktop-robot": {
      "tools": ["sessions_spawn", "reachy_move_head", "reachy_dance", "reachy_capture_image"],
    },
  },
}
```

## Typical Flow

```text
Client                          Server
  │                               │
  │──── hello ──────────────────▶│
  │◀─── welcome {sessionId} ────│
  │                               │
  │──── message {text} ─────────▶│
  │◀─── state {processing} ─────│
  │◀─── emotion {thinking} ─────│  ← robot shows thinking face
  │◀─── stream_start ───────────│
  │◀─── stream_delta (×N) ──────│  ← feed chunks to TTS
  │◀─── stream_end {fullText} ──│
```

... (truncated)