Voice
Reachy
OpenClaw plugin for Reachy robots: WebSocket bridge with emotion, voice, and hardware control
Install
openclaw plugins install @seeed-studio/openclaw-reachy
<p align="center">
<img src="banner.png" alt="openclaw-reachy banner" width="800" />
</p>
# openclaw-reachy
[npm](https://www.npmjs.com/package/@seeed-studio/openclaw-reachy)
[License](LICENSE)
[OpenClaw](https://github.com/openclaw/openclaw)
OpenClaw plugin that turns [Reachy Mini](https://www.pollen-robotics.com/reachy-mini/) into a conversational AI companion.
<!-- TODO: Add demo GIF here. 15s clip showing: user speaks → robot thinks (emotion) → robot replies + moves -->
## What is this?
openclaw-reachy bridges any LLM (via [OpenClaw](https://github.com/openclaw/openclaw)) to a Reachy Mini robot over WebSocket. The robot listens (STT is client-side), the AI thinks and responds with streaming text for TTS, and the plugin adds an **emotion channel**: the AI's emotional state (happy, thinking, curious...) drives the robot's facial expression in real time, independently of speech. Hardware commands such as head movement and dances are forwarded as tool calls.
## Key Features
- **Emotion channel**: AI emotions (`[emotion:happy]`) are extracted from the text stream and sent as separate messages, so the robot reacts before TTS finishes
- **Streaming text for TTS**: chunked delivery with configurable minimum chunk size and flush interval, optimized for real-time speech synthesis
- **Robot command forwarding**: `reachy_*` tool calls (move head, dance, capture image) are sent as `robot_command` messages for client-side SDK execution
- **Background task delegation**: complex work is handed off to sub-agents; the robot acknowledges immediately and delivers results when ready
- **Auto-loaded tool schemas**: define robot tools in a single `SKILL.md` file; the plugin parses it into JSON schemas at startup
## Table of Contents
- [Install](#install)
- [Quickstart](#quickstart)
- [Configuration](#configuration)
- [WebSocket Protocol](#websocket-protocol)
- [Emotion Channel](#emotion-channel)
- [Robot Commands](#robot-commands)
- [Typical Flow](#typical-flow)
- [Authentication](#authentication)
- [State Machine](#state-machine)
- [Auto-loading Tools from SKILL.md](#auto-loading-tools-from-skillmd)
- [Source Structure](#source-structure)
- [Contributing](#contributing)
- [Acknowledgements](#acknowledgements)
- [License](#license)
## Install
```bash
openclaw plugins install @seeed-studio/openclaw-reachy
```
Then use the onboarding wizard:
```bash
openclaw setup
# select Reachy and follow the prompts
```
## Quickstart
Start the OpenClaw gateway, then connect via WebSocket:
```javascript
// Node.js / browser WebSocket client example
const ws = new WebSocket("ws://127.0.0.1:18790/desktop-robot");

ws.onopen = () => {
  // 1. Start a session
  ws.send(JSON.stringify({ type: "hello" }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case "welcome":
      // 2. Session established: send a message
      ws.send(JSON.stringify({ type: "message", text: "Hello!" }));
      break;
    case "emotion":
      console.log("Robot emotion:", msg.emotion); // e.g. "happy"
      break;
    case "stream_delta":
      console.log("TTS chunk:", msg.text); // feed to TTS engine
      break;
    case "robot_command":
      console.log("Execute:", msg.action, msg.params); // e.g. "move_head" {yaw: 20}
      break;
  }
};
```
## Configuration
All fields are optional. Shown with defaults:
```jsonc
{
  "channels": {
    "desktop-robot": {
      "enabled": true,
      "serve": {
        "port": 18790,
        "bind": "127.0.0.1",
        "path": "/desktop-robot"
      },
      "auth": {
        "token": "",              // if set, clients must provide this token
        "allowAnonymous": false
      },
      "session": {
        "idleTimeoutMs": 1800000, // 30 min; sessions exceeding this are closed
        "maxSessions": 5
      },
      "streaming": {
        "minChunkChars": 10,      // min chars per stream_delta
        "flushIntervalMs": 100
      },
      "responseModel": "",        // e.g. "dashscope/kimi-k2.5"
      "responseSystemPrompt": "", // override default voice prompt
      "agentId": "desktop-robot",
      "tools": [],                // allowed tools, e.g. ["sessions_spawn", "cron"]
                                  // empty = use core defaults
      "dmPolicy": "open",
      "allowFrom": []
    }
  }
}
```
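For example, to require a token and expose the bridge on the LAN, a partial override like the following should suffice (assuming, per the "all fields are optional" note above, that unspecified fields fall back to their defaults; the token value is a placeholder):

```jsonc
{
  "channels": {
    "desktop-robot": {
      "serve": { "bind": "0.0.0.0" },
      "auth": { "token": "replace-with-a-long-random-string" }
    }
  }
}
```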
## WebSocket Protocol
All messages are JSON. Each has a `type` field.
### Inbound (Client → Server)
| Type | Fields | Description |
| -------------- | -------------------------- | ----------------------------------------------------------------- |
| `hello` | `sessionId?`, `authToken?` | Start session. Optional `sessionId` to resume. |
| `message` | `text` | Send user message (from STT). |
| `interrupt` | - | Barge-in: abort current response. |
| `state_change` | `state` | Client reports state: `"listening"`, `"idle"`, `"speaking_done"`. |
| `ping` | `ts?` | Keepalive. Server replies with `pong`. |
| `robot_result` | `commandId`, `result` | Client reports result of a robot command execution. |
### Outbound (Server → Client)
| Type | Fields | Description |
| ---------------- | ---------------------------------------- | ------------------------------------------------------------------------------- |
| `welcome` | `sessionId` | Session established. |
| `state` | `state` | Server state: `"idle"` or `"processing"`. |
| `stream_start` | `runId` | Response stream begins. |
| `stream_delta` | `text`, `runId` | Incremental text chunk (feed to TTS). |
| `stream_end` | `runId`, `fullText` | Response complete. |
| `stream_abort` | `runId`, `reason` | Response aborted (interrupt or error). |
| `tool_start` | `toolName`, `runId` | Agent started a tool call. |
| `tool_end` | `toolName`, `runId` | Agent finished a tool call. |
| `emotion` | `emotion` | AI emotion to display (e.g. `"happy"`, `"thinking"`). |
| `robot_command` | `action`, `params`, `commandId` | Hardware command for client-side SDK execution. |
| `task_spawned` | `taskLabel`, `taskRunId` | Background task started via `sessions_spawn`. |
| `task_completed` | `taskRunId`, `summary`, `resultPreview?` | Background task finished. `resultPreview` is first ~200 chars for TTS briefing. |
| `error` | `message`, `code?` | Error. |
| `pong` | `ts?` | Reply to `ping`. |
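As an illustration of the `ping`/`pong` and `interrupt` messages above, a client might implement keepalive and barge-in like this (the helper names and the 15-second interval are illustrative choices, not part of the protocol):

```javascript
// Illustrative client-side helpers for the inbound messages above.
// `ws` is any object with a send(string) method, e.g. a browser
// WebSocket or an instance from the `ws` npm package.

// Keepalive: send a ping every intervalMs; the server replies with pong.
function startKeepalive(ws, intervalMs = 15000) {
  const timer = setInterval(() => {
    ws.send(JSON.stringify({ type: "ping", ts: Date.now() }));
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop
}

// Barge-in: the user started speaking mid-response, so abort the current
// stream and tell the server the client is listening again.
function bargeIn(ws) {
  ws.send(JSON.stringify({ type: "interrupt" }));
  ws.send(JSON.stringify({ type: "state_change", state: "listening" }));
}
```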
## Emotion Channel
The AI includes an emotion tag at the start of each reply (e.g. `[emotion:happy]`). The plugin strips the tag from the text stream and sends a separate `emotion` message so the client can update the robot's facial expression independently of TTS.
Available emotions: `happy`, `sad`, `angry`, `surprised`, `thinking`, `confused`, `curious`, `excited`, `laugh`, `fear`, `neutral`, `listening`, `agreeing`, `disagreeing`.
```text
Client                                Server
  |                                     |
  |---- message {text} ---------------->|
  |<--- emotion {happy} ----------------|   update face immediately
  |<--- stream_delta (×N) --------------|   TTS text (emotion tag stripped)
  |<--- stream_end ---------------------|
```
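The tag stripping happens server-side, but the mechanism can be sketched as a pure function (an illustration of the behavior described above, not the plugin's actual code):

```javascript
// Sketch of the emotion-tag split: a leading [emotion:...] tag is
// removed from the reply text and reported separately, so the face
// can update before any TTS text is spoken.
function splitEmotionTag(text) {
  const match = text.match(/^\[emotion:([a-z]+)\]\s*/);
  if (!match) return { emotion: null, text };
  return { emotion: match[1], text: text.slice(match[0].length) };
}

console.log(splitEmotionTag("[emotion:happy] Nice to see you!"));
// { emotion: 'happy', text: 'Nice to see you!' }
```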
## Robot Commands
When the AI invokes a tool prefixed with `reachy_` (e.g. `reachy_move_head`, `reachy_dance`), the plugin forwards it as a `robot_command` message. The client executes the command via its local robot SDK and optionally reports the outcome back via `robot_result`.
Tool schemas are auto-registered from `SKILL.md` at startup; see [Auto-loading tools from SKILL.md](#auto-loading-tools-from-skillmd).
```text
Client                                Server
  |                                     |
  |<--- robot_command ------------------|   action="move_head", params={yaw:20}, commandId="abc"
  |     (execute via robot SDK)         |
  |---- robot_result ------------------>|   commandId="abc", result={ok: true}
```
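On the client side, this flow amounts to a small dispatcher. A hedged sketch, where `sdk` stands in for the client's local robot SDK (its method names are invented for illustration) and `send` transmits JSON back to the server:

```javascript
// Hypothetical dispatcher for incoming robot_command messages.
// `sdk` is a placeholder for the client's robot SDK; `send` sends a
// JSON-serializable message back over the WebSocket.
function handleRobotCommand(msg, sdk, send) {
  const handlers = {
    move_head: (p) => sdk.moveHead(p),
    dance: (p) => sdk.dance(p),
  };
  let result;
  try {
    const handler = handlers[msg.action];
    result = handler
      ? { ok: true, value: handler(msg.params || {}) }
      : { ok: false, error: `unknown action: ${msg.action}` };
  } catch (err) {
    result = { ok: false, error: String(err) };
  }
  // Report the outcome back via robot_result (optional per the protocol)
  send({ type: "robot_result", commandId: msg.commandId, result });
}
```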
To enable robot tools, add the tool names to the `tools` allowlist:
```jsonc
{
  "channels": {
    "desktop-robot": {
      "tools": ["sessions_spawn", "reachy_move_head", "reachy_dance", "reachy_capture_image"]
    }
  }
}
```
## Typical Flow
```text
Client                                Server
  |                                     |
  |---- hello ------------------------->|
  |<--- welcome {sessionId} ------------|
  |                                     |
  |---- message {text} ---------------->|
  |<--- state {processing} -------------|
  |<--- emotion {thinking} -------------|   robot shows thinking face
  |<--- stream_start -------------------|
  |<--- stream_delta (×N) --------------|   feed chunks to TTS
  |<--- stream_end {fullText} ----------|
```