# Claw Voice Chat

Push-to-talk voice chat interface for OpenClaw channels.

## Install

```bash
npm install && cd client && npm install && cd ../server && npm install && cd ..
```
## WebSocket Protocol Example

```jsonc
// Client -> Server
{"type": "audio", "pcm16": "<base64 PCM16 mono 16kHz>"}
{"type": "text", "text": "hello"}
{"type": "flush"}  // end of speech segment
{"type": "reset"}  // clear conversation

// Server -> Client
{"type": "ready", "llm": "model-name", "tts_enabled": true}
{"type": "stt_partial", "text": "hel..."}
{"type": "stt_final", "text": "hello"}
{"type": "user_text", "text": "hello"}
{"type": "assistant_delta", "text": "Hi"}
{"type": "assistant_final", "text": "Hi there!"}
{"type": "tts_audio", "audio": "<base64 WAV>"}
{"type": "info", "message": "..."}
{"type": "error", "message": "..."}
```
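As a sketch of how a client might build these frames in Python (`make_audio_message` is a hypothetical helper; the only assumption beyond the protocol above is that the audio is raw little-endian PCM16, mono, 16 kHz):

```python
import base64
import json
import math
import struct

def make_audio_message(pcm16_bytes: bytes) -> str:
    """Wrap raw PCM16 audio (mono, 16 kHz, little-endian) in the client->server JSON frame."""
    b64 = base64.b64encode(pcm16_bytes).decode("ascii")
    return json.dumps({"type": "audio", "pcm16": b64})

# Illustrative payload: 100 ms of a 440 Hz tone as signed 16-bit samples.
samples = [int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(1600)]
pcm16 = struct.pack("<%dh" % len(samples), *samples)

msg = make_audio_message(pcm16)
flush = json.dumps({"type": "flush"})  # marks the end of the speech segment
```

On the wire, one push-to-talk turn would be one or more `audio` frames followed by a single `flush`.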
---
<p align="center">
<img src="client/public/claw-icon.svg" width="80" alt="Claw Voice Chat" />
</p>
<h1 align="center">Claw-Voice-Chat</h1>
<p align="center">
<strong>Push-to-Talk Voice Chat for OpenClaw Channels</strong><br>
Connect to Telegram, Discord, Slack, or any <a href="https://github.com/openclaw/openclaw">OpenClaw</a> channel and interact using voice or text.<br>
Messages are transcribed via STT, sent to the AI agent, and responses stream back with configurable TTS.
</p>
<p align="center">
<img src="https://img.shields.io/badge/version-1.0.0-blue" alt="Version" />
<img src="https://img.shields.io/badge/node-%3E%3D22-brightgreen" alt="Node.js 22+" />
<img src="https://img.shields.io/badge/python-%3E%3D3.10-blue" alt="Python 3.10+" />
<img src="https://img.shields.io/badge/license-Apache%202.0-orange" alt="License" />
<img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20Windows-lightgrey" alt="Platform" />
<img src="https://img.shields.io/badge/STT-faster--whisper-blueviolet" alt="STT" />
<img src="https://img.shields.io/badge/TTS-Browser%20%7C%20OpenAI%20%7C%20edge--tts-blueviolet" alt="TTS" />
</p>
<p align="center">
<a href="#install-with-ai">Install with AI</a> ·
<a href="#features">Features</a> ·
<a href="#quick-start">Quick Start</a> ·
<a href="#tts-providers">TTS</a> ·
<a href="#stt-backend-push-to-talk">STT</a> ·
<a href="#environment-variables">Config</a> ·
<a href="#ai-setup-guide">AI Guide</a> ·
<a href="README.ko.md">한국어</a>
</p>
---
## Install with AI
> **Just paste this to your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, etc.):**
>
> ```
> Install claw-voice-chat following the guide at:
> https://github.com/GreenSheep01201/claw-voice-chat
> ```
>
> The AI will read this README and handle everything automatically.
---
## Table of Contents
- [Features](#features)
- [Architecture](#architecture)
- [Requirements](#requirements)
- [Quick Start](#quick-start)
- [TTS Providers](#tts-providers)
- [Local TTS Server (edge-tts)](#local-tts-server-edge-tts)
- [STT Backend (Push-to-Talk)](#stt-backend-push-to-talk)
- [Usage](#usage)
- [Remote Access (Mobile)](#remote-access-mobile--other-devices)
- [Environment Variables](#environment-variables)
- [AI Setup Guide](#ai-setup-guide)
- [Server Endpoints Reference](#server-endpoints-reference)
- [Key Files](#key-files)
- [WebSocket Protocol](#websocket-protocol-wschat)
- [Troubleshooting](#troubleshooting)
- [License](#license)
---
## Features
- **Push-to-talk** voice input with real-time STT (faster-whisper)
- **Channel bridge** – select any active OpenClaw session and talk to it
- **Streaming transcript** – agent responses arrive token by token
- **Configurable TTS** – Browser (Web Speech API), OpenAI, Qwen/DashScope, or Custom endpoint
- **STT language selection** – language hint for faster-whisper (Korean, English, Japanese, Chinese, etc.)
- **Local TTS server** – included edge-tts wrapper for high-quality TTS without API keys
- **Voice preview** β test TTS voices before saving
- **Model catalog** β browse models from connected providers
- **Text input** β type messages with `Ctrl+Enter` / `Cmd+Enter`
- **Standalone LLM mode** β works without channel connection using a local LLM backend
## Architecture
```
Browser (React + Tailwind)
|
| port 8888 (HTTP + WebSocket)
v
Express Server (Node.js)
|
|--- /bridge/* --> OpenClaw Gateway (port 18789)
|--- /bridge/tts --> TTS Proxy (OpenAI / Qwen / Custom / Local)
|--- /api/* /ws/* --> STT/TTS Backend (port 8766) [optional]
|
v
OpenClaw Gateway --> Telegram, Discord, Slack, Signal, ...
```
### Operating Modes
| Mode | Requirements | Description |
|------|-------------|-------------|
| **Channel Bridge** | Node.js + OpenClaw Gateway | Text/voice to channels. Browser or external TTS for responses. |
| **Standalone LLM** | Node.js + Python STT/TTS backend | Full voice pipeline: push-to-talk, local STT, LLM, audio TTS. |
Both modes can run simultaneously. The Python backend is only needed for push-to-talk STT.
## Requirements
- **Node.js 22+** ([download](https://nodejs.org/))
- **OpenClaw Gateway** running locally (for channel bridge)
- **Python 3.10+** (only for the local TTS server or STT backend – optional)
## Quick Start
### 1. Clone and install
```bash
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
npm run stt:install # Python STT backend dependencies
```
### 2. Set up OpenClaw Gateway
```bash
npm install -g openclaw
openclaw setup # connect channels, create config
openclaw gateway run # starts on port 18789
```
### 3. Configure environment
```bash
cp .env.example .env
```
Edit `.env`:
```env
PORT=8888
NODE_ENV=production
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your-token-here
# Model catalog β path to openclaw CLI binary or openclaw.mjs entry point.
# Required for OAuth provider models (GitHub Copilot, Google Antigravity, etc.)
# to appear in the Options model picker.
OPENCLAW_CLI=openclaw
```
Get your token:
- **macOS/Linux:** `cat ~/.openclaw/openclaw.json | grep token`
- **Windows:** `type %USERPROFILE%\.openclaw\openclaw.json | findstr token`
Or extract it programmatically:
```bash
python -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['gateway']['auth']['token'])"
```
### 4. Build and run
```bash
npm run build
npm start # starts Express (8888) + STT backend (8766) concurrently
```
Open http://127.0.0.1:8888
> To run only the Express server without STT: `npm run start:server`
### 5. Development mode
```bash
npm run dev # Vite (5173) + Express (8888) + STT (8766) concurrently
```
## TTS Providers
Configure in **Options > TTS / STT** tab.
| Provider | Setup | Quality | Latency |
|----------|-------|---------|---------|
| **Browser** | Built-in, no setup | Varies by OS | Instant |
| **OpenAI** | API key required | Excellent | ~1s |
| **Qwen/DashScope** | API key required | Good | ~1s |
| **Custom** | Any OpenAI-compatible endpoint | Varies | Varies |
| **Local (edge-tts)** | `pip install edge-tts` | Excellent | ~2s |
### Local TTS Server (edge-tts)
High-quality TTS without API keys. Works on **macOS, Linux, and Windows**.
**Setup:**
```bash
pip install edge-tts fastapi uvicorn
python tts-local/server.py
```
**Connect in UI:**
1. Options > TTS / STT tab
2. Select **Custom**
3. URL: `http://localhost:5050/v1/audio/speech`
4. Leave API Key empty
5. Voice: `sunhi` (Korean), `echo` (English), `nanami` (Japanese)
6. Click **Preview Voice** to test
**Available voices:**
| Language | Voices |
|----------|--------|
| Korean | `sunhi`, `inwoo`, `hyunsu` |
| English | `alloy`, `nova`, `echo`, `onyx`, `shimmer` |
| Japanese | `nanami`, `keita` |
| Chinese | `xiaoxiao`, `yunxi`, `xiaoyi` |
**Run in background:**
```bash
# macOS/Linux
nohup python tts-local/server.py > /tmp/tts-local.log 2>&1 &
# Windows (PowerShell)
Start-Process -NoNewWindow python -ArgumentList "tts-local/server.py"
```
**Verify:**
```bash
curl http://127.0.0.1:5050/health
# {"ok":true,"backend":"edge"}
```
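Beyond `curl`, the endpoint can be exercised from Python with only the standard library. This sketch assumes the OpenAI-style request body (`model`, `input`, `voice`); the local wrapper may well ignore the `model` field, and `build_speech_request` is a hypothetical helper:

```python
import json
import urllib.request

def build_speech_request(text: str, voice: str = "sunhi",
                         url: str = "http://127.0.0.1:5050/v1/audio/speech") -> urllib.request.Request:
    """Build a POST for an OpenAI-compatible /v1/audio/speech endpoint; no API key needed locally."""
    body = json.dumps({"model": "tts-1", "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})

req = build_speech_request("hello there", voice="echo")
# With the local server running, fetch the WAV bytes:
# with urllib.request.urlopen(req) as resp:
#     wav = resp.read()
```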
### STT Backend (Push-to-Talk)
The included `stt-backend/` provides real-time speech-to-text using [faster-whisper](https://github.com/SYSTRAN/faster-whisper). It starts automatically with `npm start`.
**Manual startup** (if running separately):
```bash
npm run stt:install # pip install -r stt-backend/requirements.txt
npm run stt:start # starts on port 8766
```
**Configuration:**
STT model size and language can be configured in the **Options > TTS / STT** tab in the UI. Changes take effect on the next WebSocket connection (reconnect).
| Setting | Options | Default | Description |
|---------|---------|---------|-------------|
| **Model Size** | Tiny, Base, Small, Medium, Large v3 | Medium | Accuracy vs speed trade-off |
| **Language** | Auto-detect, Korean, English, Japanese, + 12 more | Auto (browser locale) | Language hint for recognition |
Environment variables (`.env`) set the server-side defaults:
| Variable | Default | Description |
|----------|---------|-------------|
| `STT_MODEL_SIZE` | `medium` | Default model when client doesn't specify |
| `STT_DEVICE` | `auto` | Device: `auto`, `cpu`, `cuda` |
| `STT_COMPUTE_TYPE` | `int8` | Compute type: `int8`, `float16`, `float32` |
Models are cached in memory – switching sizes in the UI loads the new model once and reuses it for subsequent connections.
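The caching behavior can be sketched like this. `load_model` is a stand-in for constructing faster-whisper's `WhisperModel` (loading is the expensive step), and the cache keys on the same three settings the backend exposes:

```python
from functools import lru_cache

load_calls = []  # records each actual load, to show cache hits vs misses

def load_model(size: str, device: str, compute_type: str):
    """Stand-in for the expensive WhisperModel(size, device=..., compute_type=...) construction."""
    load_calls.append(size)
    return f"model<{size},{device},{compute_type}>"

@lru_cache(maxsize=None)
def get_model(size: str = "medium", device: str = "auto", compute_type: str = "int8"):
    """Load a model once per (size, device, compute_type) and reuse it across connections."""
    return load_model(size, device, compute_type)

m1 = get_model("medium")
m2 = get_model("medium")  # cache hit: no second load
m3 = get_model("small")   # new size: loads exactly once
```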
## Usage
1. Click **Connect** to establish the WebSocket connection
2. Click **Enable Audio** to unlock browser audio
3. Select a channel from the dropdown (e.g., Telegram bot session)
4. **Hold to Speak** – hold the button, speak, release to send
5. Or type in the text box and press `Ctrl+Enter` / `Cmd+Enter`
6. Toggle **TTS On/Off** to control voice output
## Remote Access (Mobile / Other Devices)
Microphone access requires a **secure context** (HTTPS or localhost). When accessing from a phone, tablet, or another machine over plain HTTP, the browser blocks microphone input silently.
**Recommended: Tailscale HTTPS**
[Tailscale](https://tailscale.com/) provides automatic HTTPS certificates for devices on your tailnet.
```bash
# Expose the voice-chat server (port 8888) over Tailscale HTTPS
tailscale serve --bg 8888
```
Access from mobile: `https://your-machine.tail12345.ts.net/`
> **Important:** Do NOT append `:8888` to the Tailscale URL. Tailscale serves HTTPS on port 443 and proxies internally to 8888. Accessing `http://your-machine:8888` directly is plain HTTP, so the microphone will not work.
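The browser's secure-context rule can be approximated in a few lines; `is_secure_context_origin` is a hypothetical helper mirroring the HTTPS-or-localhost requirement described above:

```python
from urllib.parse import urlparse

def is_secure_context_origin(url: str) -> bool:
    """Approximate the browser rule: microphone access needs HTTPS or a localhost origin."""
    p = urlparse(url)
    host = p.hostname or ""
    return p.scheme == "https" or host in ("localhost", "127.0.0.1", "::1")

# The Tailscale HTTPS URL qualifies; the direct :8888 plain-HTTP URL does not.
ok = is_secure_context_origin("https://your-machine.tail12345.ts.net/")
bad = is_secure_context_origin("http://your-machine:8888/")
```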
**Verify HTTPS is active:**
```bash
curl -sk https://your-machine.tail12345.ts.net/healthz
# Expected: {"ok":true,"port":8888,...}
```
**Stop Tailscale serve:**
```bash
tailscale serve --https=443 off
```
## Environment Variables
... (truncated)