Voice client for OpenClaw: desktop app + plugin for voice interaction with AI agents
# OpenClaw Voice Client
[License: MIT](https://opensource.org/licenses/MIT)
[GitHub: openclaw-voice-client](https://github.com/megastruktur/openclaw-voice-client)
[OpenClaw](https://github.com/mariozechner/openclaw)
A thin-client desktop application for voice-based interaction with OpenClaw Gateway. Speak naturally to your AI agent with push-to-talk, powered by Soniox speech-to-text.

*Tray-only app with push-to-talk voice input*
## Overview
OpenClaw Voice Client enables voice interaction with OpenClaw Gateway through a lightweight desktop application. The architecture follows a **thin-client principle**: all processing happens on the OpenClaw Gateway; the desktop app is just a UI shell for audio recording.
### Key Features
- **Push-to-Talk Recording** - Hold button or hotkey to record
- **High-Quality Speech Recognition** - Powered by Soniox STT
- **Full Agent Integration** - Complete access to OpenClaw agent tools
- **Conversation History** - Session-based context tracking
- **Secure Token Storage** - OS keychain integration
- **Tray-Only Interface** - Minimal, always-available UI
- **Cross-Platform** - macOS, Windows, and Linux support
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     Desktop Application                     │
│  ┌──────────────┐  Push-to-talk   ┌──────────────────────┐  │
│  │ Tray Icon +  │ ──────────────> │ Audio Recording      │  │
│  │ Popup UI     │                 │ (MediaRecorder API)  │  │
│  └──────────────┘                 └──────────────────────┘  │
└─────────────────────────┬───────────────────────────────────┘
                          │ HTTP POST
                          │ /voice-client/audio
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                      OpenClaw Gateway                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                  Voice Client Plugin                  │  │
│  │   ┌──────────────┐  ┌──────────────┐  ┌─────────┐     │  │
│  │   │ HTTP Server  │─>│ Soniox STT   │─>│ Agent   │     │  │
│  │   │ /voice-client│  │ Transcription│  │ Turn    │     │  │
│  │   └──────────────┘  └──────────────┘  └─────────┘     │  │
│  └───────────────────────────────────────────────────────┘  │
│                         │                                   │
│                         │ Text Response                     │
│                         ▼                                   │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    Pi Agent (main)                    │  │
│  │       Tools: Memory, Calendar, Web Search, etc.       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
### Thin-Client Principle
**All requests go FROM the desktop app TO OpenClaw. All responses come back FROM OpenClaw.**
- Desktop app only handles UI and audio recording
- OpenClaw plugin handles STT, agent turns, and TTS
- No external API calls from the desktop app
- Session management on the server side
## Requirements
- **OpenClaw Gateway** - v2026.2.21 or later
- **Soniox API Key** - [Sign up at soniox.com](https://soniox.com)
- **Desktop OS** - macOS 10.15+, Windows 10+, or Linux (Ubuntu 20.04+)
## Quick Start
### 1. Install the Plugin
Clone this repository and install the plugin into your OpenClaw Gateway:
```bash
git clone https://github.com/megastruktur/openclaw-voice-client.git
cd openclaw-voice-client
# Install the plugin
openclaw plugins install ./extensions/voice-client
```
### 2. Configure OpenClaw
Add the plugin configuration to your `openclaw.json`:
```json
{
"plugins": {
"entries": {
"voice-client": {
"config": {
"enabled": true,
"sonioxApiKey": "YOUR_SONIOX_API_KEY_HERE",
"serve": {
"port": 18790,
"bind": "127.0.0.1",
"path": "/voice-client"
},
"profiles": {
"allowed": ["Alice", "Bob"]
}
}
}
}
}
}
```
**⚠️ Important**: Replace `YOUR_SONIOX_API_KEY_HERE` with your actual Soniox API key.
Restart OpenClaw Gateway:
```bash
openclaw restart
```
Verify the plugin is running:
```bash
curl http://127.0.0.1:18790/voice-client/profiles
```
### 3. Download and Run the Desktop App
**Option A: Download Pre-built Release**
Download the latest release for your platform:
**[Download from Releases](https://github.com/megastruktur/openclaw-voice-client/releases)**
- **macOS**: `OpenClaw-Voice-{version}.dmg`
- **Windows**: `OpenClaw-Voice-Setup-{version}.exe`
- **Linux**: `OpenClaw-Voice-{version}.AppImage`
**Option B: Build from Source**
```bash
cd clients/voice-client-desktop
npm install
npm run build
# The built app will be in release/
```
**First Run Setup:**
1. Launch the app (tray icon appears)
2. Click the tray icon → Open Settings
3. Configure:
   - **Gateway URL**: `http://127.0.0.1:18790/voice-client`
   - **Profile Name**: Your name (must be in `profiles.allowed`)
4. Test Connection
5. Save
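
The two settings entered during first run can be sanity-checked before saving. The following is a minimal sketch of such a check, not the app's actual validation: `DesktopSettings` and `validateSettings` are hypothetical names, and the exact-match comparison against `profiles.allowed` is an assumption.

```typescript
// Hypothetical shape of the two required first-run settings.
interface DesktopSettings {
  gatewayUrl: string;
  profileName: string;
}

// Returns a list of human-readable problems; an empty list means the settings look usable.
function validateSettings(settings: DesktopSettings, allowedProfiles: string[]): string[] {
  const problems: string[] = [];
  try {
    const url = new URL(settings.gatewayUrl);
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      problems.push("Gateway URL must use http or https, got " + url.protocol);
    }
  } catch {
    // new URL() throws on input it cannot parse.
    problems.push("Gateway URL is not a valid URL: " + settings.gatewayUrl);
  }
  // Assumption: profile names are compared exactly (case-sensitive).
  if (allowedProfiles.indexOf(settings.profileName) === -1) {
    problems.push('Profile "' + settings.profileName + '" is not in profiles.allowed');
  }
  return problems;
}
```

An empty result for `validateSettings(settings, allowed)` only means the values are well-formed; whether the gateway is reachable still has to be confirmed with Test Connection.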
## Usage
### Creating a Session
1. Click the tray icon to open the popup
2. Click **"New Session"**
3. Session ID appears at the bottom
### Voice Input
**Method 1: Mouse Button (Push-to-Talk)**
1. **Hold** the microphone button
2. Speak your message
3. **Release** to send
**Method 2: Hotkey (Global)**
1. Configure hotkey in Settings (e.g., `Ctrl+Space`)
2. Press and **hold** the hotkey anywhere
3. Speak your message
4. **Release** to send
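
Both input methods follow the same hold, speak, release cycle, which can be modeled as a small state machine. This is an illustrative sketch only: the state and event names are assumptions, and the desktop app's real implementation is not shown in this README.

```typescript
// Hypothetical states and events for one push-to-talk cycle.
type PttState = "idle" | "recording" | "sending";
type PttEvent = "press" | "release" | "sent";

// Pure transition function: returns the next state,
// or the unchanged state when the event does not apply.
function nextState(state: PttState, event: PttEvent): PttState {
  if (state === "idle" && event === "press") return "recording"; // button or hotkey held
  if (state === "recording" && event === "release") return "sending"; // audio is uploaded
  if (state === "sending" && event === "sent") return "idle"; // gateway has answered
  return state;
}
```

Keeping the transitions pure means the mouse button and the global hotkey can feed the same function, and stray events (for example a second press while audio is still being sent) are simply ignored.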
### Viewing Responses
- Transcription appears as soon as the gateway returns it
- Agent response shows below
- Last exchange is saved in the popup
## Configuration Reference
### Plugin Configuration
All settings are configured in `openclaw.json` under `plugins.entries.voice-client.config`:
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enabled` | boolean | `true` | Enable/disable the plugin |
| `sonioxApiKey` | string | **required** | Your Soniox API key |
| `serve.port` | number | `18790` | HTTP server port |
| `serve.bind` | string | `"127.0.0.1"` | Bind address (`"0.0.0.0"` for network access) |
| `serve.path` | string | `"/voice-client"` | Base path for endpoints |
| `profiles.allowed` | string[] | `[]` | List of allowed profile names |
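
The table can also be read as a typed config with defaults. The sketch below mirrors the documented settings and default values; the interface and function names are hypothetical and are not taken from the plugin's `types.ts`.

```typescript
// Fully resolved config, mirroring the settings table above.
interface VoiceClientConfig {
  enabled: boolean;
  sonioxApiKey: string;
  serve: { port: number; bind: string; path: string };
  profiles: { allowed: string[] };
}

// What a user writes in openclaw.json: everything optional except the API key.
interface VoiceClientConfigInput {
  enabled?: boolean;
  sonioxApiKey: string;
  serve?: { port?: number; bind?: string; path?: string };
  profiles?: { allowed?: string[] };
}

// Fills in the documented defaults and rejects a missing API key.
function applyDefaults(input: VoiceClientConfigInput): VoiceClientConfig {
  if (!input.sonioxApiKey) {
    throw new Error("sonioxApiKey is required");
  }
  return {
    enabled: input.enabled ?? true,
    sonioxApiKey: input.sonioxApiKey,
    serve: {
      port: input.serve?.port ?? 18790,
      bind: input.serve?.bind ?? "127.0.0.1",
      path: input.serve?.path ?? "/voice-client",
    },
    profiles: { allowed: input.profiles?.allowed ?? [] },
  };
}
```

With these defaults, a config that sets only `sonioxApiKey` listens on `127.0.0.1:18790` under `/voice-client` and allows no profiles, so `profiles.allowed` must be filled in before any client can connect.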
### Desktop App Settings
Settings are stored securely using the OS keychain:
- **Gateway URL** - HTTP endpoint of the plugin (e.g., `http://127.0.0.1:18790/voice-client`)
- **Token** - Optional authentication token (encrypted in OS keychain)
- **Profile Name** - Your name (must match plugin's `profiles.allowed`)
- **Microphone Device** - Audio input device
- **Push-to-Talk Hotkey** - Global keyboard shortcut (e.g., `Ctrl+Space`, `Alt+T`)
### Example Configurations
**Local Development:**
```json
{
"enabled": true,
"sonioxApiKey": "sk_live_...",
"serve": {
"port": 18790,
"bind": "127.0.0.1"
},
"profiles": {
"allowed": ["Peter"]
}
}
```
**Family Setup:**
```json
{
"enabled": true,
"sonioxApiKey": "sk_live_...",
"serve": {
"port": 18790,
"bind": "0.0.0.0"
},
"profiles": {
"allowed": ["Peter", "Olga", "Kids"]
}
}
```
**Multi-Machine (Tailscale):**
```json
{
"enabled": true,
"sonioxApiKey": "sk_live_...",
"serve": {
"port": 18790,
"bind": "0.0.0.0"
},
"profiles": {
"allowed": ["Peter", "Laptop", "Desktop"]
}
}
```
## API Endpoints
The plugin exposes these HTTP endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/voice-client/profiles` | GET | List allowed profiles |
| `/voice-client/session/new` | POST | Create new session |
| `/voice-client/session?id=<id>` | GET | Get session info |
| `/voice-client/audio?sessionId=<id>` | POST | Send audio for processing |
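
All four endpoints hang off the configured base path, so a client only needs the Gateway URL plus an endpoint name and query parameters. A minimal sketch of that URL construction follows; `endpointUrl` is a hypothetical helper name, not part of the plugin or the desktop app.

```typescript
// Joins the gateway base URL (e.g. http://127.0.0.1:18790/voice-client)
// with an endpoint path and optional query parameters.
function endpointUrl(baseUrl: string, endpoint: string, query: Record<string, string> = {}): string {
  // Tolerate a trailing slash on the base and a leading slash on the endpoint.
  const url = new URL(baseUrl.replace(/\/+$/, "") + "/" + endpoint.replace(/^\/+/, ""));
  // URLSearchParams takes care of percent-encoding the values.
  url.search = new URLSearchParams(query).toString();
  return url.toString();
}
```

For example, `endpointUrl(base, "audio", { sessionId: id })` yields the audio upload URL for a session, and `endpointUrl(base, "profiles")` yields the profiles URL used to verify that the plugin is running.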
### Example: Send Audio
```bash
# Create a session and capture its ID (requires jq; -s hides curl's progress output)
SESSION_ID=$(curl -s -X POST http://127.0.0.1:18790/voice-client/session/new \
  -H "Content-Type: application/json" \
  -d '{"profileName":"Peter"}' | jq -r .sessionId)

# Send the recorded audio for transcription and an agent turn
curl -X POST "http://127.0.0.1:18790/voice-client/audio?sessionId=$SESSION_ID" \
  -H "X-Profile: Peter" \
  -H "Content-Type: audio/wav" \
  --data-binary @recording.wav
```
Response:
```json
{
"transcription": {
"text": "What's the weather today?",
"confidence": 0.95
},
"response": {
"text": "Let me check the weather for you..."
}
}
```
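
The same two calls can be made from TypeScript. This is a sketch under stated assumptions rather than the desktop app's actual client code: the function names and the `FetchLike` type are hypothetical, the response fields are limited to those shown in the example above, and the fetch function is injected so the logic can be exercised with a stub.

```typescript
// Response shape taken from the example response; other fields may exist.
interface AudioResponse {
  transcription: { text: string; confidence: number };
  response: { text: string };
}

// Minimal subset of the Fetch API used here, so a stub can be injected for testing.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string | Uint8Array },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// POST /session/new with the profile name; returns the new session ID.
async function createSession(fetchFn: FetchLike, baseUrl: string, profileName: string): Promise<string> {
  const res = await fetchFn(baseUrl + "/session/new", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ profileName }),
  });
  if (!res.ok) throw new Error("session/new failed with HTTP " + res.status);
  const data = (await res.json()) as { sessionId: string };
  return data.sessionId;
}

// POST /audio?sessionId=<id> with raw WAV bytes; returns transcription and agent reply.
async function sendAudio(
  fetchFn: FetchLike,
  baseUrl: string,
  profileName: string,
  sessionId: string,
  wav: Uint8Array,
): Promise<AudioResponse> {
  const res = await fetchFn(baseUrl + "/audio?sessionId=" + encodeURIComponent(sessionId), {
    method: "POST",
    headers: { "X-Profile": profileName, "Content-Type": "audio/wav" },
    body: wav,
  });
  if (!res.ok) throw new Error("audio upload failed with HTTP " + res.status);
  return (await res.json()) as AudioResponse;
}
```

Injecting the fetch function keeps both helpers free of any particular runtime; a real caller would pass a Fetch-compatible implementation, while tests can pass a stub that returns canned JSON.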
## Building from Source
### Plugin
The plugin is written in TypeScript and uses OpenClaw's plugin SDK:
```bash
cd extensions/voice-client
npm install
npm run build
# Install to OpenClaw
openclaw plugins install .
```
### Desktop App
The desktop app uses Electron + React + Vite:
```bash
cd clients/voice-client-desktop
npm install
# Development
npm run dev
# Build for production
npm run build
# Build without installer (faster)
npm run build:dir
```
## Development
### Project Structure
```
openclaw-voice-client/
├── extensions/voice-client/          # OpenClaw plugin
│   ├── index.ts                      # Plugin entry point
│   ├── openclaw.plugin.json          # Plugin manifest
│   └── src/
│       ├── agent-service.ts          # Agent turn integration
│       ├── channel.ts                # Channel plugin
│       ├── http-handler.ts           # HTTP server
│       ├── session-manager.ts        # Session management
│       ├── stt-service.ts            # Soniox STT
│       └── types.ts                  # TypeScript types
│
└── clients/voice-client-desktop/     # Electron app
    ├── src/
    │   ├── main/                     # Electron main process
    │   │   ├── index.ts              # App lifecycle
    │   │   ├── tray.ts               # System tray
    │   │   ├── ipc.ts
... (truncated)
```