Voice Client

By megastruktur

Voice client for OpenClaw: desktop app + plugin for voice interaction with AI agents

Configuration Example

{
  "plugins": {
    "entries": {
      "voice-client": {
        "config": {
          "enabled": true,
          "sonioxApiKey": "YOUR_SONIOX_API_KEY_HERE",
          "serve": {
            "port": 18790,
            "bind": "127.0.0.1",
            "path": "/voice-client"
          },
          "profiles": {
            "allowed": ["Alice", "Bob"]
          }
        }
      }
    }
  }
}

README

# OpenClaw Voice Client

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows%20%7C%20Linux-blue)](https://github.com/megastruktur/openclaw-voice-client)
[![OpenClaw](https://img.shields.io/badge/OpenClaw-Gateway-green)](https://github.com/mariozechner/openclaw)

A thin-client desktop application for voice-based interaction with OpenClaw Gateway. Speak naturally to your AI agent with push-to-talk, powered by Soniox speech-to-text.

![OpenClaw Voice Client Screenshot](docs/screenshot.png)
*Tray-only app with push-to-talk voice input*

## Overview

OpenClaw Voice Client enables voice interaction with OpenClaw Gateway through a lightweight desktop application. The architecture follows a **thin-client principle**: all processing happens on the OpenClaw Gateway, and the desktop app is only a UI shell for audio recording.

### Key Features

- 🎤 **Push-to-Talk Recording** - Hold button or hotkey to record
- 🗣️ **High-Quality Speech Recognition** - Powered by Soniox STT
- 🤖 **Full Agent Integration** - Complete access to OpenClaw agent tools
- 💬 **Conversation History** - Session-based context tracking
- 🔐 **Secure Token Storage** - OS keychain integration
- 🎯 **Tray-Only Interface** - Minimal, always-available UI
- 🌍 **Cross-Platform** - macOS, Windows, and Linux support

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Desktop Application                     │
│  ┌──────────────┐  Push-to-talk   ┌──────────────────────┐  │
│  │ Tray Icon +  │ ──────────────> │ Audio Recording      │  │
│  │ Popup UI     │                 │ (MediaRecorder API)  │  │
│  └──────────────┘                 └──────────────────────┘  │
└─────────────────────────┬───────────────────────────────────┘
                          │ HTTP POST
                          │ /voice-client/audio
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                      OpenClaw Gateway                       │
│  ┌───────────────────────────────────────────────────┐      │
│  │                Voice Client Plugin                │      │
│  │  ┌──────────────┐  ┌──────────────┐  ┌─────────┐  │      │
│  │  │ HTTP Server  │─>│ Soniox STT   │─>│ Agent   │  │      │
│  │  │ /voice-client│  │ Transcription│  │ Turn    │  │      │
│  │  └──────────────┘  └──────────────┘  └─────────┘  │      │
│  └───────────────────────────────────────────────────┘      │
│                         │                                   │
│                         │ Text Response                     │
│                         ▼                                   │
│  ┌───────────────────────────────────────────────────┐      │
│  │                  Pi Agent (main)                  │      │
│  │     Tools: Memory, Calendar, Web Search, etc.     │      │
│  └───────────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────┘
```

### Thin-Client Principle

**All requests go FROM the desktop app TO OpenClaw. All responses come back FROM OpenClaw.**

- Desktop app only handles UI and audio recording
- OpenClaw plugin handles STT, agent turns, and TTS
- No external API calls from the desktop app
- Session management on the server side

## Requirements

- **OpenClaw Gateway** - v2026.2.21 or later
- **Soniox API Key** - [Sign up at soniox.com](https://soniox.com)
- **Desktop OS** - macOS 10.15+, Windows 10+, or Linux (Ubuntu 20.04+)

## Quick Start

### 1. Install the Plugin

Clone this repository and install the plugin into your OpenClaw Gateway:

```bash
git clone https://github.com/megastruktur/openclaw-voice-client.git
cd openclaw-voice-client

# Install the plugin
openclaw plugins install ./extensions/voice-client
```

### 2. Configure OpenClaw

Add the plugin configuration to your `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "voice-client": {
        "config": {
          "enabled": true,
          "sonioxApiKey": "YOUR_SONIOX_API_KEY_HERE",
          "serve": {
            "port": 18790,
            "bind": "127.0.0.1",
            "path": "/voice-client"
          },
          "profiles": {
            "allowed": ["Alice", "Bob"]
          }
        }
      }
    }
  }
}
```

**⚠️ Important**: Replace `YOUR_SONIOX_API_KEY_HERE` with your actual Soniox API key.

Restart OpenClaw Gateway:

```bash
openclaw restart
```

Verify the plugin is running:

```bash
curl http://127.0.0.1:18790/voice-client/profiles
```

### 3. Download and Run the Desktop App

**Option A: Download Pre-built Release**

Download the latest release for your platform:

👉 **[Download from Releases](https://github.com/megastruktur/openclaw-voice-client/releases)**

- **macOS**: `OpenClaw-Voice-{version}.dmg`
- **Windows**: `OpenClaw-Voice-Setup-{version}.exe`
- **Linux**: `OpenClaw-Voice-{version}.AppImage`

**Option B: Build from Source**

```bash
cd clients/voice-client-desktop
npm install
npm run build

# The built app will be in release/
```

**First Run Setup:**

1. Launch the app (tray icon appears)
2. Click the tray icon → Open Settings
3. Configure:
   - **Gateway URL**: `http://127.0.0.1:18790/voice-client`
   - **Profile Name**: Your name (must be in `profiles.allowed`)
4. Test Connection
5. Save

## Usage

### Creating a Session

1. Click the tray icon to open the popup
2. Click **"New Session"**
3. Session ID appears at the bottom

### Voice Input

**Method 1: Mouse Button (Push-to-Talk)**

1. **Hold** the microphone button
2. Speak your message
3. **Release** to send

**Method 2: Hotkey (Global)**

1. Configure hotkey in Settings (e.g., `Ctrl+Space`)
2. Press and **hold** the hotkey anywhere
3. Speak your message
4. **Release** to send
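
The hold-and-release flow in both methods can be modeled as a small state machine. The sketch below is illustrative only: `Recorder`, `PushToTalk`, and the `send` callback are hypothetical names, not taken from the project source, and the real app presumably wires the equivalent logic to the MediaRecorder API and a global hotkey.

```typescript
// Hypothetical sketch: these names are assumptions, not the app's actual code.
interface Recorder {
  start(): void;
  stop(): Promise<Uint8Array>; // resolves with the recorded audio bytes
}

type PttState = "idle" | "recording" | "sending";

class PushToTalk {
  private state: PttState = "idle";
  private readonly recorder: Recorder;
  private readonly send: (audio: Uint8Array) => Promise<void>;

  constructor(recorder: Recorder, send: (audio: Uint8Array) => Promise<void>) {
    this.recorder = recorder;
    this.send = send;
  }

  get current(): PttState {
    return this.state;
  }

  // Called on mouse-down or hotkey press; ignored unless idle.
  press(): void {
    if (this.state !== "idle") return;
    this.recorder.start();
    this.state = "recording";
  }

  // Called on mouse-up or hotkey release; stops recording and sends the audio.
  async release(): Promise<void> {
    if (this.state !== "recording") return;
    this.state = "sending";
    try {
      const audio = await this.recorder.stop();
      await this.send(audio);
    } finally {
      // Return to idle even if sending fails, so the next press works.
      this.state = "idle";
    }
  }
}
```

Ignoring `press()` while a previous recording is still being sent is one possible design choice; it keeps key-repeat events and overlapping presses from starting a second recording.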

### Viewing Responses

- Transcription appears instantly
- Agent response shows below
- Last exchange is saved in the popup

## Configuration Reference

### Plugin Configuration

All settings are configured in `openclaw.json` under `plugins.entries.voice-client.config`:

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enabled` | boolean | `true` | Enable/disable the plugin |
| `sonioxApiKey` | string | **required** | Your Soniox API key |
| `serve.port` | number | `18790` | HTTP server port |
| `serve.bind` | string | `"127.0.0.1"` | Bind address (`"0.0.0.0"` for network access) |
| `serve.path` | string | `"/voice-client"` | Base path for endpoints |
| `profiles.allowed` | string[] | `[]` | List of allowed profile names |
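
For readers who prefer types to tables, the settings above can be written as a TypeScript shape with the documented defaults applied. This is a sketch under assumptions: `resolveConfig` and `isProfileAllowed` are hypothetical helpers, not part of the plugin, and the exact-match comparison of profile names is a guess about how the allow-list is checked.

```typescript
// Shape of plugins.entries["voice-client"].config, following the table above.
interface VoiceClientConfig {
  enabled: boolean;
  sonioxApiKey: string;
  serve: { port: number; bind: string; path: string };
  profiles: { allowed: string[] };
}

// Partial input, as a user might write it in openclaw.json.
interface VoiceClientConfigInput {
  enabled?: boolean;
  sonioxApiKey?: string;
  serve?: { port?: number; bind?: string; path?: string };
  profiles?: { allowed?: string[] };
}

// Hypothetical helper: fills in the documented defaults, rejects a missing key.
function resolveConfig(input: VoiceClientConfigInput): VoiceClientConfig {
  if (!input.sonioxApiKey) {
    throw new Error("sonioxApiKey is required");
  }
  return {
    enabled: input.enabled ?? true,
    sonioxApiKey: input.sonioxApiKey,
    serve: {
      port: input.serve?.port ?? 18790,
      bind: input.serve?.bind ?? "127.0.0.1",
      path: input.serve?.path ?? "/voice-client",
    },
    profiles: { allowed: input.profiles?.allowed ?? [] },
  };
}

// Hypothetical check; assumes profile names are compared exactly.
function isProfileAllowed(config: VoiceClientConfig, name: string): boolean {
  return config.profiles.allowed.includes(name);
}
```

Note that with the default empty `profiles.allowed`, this check rejects every profile name, so at least one name has to be listed before the desktop app can connect under that reading of the table.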

### Desktop App Settings

Settings are stored securely using the OS keychain:

- **Gateway URL** - HTTP endpoint of the plugin (e.g., `http://127.0.0.1:18790/voice-client`)
- **Token** - Optional authentication token (encrypted in OS keychain)
- **Profile Name** - Your name (must match plugin's `profiles.allowed`)
- **Microphone Device** - Audio input device
- **Push-to-Talk Hotkey** - Global keyboard shortcut (e.g., `Ctrl+Space`, `Alt+T`)

### Example Configurations

**Local Development:**
```json
{
  "enabled": true,
  "sonioxApiKey": "sk_live_...",
  "serve": {
    "port": 18790,
    "bind": "127.0.0.1"
  },
  "profiles": {
    "allowed": ["Peter"]
  }
}
```

**Family Setup:**
```json
{
  "enabled": true,
  "sonioxApiKey": "sk_live_...",
  "serve": {
    "port": 18790,
    "bind": "0.0.0.0"
  },
  "profiles": {
    "allowed": ["Peter", "Olga", "Kids"]
  }
}
```

**Multi-Machine (Tailscale):**
```json
{
  "enabled": true,
  "sonioxApiKey": "sk_live_...",
  "serve": {
    "port": 18790,
    "bind": "0.0.0.0"
  },
  "profiles": {
    "allowed": ["Peter", "Laptop", "Desktop"]
  }
}
```

## API Endpoints

The plugin exposes these HTTP endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/voice-client/profiles` | GET | List allowed profiles |
| `/voice-client/session/new` | POST | Create new session |
| `/voice-client/session?id=<id>` | GET | Get session info |
| `/voice-client/audio?sessionId=<id>` | POST | Send audio for processing |
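
Since every endpoint hangs off the Gateway URL entered in Settings, a client needs little more than a URL builder. The helper below is a minimal sketch; `endpointUrl` is a hypothetical name and is not part of the plugin or the desktop app.

```typescript
// Hypothetical helper: joins the Gateway URL with an endpoint name and query.
function endpointUrl(
  gatewayUrl: string,
  endpoint: string,
  query: Record<string, string> = {},
): string {
  // Tolerate a trailing slash on the base and a leading slash on the endpoint.
  const base = gatewayUrl.replace(/\/+$/, "");
  const path = endpoint.replace(/^\/+/, "");
  const params = Object.entries(query)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return params ? `${base}/${path}?${params}` : `${base}/${path}`;
}
```

For example, `endpointUrl("http://127.0.0.1:18790/voice-client", "audio", { sessionId: "abc" })` yields `http://127.0.0.1:18790/voice-client/audio?sessionId=abc`.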

### Example: Send Audio

```bash
# Create session
SESSION_ID=$(curl -s -X POST http://127.0.0.1:18790/voice-client/session/new \
  -H "Content-Type: application/json" \
  -d '{"profileName":"Peter"}' | jq -r .sessionId)

# Send audio
curl -X POST "http://127.0.0.1:18790/voice-client/audio?sessionId=$SESSION_ID" \
  -H "X-Profile: Peter" \
  -H "Content-Type: audio/wav" \
  --data-binary @recording.wav
```

Response:
```json
{
  "transcription": {
    "text": "What's the weather today?",
    "confidence": 0.95
  },
  "response": {
    "text": "Let me check the weather for you..."
  }
}
```
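
A client consuming this response may want to check its shape before using it. The type guard below is a sketch based only on the sample above; whether every field is always present (for example `confidence`) is an assumption, and `AudioResponse` and `isAudioResponse` are hypothetical names.

```typescript
// Shape inferred from the sample response; field presence is an assumption.
interface AudioResponse {
  transcription: { text: string; confidence: number };
  response: { text: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrows an unknown JSON value to the shape shown in the sample response.
function isAudioResponse(value: unknown): value is AudioResponse {
  if (!isRecord(value)) return false;
  const { transcription, response } = value;
  return (
    isRecord(transcription) &&
    typeof transcription.text === "string" &&
    typeof transcription.confidence === "number" &&
    isRecord(response) &&
    typeof response.text === "string"
  );
}
```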

## Building from Source

### Plugin

The plugin is written in TypeScript and uses OpenClaw's plugin SDK:

```bash
cd extensions/voice-client
npm install
npm run build

# Install to OpenClaw
openclaw plugins install .
```

### Desktop App

The desktop app uses Electron + React + Vite:

```bash
cd clients/voice-client-desktop
npm install

# Development
npm run dev

# Build for production
npm run build

# Build without installer (faster)
npm run build:dir
```

## Development

### Project Structure

```
openclaw-voice-client/
├── extensions/voice-client/       # OpenClaw plugin
│   ├── index.ts                   # Plugin entry point
│   ├── openclaw.plugin.json       # Plugin manifest
│   └── src/
│       ├── agent-service.ts       # Agent turn integration
│       ├── channel.ts             # Channel plugin
│       ├── http-handler.ts        # HTTP server
│       ├── session-manager.ts     # Session management
│       ├── stt-service.ts         # Soniox STT
│       └── types.ts               # TypeScript types
│
└── clients/voice-client-desktop/  # Electron app
    ├── src/
    │   ├── main/                  # Electron main process
    │   │   ├── index.ts           # App lifecycle
    │   │   ├── tray.ts            # System tray
    │   │   ├── ipc.ts

... (truncated)
```