ClawPilot
Talk to your AI agent through Discord voice. Open-source plugin for OpenClaw.ai.
<div align="center">
```
_____ _ _____ _ _ _
/ ____| | | __ (_) | | |
| | | | __ ___ __| |__) || | ___ | |_
| | | |/ _` \ \ /\ / /| ___/ | |/ _ \| __|
| |____| | (_| |\ V V / | | | | | (_) | |_
\_____|_|\__,_| \_/\_/ |_| |_|_|\___/ \__|
```
### Talk to your AI agent through Discord voice.
**Say "Hey Claw" in Discord and your AI agent listens, thinks, and responds by voice.**
Send emails, tweet, research, and manage tasks, all hands-free.
[Get Started](#quick-start) · [Features](#features) · [Providers](#providers) · [Configuration](#configuration-reference)
---
</div>
## What is ClawPilot?
ClawPilot is a **free, open-source** [OpenClaw](https://openclaw.ai) plugin that gives your AI agent a voice. It connects to Discord voice channels and creates a real-time voice pipeline between you and your agent.
This isn't a chatbot. Your OpenClaw agent keeps all its capabilities (sending emails, browsing the web, writing code, managing tasks), but now you control it **by speaking**.
```
You speak in Discord
  → Audio captured & decoded (Opus → PCM 48kHz)
  → Speech-to-Text (Deepgram / Whisper / Whisper Local)
  → Wake word detection ("Hey Claw")
  → Sent to your OpenClaw agent
  → Agent processes & responds
  → Text-to-Speech (OpenAI TTS / Edge TTS)
  → Played back in Discord voice channel
```
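
The decode step above yields 48 kHz PCM, and the source tree lists a stereo-to-mono transform. A minimal sketch of what such a downmix could look like follows. It assumes interleaved 16-bit signed samples (left, right, left, right, ...); the sample format and the function name `stereoToMono` are assumptions for illustration, not taken from the ClawPilot source.

```typescript
// Sketch only: assumes interleaved 16-bit signed stereo samples.
// `stereoToMono` is a hypothetical name, not ClawPilot's actual API.
function stereoToMono(stereo: Int16Array): Int16Array {
  // One frame = one left sample + one right sample.
  const frameCount = Math.floor(stereo.length / 2);
  const mono = new Int16Array(frameCount);
  for (let i = 0; i < frameCount; i++) {
    const left = stereo[2 * i];
    const right = stereo[2 * i + 1];
    // Averaging two 16-bit values always fits in 16 bits.
    mono[i] = Math.round((left + right) / 2);
  }
  return mono;
}
```

A trailing sample without a partner is dropped, so the output always holds exactly one sample per complete stereo frame.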
## Why ClawPilot?
| Feature | Description |
|---------|-------------|
| **Voice-first** | Speak naturally instead of typing. Your agent understands context and nuance. |
| **Full agent power** | Not a simple voice assistant: your complete OpenClaw agent with all its skills. |
| **Group calls** | Tracks who's speaking. Multiple people can interact with the agent. |
| **Free or premium** | Run the free providers (Whisper Local + Edge TTS) for $0, or use cloud providers for lower latency. Your call. |
| **Private** | Self-host with Whisper Local and transcription runs on your machine, so your audio is not sent to a third-party speech-to-text service. |
| **Barge-in** | Start speaking and the bot shuts up. Like a good assistant should. |
---
## Quick Start
### 1. Create a Discord Bot
1. Go to [Discord Developer Portal](https://discord.com/developers/applications)
2. Create a new application → **Bot** section
3. Copy the bot token (no privileged intents needed)
4. Invite to your server with `bot` + `applications.commands` scopes
5. Bot permissions: **View Channels**, **Send Messages**, **Use Slash Commands**, **Connect**, **Speak**
### 2. Install
```bash
git clone https://github.com/CryptoManiaques/ClawPilot.git
cd ClawPilot
npm install
npm run build
```
### 3. Configure
Add to your OpenClaw config (`~/.openclaw/openclaw.json`):
```jsonc
{
"plugins": {
"entries": {
"clawpilot": {
"enabled": true,
"config": {
"discordToken": "YOUR_DISCORD_BOT_TOKEN",
"sttProvider": "whisper-local", // Free! Or "deepgram" for speed
"ttsProvider": "edge", // Free! Or "openai" for quality
"agentName": "bobby", // Say "bobby" anywhere to trigger
"activationMode": "wake_word",
"wakeWords": ["hey claw", "ok claw"]
}
}
}
}
}
```
### 4. Talk
Join a Discord voice channel and say:
> **"Bobby, email John about the meeting tomorrow"**
Or mention the name anywhere in your sentence:
> **"Can you help me bobby?"**
Your agent handles the rest.
---
## Providers
Mix and match STT and TTS engines. Go fully free or pay for speed: your choice.
### Speech-to-Text
| Provider | Latency | Cost | Best for |
|----------|---------|------|----------|
| **Deepgram** Nova-3 | ~200ms (streaming) | ~$0.05/h | Real-time conversations |
| **Whisper** API | ~2-3s (batch) | ~$0.006/min | Good balance |
| **Whisper Local** (whisper.cpp) | ~3-5s (batch) | **Free** | Privacy & zero cost |
### Text-to-Speech
| Provider | Latency | Cost | Best for |
|----------|---------|------|----------|
| **OpenAI** gpt-4o-mini-tts | ~300ms | ~$0.003/h | Natural sounding voices |
| **Edge TTS** (Microsoft) | ~700ms | **Free** | Zero cost, decent quality |
### Example setups
<details>
<summary><b>Fully free</b>: $0/month, no API keys needed</summary>
```json
{
"sttProvider": "whisper-local",
"ttsProvider": "edge",
"agentName": "bobby",
"edgeTtsVoice": "en-US-AriaNeural"
}
```
Requires: [whisper.cpp](https://github.com/ggerganov/whisper.cpp) + `pip install edge-tts` + `ffmpeg`
</details>
<details>
<summary><b>Balanced</b>: ~$10/month, fast STT + free TTS</summary>
```json
{
"sttProvider": "deepgram",
"deepgramApiKey": "YOUR_KEY",
"ttsProvider": "edge",
"agentName": "bobby"
}
```
</details>
<details>
<summary><b>Premium</b>: ~$50/month, fastest everything</summary>
```json
{
"sttProvider": "deepgram",
"deepgramApiKey": "YOUR_KEY",
"ttsProvider": "openai",
"openaiApiKey": "YOUR_KEY",
"ttsVoice": "nova",
"agentName": "bobby"
}
```
</details>
---
## Slash Commands
| Command | Description |
|---------|-------------|
| `/join` | Bot joins your voice channel |
| `/leave` | Bot leaves the voice channel |
| `/mode wake_word` | Activate with "Hey Claw" (default) |
| `/mode always_active` | Listen to everything, no trigger needed |
| `/status` | Connection info, providers, active speakers, uptime |
---
## Activation Modes
### Agent name (recommended)
Set `agentName` in config (e.g. `"bobby"`). Say the name **anywhere** in a sentence:
- "**Bobby**, what's the weather?"
- "Can you help me **bobby**?"
- "Send an email **bobby** to John about the meeting"
The name is automatically removed from the text before sending to your agent.
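
As an illustration of the behaviour described above, here is a minimal sketch of detecting the agent name anywhere in a transcript and stripping it before the text is forwarded. The helper names `escapeRegExp` and `extractCommand` are hypothetical, and the matching rules (case-insensitive, whole word, optional trailing comma) are assumptions rather than ClawPilot's actual logic.

```typescript
// Hypothetical helpers, not taken from the ClawPilot source.

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the transcript without the agent name, or null if the name
// does not appear as a whole word. Word boundaries (\b) only behave
// as expected for names made of ASCII letters and digits.
function extractCommand(transcript: string, agentName: string): string | null {
  const pattern = new RegExp(`\\b${escapeRegExp(agentName)}\\b,?`, "i");
  if (!pattern.test(transcript)) return null;
  return transcript
    .replace(pattern, " ")          // drop the first occurrence of the name
    .replace(/\s+/g, " ")           // collapse the gap it leaves behind
    .replace(/\s+([?.!,])/g, "$1")  // no space before punctuation
    .trim();
}
```

With this sketch, `extractCommand("Can you help me bobby?", "bobby")` yields `"Can you help me?"`, while a transcript that merely contains a similar word such as "bobbin" yields `null`.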
### Wake word (prefix)
Set `wakeWords` (e.g. `["hey claw"]`). Must be at the **start** of the sentence:
- "**Hey Claw**, what's the weather?"
- "**Ok Claw**, send an email to John"
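
Prefix matching differs from name matching in that the phrase only counts at the start of the utterance. A rough sketch under that assumption follows; `stripWakeWord` is a hypothetical name, and real transcripts may need fuzzier matching (for example when the STT engine writes "Hey, Claw").

```typescript
// Hypothetical helper, not taken from the ClawPilot source.
// Returns the text after the wake phrase, or null if no phrase
// appears at the very start of the transcript.
function stripWakeWord(transcript: string, wakeWords: string[]): string | null {
  const trimmed = transcript.trim();
  const lower = trimmed.toLowerCase();
  for (const phrase of wakeWords) {
    const candidate = phrase.toLowerCase();
    if (!lower.startsWith(candidate)) continue;
    const rest = trimmed.slice(candidate.length);
    // Require a word boundary so "hey clawfoot" does not match "hey claw".
    if (/^[a-z0-9]/i.test(rest)) continue;
    // Drop the punctuation and spaces that separate phrase and command.
    return rest.replace(/^[\s,.!?:;-]+/, "").trim();
  }
  return null;
}
```

A phrase that occurs later in the sentence, as in "Say hey claw please", is ignored, which is the difference from agent name activation.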
### Always active
Set `activationMode` to `"always_active"`. Listens to **everything**, no trigger needed. Best for solo use.
> Both `agentName` and `wakeWords` work together. You can use either to trigger the agent.
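
The configuration reference also lists `activationDurationMs`, a follow-up window after a trigger. One way such a window could be tracked is sketched below. Keeping a separate window per speaker is an assumption, and the class name `ActivationWindow` is hypothetical; the plugin may well use a single shared timer instead.

```typescript
// Hypothetical sketch of a follow-up window, not ClawPilot's actual code.
// Assumption: each speaker gets an independent window.
class ActivationWindow {
  private readonly durationMs: number;
  private readonly activeUntil = new Map<string, number>();

  constructor(durationMs: number) {
    this.durationMs = durationMs;
  }

  // Call when a speaker says the agent name or a wake word.
  trigger(speakerId: string, now: number = Date.now()): void {
    this.activeUntil.set(speakerId, now + this.durationMs);
  }

  // True while the speaker may continue without repeating the trigger.
  isActive(speakerId: string, now: number = Date.now()): boolean {
    const until = this.activeUntil.get(speakerId);
    return until !== undefined && now < until;
  }
}
```

Passing `now` explicitly keeps the class easy to test; in normal use both methods fall back to `Date.now()`.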
---
## Features
- **Agent name activation**: say the agent's name anywhere in a sentence to trigger it
- **Wake word activation**: configurable prefix phrases ("hey claw", "ok claw", or anything you want)
- **Always-active mode**: listens to everything, no trigger needed
- **Barge-in**: interrupt the bot mid-sentence by speaking
- **Group mode**: tracks multiple speakers with `[Name]:` attribution
- **Auto-reconnect**: recovers from Discord/provider disconnects automatically
- **Fully configurable**: voices, models, languages, activation timeout, and more
---
## Configuration Reference
<details>
<summary><b>Click to expand full config options</b></summary>
| Key | Default | Description |
|-----|---------|-------------|
| `discordToken` | (none) | Discord bot token **(required)** |
| `guildId` | auto-detect | Server ID |
| `voiceChannelId` | (none) | Auto-join channel on startup |
| | | |
| **STT** | | |
| `sttProvider` | `"deepgram"` | `"deepgram"`, `"whisper"`, or `"whisper-local"` |
| `deepgramApiKey` | (none) | Required if sttProvider = deepgram |
| `deepgramModel` | `"nova-3"` | Deepgram model |
| `deepgramLanguage` | `"en-US"` | Language code |
| `openaiApiKey` | (none) | Required if ttsProvider = openai or sttProvider = whisper |
| `whisperModel` | `"base"` | Model size: tiny, base, small, medium, large |
| `whisperBin` | `"whisper-cpp"` | Path to whisper.cpp binary |
| | | |
| **TTS** | | |
| `ttsProvider` | `"openai"` | `"openai"` or `"edge"` |
| `ttsModel` | `"gpt-4o-mini-tts"` | OpenAI TTS model |
| `ttsVoice` | `"nova"` | OpenAI voice (alloy, echo, fable, onyx, nova, shimmer) |
| `ttsSpeed` | `1.0` | Speech speed (0.25 - 4.0) |
| `edgeTtsVoice` | `"en-US-AriaNeural"` | Edge TTS voice name |
| | | |
| **Activation** | | |
| `agentName` | (none) | Agent name for activation anywhere in sentence (e.g. `"bobby"`) |
| `activationMode` | `"wake_word"` | `"wake_word"` or `"always_active"` |
| `wakeWords` | `["hey claw", "ok claw"]` | Prefix trigger phrases |
| `activationDurationMs` | `30000` | Follow-up window after trigger (ms) |
| | | |
| **Behavior** | | |
| `enableBargeIn` | `true` | Interrupt bot when user speaks |
| `groupMode` | `false` | Track multiple speakers |
| `maxConcurrentSpeakers` | `3` | Max simultaneous speakers |
| `agentId` | `"main"` | OpenClaw agent to route messages to |
</details>
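
The "required if" rules in the table can be checked up front before the bot connects. The source tree mentions a TypeBox schema; the sketch below deliberately uses plain TypeScript instead so it has no dependencies. The interface covers only the keys relevant to these rules, and `findConfigErrors` is a hypothetical name.

```typescript
// Hypothetical validation sketch, not ClawPilot's actual schema.
type SttProvider = "deepgram" | "whisper" | "whisper-local";
type TtsProvider = "openai" | "edge";

interface ClawPilotConfig {
  discordToken?: string;
  sttProvider?: SttProvider;
  ttsProvider?: TtsProvider;
  deepgramApiKey?: string;
  openaiApiKey?: string;
}

// Returns a list of human-readable problems; empty means the keys line up.
function findConfigErrors(config: ClawPilotConfig): string[] {
  const errors: string[] = [];
  // Defaults as documented in the table above.
  const stt = config.sttProvider ?? "deepgram";
  const tts = config.ttsProvider ?? "openai";

  if (!config.discordToken) {
    errors.push("discordToken is required");
  }
  if (stt === "deepgram" && !config.deepgramApiKey) {
    errors.push("deepgramApiKey is required when sttProvider is deepgram");
  }
  if ((stt === "whisper" || tts === "openai") && !config.openaiApiKey) {
    errors.push("openaiApiKey is required when sttProvider is whisper or ttsProvider is openai");
  }
  return errors;
}
```

Note that, because of the documented defaults, a config containing only `discordToken` would still need both a Deepgram and an OpenAI key under these rules.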
---
## Architecture
```
src/
├── index.ts                    # OpenClaw plugin entry point
├── config-schema.ts            # Configuration schema (TypeBox)
├── types.ts                    # TypeScript interfaces
├── providers.ts                # STT/TTS provider factory
│
├── discord/
│   ├── bot.ts                  # Discord client lifecycle
│   ├── voice-connection.ts     # Voice channel join/leave/reconnect
│   └── commands.ts             # Slash commands (/join, /leave, /mode, /status)
│
├── audio/
│   ├── audio-receiver.ts       # Per-user Opus → PCM capture
│   ├── audio-player.ts         # Queue-based Discord playback
│   └── stereo-to-mono.ts       # Stereo → Mono transform stream
│
├── stt/
│   ├── deepgram-client.ts      # Streaming WebSocket STT
│   ├── whisper-client.ts       # OpenAI Whisper API (batch)
│   ├── whisper-local-client.ts # wh
... (truncated)
```