
ClawPilot

By CryptoManiaques

Talk to your AI agent through Discord voice. Open-source plugin for OpenClaw.ai.

GitHub: https://github.com/CryptoManiaques/ClawPilot

Install

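Install from source: clone the GitHub repository, then run npm install and npm run build (see the Quick Start in the README below).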


README

<div align="center">

```
   _____ _                 _____ _ _       _
  / ____| |               |  __ (_) |     | |
 | |    | | __ ___      __| |__) || | ___ | |_
 | |    | |/ _` \ \ /\ / /|  ___/ | |/ _ \| __|
 | |____| | (_| |\ V  V / | |   | | | (_) | |_
  \_____|_|\__,_| \_/\_/  |_|   |_|_|\___/ \__|
```

### Talk to your AI agent through Discord voice.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![OpenClaw Plugin](https://img.shields.io/badge/OpenClaw-Plugin-blue)](https://openclaw.ai)
[![Node.js](https://img.shields.io/badge/Node.js-%3E%3D22-green)](https://nodejs.org)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.7-blue)](https://www.typescriptlang.org)
[![Discord.js](https://img.shields.io/badge/Discord.js-v14-5865F2)](https://discord.js.org)
[![100% Free](https://img.shields.io/badge/Cost-Free%20%26%20Open%20Source-brightgreen)](#)

**Say "Hey Claw" in Discord and your AI agent listens, thinks, and responds by voice.**
Send emails, tweet, research, and manage tasks, all hands-free.

[Get Started](#-quick-start) &nbsp;&middot;&nbsp; [Features](#-features) &nbsp;&middot;&nbsp; [Providers](#-providers) &nbsp;&middot;&nbsp; [Configuration](#-configuration-reference)

---

</div>

## What is ClawPilot?

ClawPilot is a **free, open-source** [OpenClaw](https://openclaw.ai) plugin that gives your AI agent a voice. It connects to Discord voice channels and creates a real-time voice pipeline between you and your agent.

This isn't a chatbot. Your OpenClaw agent keeps all its capabilities (sending emails, browsing the web, writing code, managing tasks), but now you control it **by speaking**.

```
  You speak in Discord
    → Audio captured & decoded (Opus → PCM 48kHz)
    → Speech-to-Text (Deepgram / Whisper / Whisper Local)
    → Wake word detection ("Hey Claw")
    → Sent to your OpenClaw agent
    → Agent processes & responds
    → Text-to-Speech (OpenAI TTS / Edge TTS)
    → Played back in Discord voice channel
```
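The diagram stops at "PCM 48kHz", while Whisper models (including whisper.cpp) work on 16 kHz mono audio, so a local-STT path has to downmix and resample. Below is a minimal sketch, assuming the Opus decoder emits interleaved 16-bit little-endian stereo frames at 48 kHz; the function names are hypothetical and not taken from ClawPilot's `stereo-to-mono.ts`.

```typescript
// Hypothetical helpers, not ClawPilot's actual audio code.
// Assumed input: interleaved 16-bit little-endian stereo PCM at 48 kHz.

/** Average the left and right channels of each frame into one mono sample. */
function stereoToMono(stereo: Uint8Array): Int16Array {
  const view = new DataView(stereo.buffer, stereo.byteOffset, stereo.byteLength);
  const frames = Math.floor(stereo.byteLength / 4); // 2 channels x 2 bytes
  const mono = new Int16Array(frames);
  for (let i = 0; i < frames; i++) {
    const left = view.getInt16(i * 4, true); // true = little-endian
    const right = view.getInt16(i * 4 + 2, true);
    mono[i] = Math.round((left + right) / 2);
  }
  return mono;
}

/** Naive 48 kHz -> 16 kHz decimation: average each group of 3 samples. */
function downsample48kTo16k(mono48k: Int16Array): Int16Array {
  const out = new Int16Array(Math.floor(mono48k.length / 3));
  for (let i = 0; i < out.length; i++) {
    const sum = mono48k[3 * i] + mono48k[3 * i + 1] + mono48k[3 * i + 2];
    out[i] = Math.round(sum / 3);
  }
  return out;
}
```

The 3-sample average is only a crude low-pass filter; a proper resampler (for example `ffmpeg -ar 16000 -ac 1`) gives cleaner input for transcription.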

## Why ClawPilot?

| | Feature | Description |
|---|---------|-------------|
| **🎙** | **Voice-first** | Speak naturally instead of typing. Your agent understands context and nuance. |
| **🤖** | **Full agent power** | Not a simple voice assistant: your complete OpenClaw agent with all its skills. |
| **👥** | **Group calls** | Tracks who's speaking, so multiple people can interact with the agent. |
| **💰** | **Free or premium** | Run for $0 with Whisper Local + Edge TTS, or use cloud providers for lower latency. Your call. |
| **🔒** | **Private** | Self-host with Whisper Local and your speech is transcribed on your own machine, not by a cloud STT service. (Edge TTS still sends the reply text to Microsoft's online speech service.) |
| **⚡** | **Barge-in** | Start speaking and the bot shuts up, like a good assistant should. |
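The **Group calls** row (together with the `groupMode` and `maxConcurrentSpeakers` options in the configuration reference) can be pictured as a small speaker tracker. This sketch assumes a speaker beyond the cap is simply ignored and that attribution uses the `[Name]:` prefix listed under Features; `SpeakerTracker` and `attribute` are hypothetical names, not ClawPilot's code.

```typescript
// Hypothetical sketch of group-mode bookkeeping.
interface Utterance {
  displayName: string;
  text: string;
}

class SpeakerTracker {
  private readonly active = new Map<string, string>(); // userId -> display name
  private readonly max: number;

  constructor(maxConcurrentSpeakers: number) {
    this.max = maxConcurrentSpeakers;
  }

  /** Returns false (speaker ignored) when other users already fill the cap. */
  startSpeaking(userId: string, displayName: string): boolean {
    if (!this.active.has(userId) && this.active.size >= this.max) return false;
    this.active.set(userId, displayName);
    return true;
  }

  stopSpeaking(userId: string): void {
    this.active.delete(userId);
  }
}

/** Prefix each transcript with its speaker so the agent can tell people apart. */
function attribute(u: Utterance, groupMode: boolean): string {
  return groupMode ? `[${u.displayName}]: ${u.text}` : u.text;
}
```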

---

## 🚀 Quick Start

### 1. Create a Discord Bot

1. Go to [Discord Developer Portal](https://discord.com/developers/applications)
2. Create a new application → **Bot** section
3. Copy the bot token (no privileged intents needed)
4. Invite the bot to your server with the `bot` + `applications.commands` scopes
5. Bot permissions: **View Channels**, **Send Messages**, **Use Application Commands** (formerly Use Slash Commands), **Connect**, **Speak**
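Steps 4 and 5 both end up in the bot's OAuth2 invite link, which carries the scopes and a single permission integer. A sketch, assuming the permission bit positions from Discord's permissions documentation; `buildInviteUrl` is a hypothetical helper, and the Developer Portal's OAuth2 URL generator produces an equivalent link.

```typescript
// Bit positions of the permissions listed in step 5 (Discord permission flags).
const REQUIRED_PERMISSIONS: Array<[string, number]> = [
  ["VIEW_CHANNEL", 10],
  ["SEND_MESSAGES", 11],
  ["CONNECT", 20],
  ["SPEAK", 21],
  ["USE_APPLICATION_COMMANDS", 31],
];

// Each flag is a distinct power of two, so summing equals OR-ing. Plain
// arithmetic is used because JS bitwise operators are 32-bit signed and
// would turn bit 31 into a negative number.
function permissionInteger(): number {
  let total = 0;
  for (const [, bit] of REQUIRED_PERMISSIONS) {
    total += 2 ** bit;
  }
  return total;
}

function buildInviteUrl(clientId: string): string {
  const scope = encodeURIComponent("bot applications.commands");
  return (
    "https://discord.com/oauth2/authorize" +
    `?client_id=${encodeURIComponent(clientId)}` +
    `&scope=${scope}` +
    `&permissions=${permissionInteger()}`
  );
}
```

Call it with your application ID, e.g. `buildInviteUrl("YOUR_APPLICATION_ID")`, and open the resulting link to add the bot to a server.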

### 2. Install

```bash
git clone https://github.com/CryptoManiaques/ClawPilot.git
cd ClawPilot
npm install
npm run build
```

### 3. Configure

Add to your OpenClaw config (`~/.openclaw/openclaw.json`):

```jsonc
{
  "plugins": {
    "entries": {
      "clawpilot": {
        "enabled": true,
        "config": {
          "discordToken": "YOUR_DISCORD_BOT_TOKEN",
          "sttProvider": "whisper-local",   // Free! Or "deepgram" for speed
          "ttsProvider": "edge",            // Free! Or "openai" for quality
          "agentName": "bobby",             // Say "bobby" anywhere to trigger
          "activationMode": "wake_word",
          "wakeWords": ["hey claw", "ok claw"]
        }
      }
    }
  }
}
```

### 4. Talk

Join a Discord voice channel, run `/join` to bring the bot in (or set `voiceChannelId` to auto-join on startup), and say:

> **"Bobby, email John about the meeting tomorrow"**

Or mention the name anywhere in your sentence:

> **"Can you help me bobby?"**

Your agent handles the rest.

---

## 🎛 Providers

Mix and match STT and TTS engines. Go fully free or pay for speed: your choice.

### Speech-to-Text

| Provider | Latency | Cost | Best for |
|----------|---------|------|----------|
| **Deepgram** Nova-3 | ~200ms (streaming) | ~$0.05/h | Real-time conversations |
| **Whisper** API | ~2-3s (batch) | ~$0.006/min (~$0.36/h) | Good balance |
| **Whisper Local** (whisper.cpp) | ~3-5s (batch) | **Free** | Privacy & zero cost |

### Text-to-Speech

| Provider | Latency | Cost | Best for |
|----------|---------|------|----------|
| **OpenAI** gpt-4o-mini-tts | ~300ms | ~$0.003/h | Natural sounding voices |
| **Edge TTS** (Microsoft) | ~700ms | **Free** | Zero cost, decent quality |
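The STT table mixes per-minute and per-hour prices. A tiny helper puts them on the same scale; the rates are the approximate figures from the tables above, and the hours of speech per day are an assumption you supply. `perHour` and `monthlyCost` are hypothetical helpers.

```typescript
// Approximate prices as quoted in the tables above (not live rates).
interface Rate {
  amount: number; // USD
  per: "minute" | "hour";
}

function perHour(rate: Rate): number {
  return rate.per === "minute" ? rate.amount * 60 : rate.amount;
}

/** Estimated monthly cost for a given amount of processed audio per day. */
function monthlyCost(rate: Rate, hoursPerDay: number, daysPerMonth = 30): number {
  return perHour(rate) * hoursPerDay * daysPerMonth;
}

const whisperApi: Rate = { amount: 0.006, per: "minute" };
const deepgramNova3: Rate = { amount: 0.05, per: "hour" };
```

For example, two hours of transcribed speech per day through the Whisper API works out to 0.36 × 2 × 30 ≈ $21.60 per 30-day month.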

### Example setups

<details>
<summary><b>💚 Fully free</b> ($0/month): local STT + free Edge TTS (needs internet)</summary>

```json
{
  "sttProvider": "whisper-local",
  "ttsProvider": "edge",
  "agentName": "bobby",
  "edgeTtsVoice": "en-US-AriaNeural"
}
```
Requires: [whisper.cpp](https://github.com/ggerganov/whisper.cpp) + `pip install edge-tts` + `ffmpeg`
</details>
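The fully free setup shells out to two command-line tools. A sketch of their argument lists, assuming the `edge-tts` CLI's `--voice`, `--text` and `--write-media` flags and whisper.cpp's `-m`, `-f`, `-l` and `-nt` flags; the model path and both helper names are hypothetical.

```typescript
// Hypothetical helpers; only the CLI flags come from the tools themselves.

/** Arguments for the edge-tts CLI (installed with `pip install edge-tts`). */
function edgeTtsArgs(text: string, voice: string, outFile: string): string[] {
  return ["--voice", voice, "--text", text, "--write-media", outFile];
}

/** Arguments for whisper.cpp: model, input WAV, language, no timestamps. */
function whisperCppArgs(wavFile: string, modelSize: string, language = "en"): string[] {
  // Assumed layout: models downloaded as models/ggml-<size>.bin.
  const modelPath = `models/ggml-${modelSize}.bin`;
  return ["-m", modelPath, "-f", wavFile, "-l", language, "-nt"];
}
```

Passing these arrays to `execFile` from `node:child_process` (with `whisperBin` as the executable for whisper.cpp), rather than building a shell string, keeps quotes or newlines in the agent's reply from breaking the command.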

<details>
<summary><b>⚡ Balanced</b> (~$10/month): fast STT + free TTS</summary>

```json
{
  "sttProvider": "deepgram",
  "deepgramApiKey": "YOUR_KEY",
  "ttsProvider": "edge",
  "agentName": "bobby"
}
```
</details>

<details>
<summary><b>🚀 Premium</b> (~$50/month): fastest STT and TTS</summary>

```json
{
  "sttProvider": "deepgram",
  "deepgramApiKey": "YOUR_KEY",
  "ttsProvider": "openai",
  "openaiApiKey": "YOUR_KEY",
  "ttsVoice": "nova",
  "agentName": "bobby"
}
```
</details>
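The architecture section lists a provider factory (`providers.ts`) behind this mix-and-match choice. One common way to structure such a factory is a registry keyed by the config value; the sketch below uses hypothetical interfaces and stub providers, not ClawPilot's actual classes.

```typescript
// Illustrative shapes only; ClawPilot's real interfaces live in its own source.
type SttName = "deepgram" | "whisper" | "whisper-local";
type TtsName = "openai" | "edge";

interface SpeechToText {
  name: SttName;
  streaming: boolean; // Deepgram streams; both Whisper options are batch
}

interface TextToSpeech {
  name: TtsName;
}

// One factory per config value. Typing the registries as Record<Union, ...>
// makes the compiler reject a registry that omits a provider.
const sttRegistry: Record<SttName, () => SpeechToText> = {
  deepgram: () => ({ name: "deepgram", streaming: true }),
  whisper: () => ({ name: "whisper", streaming: false }),
  "whisper-local": () => ({ name: "whisper-local", streaming: false }),
};

const ttsRegistry: Record<TtsName, () => TextToSpeech> = {
  openai: () => ({ name: "openai" }),
  edge: () => ({ name: "edge" }),
};

function createProviders(stt: SttName, tts: TtsName): { stt: SpeechToText; tts: TextToSpeech } {
  return { stt: sttRegistry[stt](), tts: ttsRegistry[tts]() };
}
```

Adding a provider then means adding one union member and one registry entry, without touching the pipeline code that consumes `SpeechToText` and `TextToSpeech`.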

---

## 🎮 Slash Commands

| Command | Description |
|---------|-------------|
| `/join` | Bot joins your voice channel |
| `/leave` | Bot leaves the voice channel |
| `/mode wake_word` | Activate with "Hey Claw" (default) |
| `/mode always_active` | Listen to everything, no trigger needed |
| `/status` | Connection info, providers, active speakers, uptime |
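Slash commands are registered with Discord as JSON payloads. A sketch of the four commands above, using Discord's API type numbers (command type 1 is a chat-input slash command, option type 3 is a string); modelling `/mode` as one string option with two choices, rather than subcommands, is an assumption, and the descriptions are illustrative.

```typescript
interface CommandChoice {
  name: string;
  value: string;
}

interface StringOption {
  type: 3; // STRING
  name: string;
  description: string;
  required: boolean;
  choices: CommandChoice[];
}

interface SlashCommand {
  type: 1; // CHAT_INPUT
  name: string;
  description: string;
  options?: StringOption[];
}

const commands: SlashCommand[] = [
  { type: 1, name: "join", description: "Join your voice channel" },
  { type: 1, name: "leave", description: "Leave the voice channel" },
  {
    type: 1,
    name: "mode",
    description: "Switch activation mode",
    options: [
      {
        type: 3,
        name: "mode",
        description: "Activation mode",
        required: true,
        choices: [
          { name: "wake_word", value: "wake_word" },
          { name: "always_active", value: "always_active" },
        ],
      },
    ],
  },
  { type: 1, name: "status", description: "Show connection info, providers and uptime" },
];
```

An array like this can be sent in one bulk `PUT` to Discord's `/applications/{application.id}/guilds/{guild.id}/commands` endpoint, which replaces the guild's command set.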

---

## 🗣 Activation Modes

### Agent name (recommended)

Set `agentName` in config (e.g. `"bobby"`). Say the name **anywhere** in a sentence:

- "**Bobby**, what's the weather?"
- "Can you help me **bobby**?"
- "Send an email **bobby** to John about the meeting"

The name is automatically removed from the text before sending to your agent.

### Wake word (prefix)

Set `wakeWords` (e.g. `["hey claw"]`). Must be at the **start** of the sentence:

- "**Hey Claw**, what's the weather?"
- "**Ok Claw**, send an email to John"

### Always active

Set `activationMode` to `"always_active"`. Listens to **everything**, no trigger needed. Best for solo use.

> Both `agentName` and `wakeWords` work together. You can use either to trigger the agent.
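The three modes can be combined into one gate. A minimal sketch, assuming case-insensitive whole-word matching, that wake words are stripped just like the agent name, and that `activationDurationMs` keeps the gate open for follow-ups after any accepted utterance; `ActivationGate` is a hypothetical class, not ClawPilot's implementation.

```typescript
interface ActivationConfig {
  activationMode: "wake_word" | "always_active";
  agentName?: string;
  wakeWords: string[];
  activationDurationMs: number;
}

interface Decision {
  forward: boolean;
  text: string; // cleaned text that would be sent to the agent
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Tidy the whitespace and punctuation left behind after removing a trigger.
function tidy(s: string): string {
  return s
    .replace(/\s+([?.!,])/g, "$1") // no space before punctuation
    .replace(/\s{2,}/g, " ") // collapse doubled spaces
    .replace(/^[\s,.!?]+/, "") // drop leftover leading punctuation
    .trim();
}

class ActivationGate {
  private activeUntil = 0; // end of the follow-up window, in ms
  private readonly cfg: ActivationConfig;

  constructor(cfg: ActivationConfig) {
    this.cfg = cfg;
  }

  decide(transcript: string, nowMs: number): Decision {
    const text = transcript.trim();
    if (this.cfg.activationMode === "always_active") {
      return { forward: text.length > 0, text };
    }
    // Wake words only count at the very start of the utterance.
    for (const wake of this.cfg.wakeWords) {
      const prefix = new RegExp(`^${escapeRegExp(wake)}\\b[\\s,.!?]*`, "i");
      if (prefix.test(text)) return this.open(tidy(text.replace(prefix, "")), nowMs);
    }
    // The agent name counts anywhere and is removed before forwarding.
    if (this.cfg.agentName) {
      const core = `,?\\s*\\b${escapeRegExp(this.cfg.agentName)}\\b,?`;
      if (new RegExp(core, "i").test(text)) {
        return this.open(tidy(text.replace(new RegExp(core, "gi"), " ")), nowMs);
      }
    }
    // No trigger: forward only inside the follow-up window.
    if (nowMs < this.activeUntil) return this.open(text, nowMs);
    return { forward: false, text };
  }

  // Assumption: every accepted utterance restarts the window, and a bare
  // trigger ("Bobby.") opens the window without forwarding anything.
  private open(text: string, nowMs: number): Decision {
    this.activeUntil = nowMs + this.cfg.activationDurationMs;
    return { forward: text.length > 0, text };
  }
}
```

With `agentName: "bobby"`, `decide("Can you help me bobby?", Date.now())` yields `{ forward: true, text: "Can you help me?" }`.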

---

## ✨ Features

- **🗣 Agent name activation**: say the agent's name anywhere in a sentence to trigger it
- **🎤 Wake word activation**: configurable prefix phrases ("hey claw", "ok claw", or anything you want)
- **👂 Always-active mode**: listens to everything, no trigger needed
- **🛑 Barge-in**: interrupt the bot mid-sentence by speaking
- **👥 Group mode**: tracks multiple speakers with `[Name]:` attribution
- **🔄 Auto-reconnect**: recovers from Discord/provider disconnects automatically
- **⚙️ Fully configurable**: voices, models, languages, activation timeout, and more
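Barge-in amounts to a queue-based player that empties itself when someone starts talking. A minimal model, assuming playback is a queue of TTS clips and that `enableBargeIn` simply toggles the behaviour; `PlaybackQueue` is hypothetical. In a real bot the trigger would come from `@discordjs/voice`, whose `connection.receiver.speaking` emits a `start` event with the user ID, and the stop would call the `AudioPlayer`'s `stop()`.

```typescript
// Hypothetical model of queue-based playback with barge-in.
class PlaybackQueue {
  private queue: string[] = []; // clips waiting to play (e.g. TTS file paths)
  private playing: string | null = null;
  private readonly bargeIn: boolean;

  constructor(enableBargeIn: boolean) {
    this.bargeIn = enableBargeIn;
  }

  enqueue(clip: string): void {
    if (this.playing === null) this.playing = clip;
    else this.queue.push(clip);
  }

  /** The current clip ended on its own: start the next one, if any. */
  finished(): void {
    const next = this.queue.shift();
    this.playing = next === undefined ? null : next;
  }

  /** A user started talking: with barge-in on, drop everything and return it. */
  userStartedSpeaking(): string[] {
    if (!this.bargeIn || this.playing === null) return [];
    const dropped = [this.playing].concat(this.queue);
    this.playing = null;
    this.queue = [];
    return dropped;
  }

  current(): string | null {
    return this.playing;
  }
}
```

Dropping the queued clips, not just the current one, matters: otherwise the bot would resume the stale answer right after the user interrupts it.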

---

## 📋 Configuration Reference

<details>
<summary><b>Click to expand full config options</b></summary>

| Key | Default | Description |
|-----|---------|-------------|
| `discordToken` | (none) | Discord bot token **(required)** |
| `guildId` | auto-detect | Server ID |
| `voiceChannelId` | (none) | Auto-join channel on startup |
| | | |
| **STT** | | |
| `sttProvider` | `"deepgram"` | `"deepgram"`, `"whisper"`, or `"whisper-local"` |
| `deepgramApiKey` | (none) | Required if sttProvider = deepgram |
| `deepgramModel` | `"nova-3"` | Deepgram model |
| `deepgramLanguage` | `"en-US"` | Language code |
| `openaiApiKey` | (none) | Required if ttsProvider = openai or sttProvider = whisper |
| `whisperModel` | `"base"` | Model size: tiny, base, small, medium, large |
| `whisperBin` | `"whisper-cpp"` | Path to whisper.cpp binary |
| | | |
| **TTS** | | |
| `ttsProvider` | `"openai"` | `"openai"` or `"edge"` |
| `ttsModel` | `"gpt-4o-mini-tts"` | OpenAI TTS model |
| `ttsVoice` | `"nova"` | OpenAI voice (alloy, echo, fable, onyx, nova, shimmer) |
| `ttsSpeed` | `1.0` | Speech speed (0.25 to 4.0) |
| `edgeTtsVoice` | `"en-US-AriaNeural"` | Edge TTS voice name |
| | | |
| **Activation** | | |
| `agentName` | (none) | Agent name for activation anywhere in a sentence (e.g. `"bobby"`) |
| `activationMode` | `"wake_word"` | `"wake_word"` or `"always_active"` |
| `wakeWords` | `["hey claw", "ok claw"]` | Prefix trigger phrases |
| `activationDurationMs` | `30000` | Follow-up window after trigger (ms) |
| | | |
| **Behavior** | | |
| `enableBargeIn` | `true` | Interrupt bot when user speaks |
| `groupMode` | `false` | Track multiple speakers |
| `maxConcurrentSpeakers` | `3` | Max simultaneous speakers |
| `agentId` | `"main"` | OpenClaw agent to route messages to |

</details>
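The defaults and the "required if" rules in this table can be checked together when the plugin loads. A plain-TypeScript sketch covering a subset of the keys; ClawPilot's real schema is TypeBox-based (`config-schema.ts`), and `resolveConfig` is a hypothetical name.

```typescript
interface ClawPilotConfig {
  discordToken: string;
  sttProvider: "deepgram" | "whisper" | "whisper-local";
  ttsProvider: "openai" | "edge";
  deepgramApiKey?: string;
  openaiApiKey?: string;
  activationMode: "wake_word" | "always_active";
  wakeWords: string[];
  activationDurationMs: number;
  ttsSpeed: number;
  enableBargeIn: boolean;
  groupMode: boolean;
  maxConcurrentSpeakers: number;
  agentId: string;
}

// Defaults as listed in the table (subset).
const DEFAULTS: Omit<ClawPilotConfig, "discordToken" | "deepgramApiKey" | "openaiApiKey"> = {
  sttProvider: "deepgram",
  ttsProvider: "openai",
  activationMode: "wake_word",
  wakeWords: ["hey claw", "ok claw"],
  activationDurationMs: 30000,
  ttsSpeed: 1.0,
  enableBargeIn: true,
  groupMode: false,
  maxConcurrentSpeakers: 3,
  agentId: "main",
};

function resolveConfig(input: Partial<ClawPilotConfig>): ClawPilotConfig {
  const cfg: ClawPilotConfig = {
    ...DEFAULTS,
    wakeWords: [...DEFAULTS.wakeWords], // copy so callers cannot mutate the defaults
    discordToken: "",
    ...input,
  };
  const errors: string[] = [];
  if (!cfg.discordToken) errors.push("discordToken is required");
  if (cfg.sttProvider === "deepgram" && !cfg.deepgramApiKey) {
    errors.push('deepgramApiKey is required when sttProvider is "deepgram"');
  }
  if ((cfg.ttsProvider === "openai" || cfg.sttProvider === "whisper") && !cfg.openaiApiKey) {
    errors.push('openaiApiKey is required for ttsProvider "openai" or sttProvider "whisper"');
  }
  if (cfg.ttsSpeed < 0.25 || cfg.ttsSpeed > 4.0) {
    errors.push("ttsSpeed must be between 0.25 and 4.0");
  }
  if (errors.length > 0) throw new Error(errors.join("; "));
  return cfg;
}
```

Note what the defaults imply: with `sttProvider` defaulting to `"deepgram"` and `ttsProvider` to `"openai"`, a config containing only `discordToken` would need both API keys, which is why the fully free setup sets both providers explicitly.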

---

## 🏗 Architecture

```
src/
├── index.ts                      # OpenClaw plugin entry point
├── config-schema.ts              # Configuration schema (TypeBox)
├── types.ts                      # TypeScript interfaces
├── providers.ts                  # STT/TTS provider factory
│
├── discord/
│   ├── bot.ts                    # Discord client lifecycle
│   ├── voice-connection.ts       # Voice channel join/leave/reconnect
│   └── commands.ts               # Slash commands (/join, /leave, /mode, /status)
│
├── audio/
│   ├── audio-receiver.ts         # Per-user Opus → PCM capture
│   ├── audio-player.ts           # Queue-based Discord playback
│   └── stereo-to-mono.ts         # Stereo → Mono transform stream
│
├── stt/
│   ├── deepgram-client.ts        # Streaming WebSocket STT
│   ├── whisper-client.ts         # OpenAI Whisper API (batch)
│   ├── whisper-local-client.ts   # wh
```

... (truncated)