Whisper Dict

Name: Whisper Dict
Rating: 3.5 (1 review)
Author: kandotrun

OpenClaw plugin that dynamically improves Whisper transcription accuracy by having the LLM extract vocabulary from conversations and update the dictionary automatically.

Configuration Example

{
  "plugins": {
    "load": {
      "paths": ["/path/to/openclaw-whisper-dict"]
    }
  }
}

README

# openclaw-whisper-dict

OpenClaw plugin that **dynamically improves Whisper transcription accuracy** by leveraging the existing LLM — zero additional API calls.

When you send voice messages, the LLM (e.g., Claude Opus) naturally extracts new vocabulary from the transcript and adds it to Whisper's dictionary. The dictionary grows smarter with every conversation.

## How it works

```
Voice message
     │
     ▼
┌─────────────────────────┐
│ Whisper (faster-whisper) │ ← Reads dictionary as initial_prompt
│                         │
│ "OpenClaw, 二宮貫,      │
│  ツクリエ, TRAC, ..."   │
└─────────────────────────┘
     │
     ▼ (transcript)
┌─────────────────────────┐
│ before_prompt_build      │ ← Plugin detects voice transcript
│                         │
│ Injects hint:           │
│ "Extract new vocabulary │
│  and call               │
│  whisper_dict_add"      │
└─────────────────────────┘
     │
     ▼
┌─────────────────────────┐
│ LLM (Claude Opus etc.)  │ ← Analyzes transcript as part of
│                         │   normal response (no extra API call)
│ "I see '寺沢' and       │
│  'TRIPCALL' are new"    │
│                         │
│ → calls whisper_dict_add│
└─────────────────────────┘
     │
     ▼
┌─────────────────────────┐
│ whisper-dictionary.txt   │ ← Updated for next transcription
│                         │
│ 寺沢                     │
│ TRIPCALL                 │
│ ...                      │
└─────────────────────────┘
```

**Key insight**: The LLM is already processing your message, so vocabulary extraction piggybacks on the existing inference instead of requiring a separate API call.
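The hint-injection step in the diagram can be sketched roughly as follows. This is an illustration only, not the plugin's code: the `Message` type, its `is_voice_transcript` flag, the `HINT` wording, and the function signature are all hypothetical. Only the hook name `before_prompt_build`, the tool name `whisper_dict_add`, and the `autoDetect` option come from this README.

```python
from dataclasses import dataclass

# Hypothetical hint text, paraphrasing the diagram above.
HINT = (
    "This message is a voice transcript. Extract any new names or "
    "technical terms and call whisper_dict_add with them."
)


@dataclass
class Message:
    text: str
    is_voice_transcript: bool  # assumed flag; the real detection may differ


def before_prompt_build(message: Message, auto_detect: bool = True) -> list[str]:
    """Return the prompt parts for one turn, adding the hint for voice input."""
    parts = [message.text]
    # The hint is added only for voice transcripts, and only when
    # auto-detection is enabled (mirrors the `autoDetect` option).
    if auto_detect and message.is_voice_transcript:
        parts.append(HINT)
    return parts
```

Because the hint travels inside the prompt the LLM was going to receive anyway, no separate extraction request is needed.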

## Prerequisites

- [OpenClaw](https://github.com/openclaw/openclaw) (2026.3+)
- Whisper-based transcription (faster-whisper, OpenAI Whisper, etc.) with `initial_prompt` support

## Install

### Option A: Copy to extensions

```bash
git clone https://github.com/kandotrun/openclaw-whisper-dict.git
cp -r openclaw-whisper-dict ~/.openclaw/extensions/whisper-dict-auto
```

### Option B: Load from path

```json
{
  "plugins": {
    "load": {
      "paths": ["/path/to/openclaw-whisper-dict"]
    }
  }
}
```

### Enable

```json
{
  "plugins": {
    "entries": {
      "whisper-dict-auto": {
        "enabled": true
      }
    }
  }
}
```

### Update your transcription script

Your Whisper transcription script needs to read from the dictionary file. Add this to your script:

```bash
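# Dynamic dictionary (managed by whisper-dict-auto plugin)
DICT_FILE="${HOME}/.openclaw/workspace/whisper-dictionary.txt"
DICT_TERMS=""
if [[ -f "$DICT_FILE" ]]; then
  # Drop comment and blank lines, then join the remaining terms with ", "
  DICT_TERMS="$(grep -v '^#' "$DICT_FILE" | grep -v '^$' | tr '\n' ',' | sed 's/,$//' | sed 's/,/, /g')"
fi

# Use as initial_prompt; append the dictionary terms only when there are any,
# so an empty dictionary does not leave a trailing comma
INITIAL_PROMPT="your, base, terms${DICT_TERMS:+, ${DICT_TERMS}}"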
```

Then restart the gateway:

```bash
openclaw gateway restart
```
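If your transcription script is written in Python rather than bash, the dictionary-loading step could look roughly like the sketch below. The helper names `load_terms` and `build_initial_prompt` are made up for illustration. Unlike the bash snippet, this version also trims surrounding whitespace from each line. The commented-out usage assumes faster-whisper, with a placeholder model size and audio path.

```python
from pathlib import Path


def load_terms(dict_path: Path) -> list[str]:
    """Read one term per line, skipping blank lines and '#' comments."""
    if not dict_path.is_file():
        return []
    terms = []
    for line in dict_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            terms.append(line)
    return terms


def build_initial_prompt(base_terms: list[str], dict_path: Path) -> str:
    """Join base terms and dictionary terms into one comma-separated prompt."""
    return ", ".join(base_terms + load_terms(dict_path))


# Usage with faster-whisper (not executed here; model size and file name
# are placeholders):
#   from faster_whisper import WhisperModel
#   prompt = build_initial_prompt(
#       ["your", "base", "terms"],
#       Path.home() / ".openclaw/workspace/whisper-dictionary.txt",
#   )
#   model = WhisperModel("small")
#   segments, info = model.transcribe("voice.ogg", initial_prompt=prompt)
```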

## Configuration

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | `true` | Enable the plugin |
| `dictionaryPath` | string | `~/.openclaw/workspace/whisper-dictionary.txt` | Path to dictionary file |
| `maxTerms` | number | `100` | Max terms in dictionary (Whisper works best under 100) |
| `autoDetect` | boolean | `true` | Auto-detect voice transcripts and prompt LLM |

## Agent Tools

| Tool | Description |
|------|-------------|
| `whisper_dict_add` | Add terms to the dictionary |
| `whisper_dict_remove` | Remove terms from the dictionary |
| `whisper_dict_list` | List current dictionary contents |
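To make the expected behaviour of the three tools concrete, here is a minimal file-backed sketch. It is not the plugin's implementation: the function signatures, the return values, and the choice to skip duplicates are assumptions, and this simplified version does not preserve `#` comment lines when it rewrites the file.

```python
from pathlib import Path


def whisper_dict_list(path: Path) -> list[str]:
    """Return the current terms, ignoring blank lines and '#' comments."""
    if not path.is_file():
        return []
    lines = (ln.strip() for ln in path.read_text(encoding="utf-8").splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def _write(path: Path, terms: list[str]) -> None:
    """Rewrite the file with one term per line (comments are not kept)."""
    path.write_text("".join(f"{t}\n" for t in terms), encoding="utf-8")


def whisper_dict_add(path: Path, terms: list[str]) -> list[str]:
    """Append terms that are not already present; return those added."""
    current = whisper_dict_list(path)
    added = []
    for term in terms:
        term = term.strip()
        if term and term not in current and term not in added:
            added.append(term)
    _write(path, current + added)
    return added


def whisper_dict_remove(path: Path, terms: list[str]) -> list[str]:
    """Remove the given terms if present; return those removed."""
    current = whisper_dict_list(path)
    drop = {t.strip() for t in terms}
    removed = [t for t in current if t in drop]
    _write(path, [t for t in current if t not in drop])
    return removed
```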

## Why max 100 terms?

Whisper's `initial_prompt` is a conditioning hint, not a dictionary. Too many terms can:
- Cause hallucinations (Whisper "hears" words that weren't said)
- Reduce overall accuracy
- Slow down processing

The default of 100 terms is a practical budget for personal vocabulary. The plugin enforces this limit through `maxTerms`.
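This README does not say how the cap is enforced. One plausible policy, shown here purely as an assumption, is to keep the most recently added terms and drop the oldest once the list exceeds `maxTerms`. Rejecting new terms when the dictionary is full would be an equally reasonable design. The function name `enforce_max_terms` is hypothetical.

```python
def enforce_max_terms(terms: list[str], max_terms: int = 100) -> list[str]:
    """Keep at most max_terms entries, preferring the most recently added.

    Assumes `terms` is ordered oldest first, as in an append-only file.
    """
    if max_terms <= 0:
        # Guard needed because terms[-0:] would return the whole list.
        return []
    return terms[-max_terms:]
```

With this policy, stale vocabulary ages out on its own as new terms arrive, at the cost of occasionally evicting a term that is still useful.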

## License

MIT

Copyright (c) 2026 Kan Ninomiya & 白川 玲 (AI)
