Minicpm Tts

By yoloshii

Local TTS plugin for OpenClaw using MiniCPM-o 4.5 GGUF forced token decoding via llama.cpp-omni

# openclaw-minicpm-tts

Local TTS plugin for OpenClaw using MiniCPM-o 4.5 GGUF forced token decoding via llama.cpp-omni.

## Features

- Voice cloning from reference WAV files (5-15s of clear speech)
- Opus output for Telegram voice message compatibility
- Named voice aliases with server-side resolution
- Gateway methods for direct API access (synthesize, health, voices)
- Roughly 10.4 GB of VRAM with Q5_K_M quantization, versus 18 GB for bfloat16

## Architecture

```
OpenClaw ──POST──▶ GGUF TTS API (:8087) ──POST──▶ llama.cpp-omni (:8085)
                   FastAPI proxy                   Forced token decoding
                   WAV concat + opus               tokenize → hidden states → TTS vocoder
                   Voice resolution                WAV chunks on disk
```

Synthesis takes roughly 31 s per sentence; the TTS/T2W vocoder pipeline is the bottleneck.
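
Because latency scales with sentence count, it can help to estimate whether a reply fits inside `timeoutMs` before sending it. The sketch below is only a planning aid, not part of the plugin: the helper names are hypothetical, the sentence splitter is deliberately naive, and the 31 s figure is the approximate number quoted above.

```typescript
// Approximate per-sentence latency quoted in this README (not a guarantee).
const APPROX_MS_PER_SENTENCE = 31000;

// Naive splitter (assumption): a sentence is a run of text ending in ., ! or ?
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Estimated synthesis time for the whole text, in milliseconds.
function estimateSynthesisMs(
  text: string,
  msPerSentence: number = APPROX_MS_PER_SENTENCE,
): number {
  return splitSentences(text).length * msPerSentence;
}

// True when the estimate fits inside the configured timeout.
function fitsTimeout(text: string, timeoutMs: number): boolean {
  return estimateSynthesisMs(text) <= timeoutMs;
}

const reply = "Hello there. How are you today? I am fine. Thanks for asking!";
console.log(estimateSynthesisMs(reply)); // 4 sentences x 31000 ms = 124000
console.log(fitsTimeout(reply, 120000)); // false: 124000 ms exceeds the timeout
```

With the example `timeoutMs` of 120000, about three sentences fit in one request at this rate, so longer replies may need a higher timeout or shorter text.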

## Configuration

### Core TTS Provider

The plugin integrates as a core TTS provider (`"minicpm"`) in OpenClaw's TTS engine.

**openclaw.json:**
```json
{
  "messages": {
    "tts": {
      "provider": "minicpm",
      "minicpm": {
        "endpoint": "http://your-gpu-host:8087",
        "defaultVoice": "default",
        "timeoutMs": 120000,
        "voices": {
          "myvoice": "/path/to/voice_refs/sample.wav"
        }
      }
    }
  }
}
```
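
To illustrate how the `voices` map relates to `defaultVoice`, here is a minimal sketch of an alias lookup. Actual resolution happens server-side in the GGUF TTS API; the `MiniCpmTtsConfig` interface and `lookupVoiceRef` helper are hypothetical names used only for illustration.

```typescript
// Shape of the `messages.tts.minicpm` block shown above (illustrative typing).
interface MiniCpmTtsConfig {
  endpoint: string;
  defaultVoice: string;
  timeoutMs: number;
  voices?: Record<string, string>;
}

// Hypothetical helper: pick the requested voice (or the default) and return
// the reference WAV path configured for that alias, if any.
function lookupVoiceRef(
  cfg: MiniCpmTtsConfig,
  requested?: string,
): { voice: string; refPath?: string } {
  const name = requested ? requested.trim() : "";
  const voice = name !== "" ? name : cfg.defaultVoice;
  const voices = cfg.voices ?? {};
  // Own-property check so names like "toString" never match by accident.
  const known = Object.prototype.hasOwnProperty.call(voices, voice);
  return known ? { voice, refPath: voices[voice] } : { voice };
}

const cfg: MiniCpmTtsConfig = {
  endpoint: "http://your-gpu-host:8087",
  defaultVoice: "default",
  timeoutMs: 120000,
  voices: { myvoice: "/path/to/voice_refs/sample.wav" },
};

console.log(lookupVoiceRef(cfg, "myvoice")); // alias with a reference path
console.log(lookupVoiceRef(cfg));            // falls back to "default"
```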

**Telegram commands:**
```
/tts provider minicpm    # Switch to local GGUF TTS
/tts status              # Verify provider active
/tts audio Hello world   # Synthesize and send as voice message
```

### Extension Plugin

**openclaw.json:**
```json
{
  "plugins": {
    "entries": {
      "minicpm-tts": {
        "enabled": true,
        "config": {
          "endpoint": "http://your-gpu-host:8087",
          "defaultVoice": "default",
          "format": "opus",
          "timeoutMs": 120000
        }
      }
    }
  }
}
```
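
A sketch of how these options might be typed and merged with defaults, in the spirit of `src/config.ts`. This is not the plugin's actual code: the names are hypothetical, and the default values are assumptions taken from the examples in this README (the `localhost` endpoint in particular is a guess).

```typescript
// Options accepted under plugins.entries["minicpm-tts"].config (illustrative).
interface MiniCpmPluginConfig {
  endpoint: string;
  defaultVoice: string;
  format: string;
  timeoutMs: number;
}

// Assumed defaults, mirroring the example values in this README.
const ASSUMED_DEFAULTS: MiniCpmPluginConfig = {
  endpoint: "http://localhost:8087", // assumption: API on the same host
  defaultVoice: "default",
  format: "opus",
  timeoutMs: 120000,
};

// Merge user-supplied options over the defaults and validate the result.
function resolvePluginConfig(
  raw: Partial<MiniCpmPluginConfig> = {},
): MiniCpmPluginConfig {
  const merged = { ...ASSUMED_DEFAULTS, ...raw };
  if (!/^https?:\/\//.test(merged.endpoint)) {
    throw new Error(`endpoint must be an http(s) URL: ${merged.endpoint}`);
  }
  if (!Number.isFinite(merged.timeoutMs) || merged.timeoutMs <= 0) {
    throw new Error(`timeoutMs must be a positive number: ${merged.timeoutMs}`);
  }
  // Strip trailing slashes so request paths can be appended safely.
  return { ...merged, endpoint: merged.endpoint.replace(/\/+$/, "") };
}

console.log(resolvePluginConfig({ endpoint: "http://your-gpu-host:8087/" }));
```

Validating at load time surfaces a bad endpoint or timeout immediately, rather than on the first synthesis request.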

## Gateway Methods

### `minicpm.synthesize`

Synthesizes the given text and returns base64-encoded audio.

Request:

```json
{
  "text": "Hello, how are you today?",
  "voice": "default"
}
```

Response:
```json
{
  "success": true,
  "format": "opus",
  "mimeType": "audio/ogg",
  "audioBase64": "T2dnUwAC...",
  "audioSize": 2271
}
```
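
A caller typically needs raw bytes rather than base64. The sketch below decodes `audioBase64` and checks for the `OggS` magic bytes that begin every Ogg page. The interface mirrors the response above; the helper names are hypothetical, and `audioSize` is deliberately not validated because its exact meaning is not documented here.

```typescript
// Mirrors the fields of the response shown above.
interface SynthesizeResponse {
  success: boolean;
  format: string;
  mimeType: string;
  audioBase64: string;
  audioSize: number;
}

// Decode the base64 payload into raw bytes (atob is global in Node 16+ and browsers).
function decodeAudio(res: SynthesizeResponse): Uint8Array {
  if (!res.success) {
    throw new Error("synthesis did not succeed");
  }
  const binary = atob(res.audioBase64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// An Ogg stream starts with the capture pattern "OggS" (0x4f 0x67 0x67 0x53).
function looksLikeOgg(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x4f &&
    bytes[1] === 0x67 &&
    bytes[2] === 0x67 &&
    bytes[3] === 0x53
  );
}

// Only the visible prefix of the example payload is used here.
const sample: SynthesizeResponse = {
  success: true,
  format: "opus",
  mimeType: "audio/ogg",
  audioBase64: "T2dnUwAC",
  audioSize: 6,
};
console.log(looksLikeOgg(decodeAudio(sample))); // true
```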

### `minicpm.health`

Response:

```json
{
  "healthy": true,
  "endpoint": "http://your-gpu-host:8087",
  "format": "opus",
  "defaultVoice": "default"
}
```

### `minicpm.voices`

Lists available voice references and their server-side paths.

## Prerequisites

- GGUF TTS API running on target host (FastAPI proxy on port 8087 + llama.cpp-omni on port 8085)
- MiniCPM-o-4_5-Q5_K_M.gguf model loaded
- ffmpeg with libopus on the API host (for opus output)

## File Structure

```
├── openclaw.plugin.json   # Plugin manifest and config schema
├── index.ts               # Plugin entry point (gateway methods + service lifecycle)
├── package.json           # Plugin metadata
├── README.md              # This file
└── src/
    ├── config.ts          # Configuration types and defaults
    └── provider.ts        # GGUF TTS API client (synthesize, health, voices)
```

## License

MIT