Plugin Doubao Speech

Name: Plugin Doubao Speech
Rating: 3.5 (1 reviews)
Author: wfpaopao


Doubao Speech (TTS + ASR) plugin for OpenClaw

README

# Doubao Speech Plugin for OpenClaw

[English](./README.md) | [中文](./README.zh-CN.md)

Doubao Speech extension for OpenClaw with:
- TTS: Doubao Speech 2.0 WebSocket API
- ASR: Doubao streaming ASR 2.0 (optimized bidirectional stream)

Designed for production usability with minimal user-facing config.

## Features

- Minimal config surface for end users
- Built-in stable defaults for ASR/TTS internals
- WebSocket-based TTS and ASR (no third-party file upload bridge)

## Prerequisites

1. Enable the Doubao Speech service in the Volcengine console:
   - [https://console.volcengine.com/speech/new/overview](https://console.volcengine.com/speech/new/overview)
2. Create or retrieve your API Key.

## Installation

If the plugin is published as an OpenClaw plugin package, install it through your usual plugin installation flow.

For local development:

1. Place plugin files under:
   - `~/.openclaw/extensions/doubao-speech/`
2. Restart the gateway:
   - `openclaw gateway restart`
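
The local development steps above can be sketched as a small copy script. This is only an illustration: the `install_local` helper and its `extensions_root` parameter are hypothetical names, not part of OpenClaw, and the sketch assumes the plugin sources sit in a single directory.

```python
import shutil
from pathlib import Path

PLUGIN_DIR_NAME = "doubao-speech"


def install_local(source_dir, extensions_root=None):
    """Copy plugin files into <extensions_root>/doubao-speech/.

    Hypothetical helper: extensions_root defaults to ~/.openclaw/extensions,
    the location named in the steps above.
    """
    if extensions_root is None:
        root = Path.home() / ".openclaw" / "extensions"
    else:
        root = Path(extensions_root)
    target = root / PLUGIN_DIR_NAME
    # dirs_exist_ok lets the copy refresh an existing install (Python 3.8+).
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```

After copying, the gateway still needs `openclaw gateway restart` to pick up the plugin; the sketch deliberately does not run that command for you.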

## Configuration

User-facing options (plugin config):

- `asrLanguage` (optional, default `zh-CN`)
- `ttsVoiceType` (optional, default `zh_female_vv_uranus_bigtts`)

Notes:
- Declare `apiKey` in one place only: `models.providers.doubao-speech.apiKey`.
- Keep a minimal `models.providers.doubao-speech` entry in `openclaw.json` so that OpenClaw validation and runtime wiring succeed.
- The `baseUrl` and `models` fields in that block are required by the framework schema as wiring metadata, even if the plugin runtime does not use them directly.

### Example (`openclaw.json`)

```json
{
  "messages": {
    "tts": {
      "provider": "doubao-speech"
    }
  },
  "models": {
    "providers": {
      "doubao-speech": {
        "apiKey": "YOUR_VOLCENGINE_API_KEY",
        "baseUrl": "https://openspeech.bytedance.com",
        "models": [
          { "id": "doubao-tts-v2", "name": "Doubao TTS v2" },
          { "id": "bigmodel", "name": "Doubao ASR bigmodel" }
        ]
      }
    }
  },
  "plugins": {
    "entries": {
      "doubao-speech": {
        "enabled": true,
        "config": {
          "asrLanguage": "zh-CN",
          "ttsVoiceType": "zh_female_vv_uranus_bigtts"
        }
      }
    }
  }
}
```
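
As a quick pre-flight check before running OpenClaw's own validation, the notes above can be turned into a few assertions on the config structure. The following is a minimal sketch, not OpenClaw's validator: `check_config` is a hypothetical helper, and it checks only the points this README mentions.

```python
import json

PROVIDER = "doubao-speech"


def check_config(cfg):
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    provider = cfg.get("models", {}).get("providers", {}).get(PROVIDER)
    if provider is None:
        problems.append(f"missing models.providers.{PROVIDER}")
    else:
        # apiKey, baseUrl and models are all expected in the provider block.
        for key in ("apiKey", "baseUrl", "models"):
            if not provider.get(key):
                problems.append(f"missing models.providers.{PROVIDER}.{key}")
    entry = cfg.get("plugins", {}).get("entries", {}).get(PROVIDER, {})
    # The README asks for apiKey to be declared in one place only.
    if "apiKey" in entry.get("config", {}):
        problems.append("apiKey should be declared only under models.providers")
    return problems


example = json.loads("""
{
  "models": {"providers": {"doubao-speech": {
    "apiKey": "YOUR_VOLCENGINE_API_KEY",
    "baseUrl": "https://openspeech.bytedance.com",
    "models": [{"id": "doubao-tts-v2", "name": "Doubao TTS v2"}]
  }}},
  "plugins": {"entries": {"doubao-speech": {
    "enabled": true,
    "config": {"asrLanguage": "zh-CN"}
  }}}
}
""")
print(check_config(example))  # prints []
```

To check a real file, load it with `json.loads(Path(...).read_text())` first, assuming your `openclaw.json` is plain JSON. The sketch complements `openclaw config validate` and does not replace it.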

## Supported Option Values

### `asrLanguage`

`asrLanguage` is an optional language hint sent in ASR request metadata.

Common values:
- `zh-CN` (Mandarin Chinese)
- `en-US` (English)
- `ja-JP` (Japanese)
- `yue-CN` (Cantonese)
- `ko-KR` (Korean)
- `fr-FR` (French)
- `de-DE` (German)
- `es-MX` (Spanish)
- `ru-RU` (Russian)

Notes:
- The Doubao streaming ASR docs list `audio.language` as supported in `bigmodel_nostream` mode.
- This plugin exposes `asrLanguage` as a simple user-facing hint to keep configuration easy.
- If you are unsure, start with `zh-CN` and adjust it to match the language actually spoken in your audio.

Detailed reference:
- [Doubao 流式语音识别 WebSocket 文档](https://www.volcengine.com/docs/6561/1354869?lang=zh)
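
If you set `asrLanguage` from user input, a small normalizer can catch casing mistakes and flag codes outside the common list above. This is a sketch under assumptions: `resolve_asr_language` is a hypothetical helper, not part of the plugin, and the set below only mirrors the common values in this README rather than the full list Doubao supports.

```python
COMMON_ASR_LANGUAGES = (
    "zh-CN", "en-US", "ja-JP", "yue-CN", "ko-KR",
    "fr-FR", "de-DE", "es-MX", "ru-RU",
)
DEFAULT_ASR_LANGUAGE = "zh-CN"  # the plugin default stated in this README


def resolve_asr_language(value=None):
    """Return (language_code, is_common).

    Empty input falls back to the default. Known codes are matched
    case-insensitively and returned in canonical casing. Unknown codes are
    passed through unchanged and flagged, since the common list is not exhaustive.
    """
    if value is None or not value.strip():
        return DEFAULT_ASR_LANGUAGE, True
    candidate = value.strip()
    for code in COMMON_ASR_LANGUAGES:
        if code.lower() == candidate.lower():
            return code, True
    return candidate, False


print(resolve_asr_language("en-us"))
print(resolve_asr_language("it-IT"))
```

Passing unknown codes through, instead of rejecting them, is a deliberate choice here: the official documentation linked above remains the authority on which values the service accepts.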

### `ttsVoiceType`

See the official Doubao voice list for the latest supported `voice_type` values:
- [Doubao TTS 音色列表](https://www.volcengine.com/docs/6561/1257544?lang=zh)

## Validation

Recommended smoke tests:

- `openclaw config validate`
- `openclaw infer tts convert --text "hello" --json`
- `openclaw infer audio transcribe --file /path/to/audio.mp3 --json`
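
The three smoke tests above can be run in sequence from a short script. The sketch below uses only the commands listed in this README; `run_smoke_tests` and its `run` parameter are hypothetical names, and treating a zero exit code as success is an assumption about how the `openclaw` CLI reports failures.

```python
import shlex
import subprocess

SMOKE_COMMANDS = [
    "openclaw config validate",
    'openclaw infer tts convert --text "hello" --json',
    "openclaw infer audio transcribe --file /path/to/audio.mp3 --json",
]


def run_smoke_tests(commands, run=subprocess.run):
    """Run each command and return a list of (command, passed) pairs.

    `run` is injectable so the function can be exercised without the CLI.
    """
    results = []
    for command in commands:
        argv = shlex.split(command)  # split like a POSIX shell, honoring quotes
        try:
            completed = run(argv, capture_output=True, text=True)
            passed = completed.returncode == 0  # assumption: 0 means success
        except FileNotFoundError:
            passed = False  # the executable is not on PATH
        results.append((command, passed))
    return results
```

Replace `/path/to/audio.mp3` with a real audio file before running, then call `run_smoke_tests(SMOKE_COMMANDS)` and inspect which entries report `False`.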

## Troubleshooting

- TTS/ASR request unauthorized
  - Check `models.providers.doubao-speech.apiKey` and service activation status in Volcengine console.

- Empty or poor ASR result
  - Confirm audio quality, language setting (`asrLanguage`), and input format.

- Voice mismatch or TTS failure
  - Try another supported `ttsVoiceType` for `seed-tts-2.0`.

## Compatibility

- OpenClaw: tested on 2026.4.x
- Doubao Speech APIs:
  - TTS v3 WebSocket unidirectional stream
  - ASR v3 streaming (`bigmodel_async`)
