
Feishu Voice Skill

By jiuyou-dev

OpenClaw skill plugin for generating voice messages via Feishu using ChatTTS and RVC voice conversion

Install

pip install torch

README

# Feishu Voice Skill

> A Feishu (Lark) voice message generation skill for OpenClaw that combines ChatTTS speech synthesis with RVC voice conversion to produce warm, natural, friendly-sounding voice messages.

---

## Table of Contents

- [Repository](#repository)
- [Introduction](#introduction)
- [Features](#features)
- [Architecture](#architecture)
- [Workflow](#workflow)
- [Inference Process](#inference-process)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Project Structure](#project-structure)
- [Open Source](#open-source)
- [Disclaimer](#disclaimer)

---

## Repository

| Item | Info |
|------|------|
| **URL** | https://github.com/jiuyou-dev/feishu-voice-skill |
| **Owner** | jiuyou-dev (Jiuyou Lab, 九幽实验室) |
| **License** | MIT License |
| **Upstream licenses** | ChatTTS (AGPL-3.0 code, CC BY-NC 4.0 model), RVC (MIT) |

---

## Introduction

**Feishu Voice Skill** is a voice message generation skill built for the OpenClaw AI assistant. It combines two core technologies:

1. **ChatTTS** (an open-source conversational text-to-speech model from the 2noise project) - converts text into natural, fluent speech
2. **RVC** (Retrieval-based Voice Conversion) - converts that speech to a specific timbre while preserving the original emotion and prosody

Together, the two stages produce **warm, friendly, emotionally expressive, natural-sounding** voice messages, which are delivered to users through the Feishu platform.

---

## Features

### 🎯 Core Functions

| Feature | Description |
|---------|-------------|
| **ChatTTS speech synthesis** | Converts arbitrary text into natural speech |
| **RVC timbre conversion** | Converts the ChatTTS output to the target timbre |
| **Feishu message delivery** | Sends voice messages to group chats and private chats |
| **Long-text handling** | Automatically splits very long text into segments |
| **Number conversion** | Automatically converts Arabic numerals to Chinese numerals |
| **Batch processing** | Supports batch voice generation |
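The long-text handling row can be illustrated with a small splitter. This is a minimal sketch, not the skill's actual code: `split_text` and the `MAX_CHARS` limit of 80 are hypothetical, chosen only for illustration. It breaks text after sentence-ending punctuation and hard-wraps any single sentence that still exceeds the limit.

```python
import re

# Hypothetical limit; the skill's real segment length is not documented here.
MAX_CHARS = 80

# Zero-width split point right after Chinese or Western sentence-ending marks.
_SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current.strip())
                current = ""
            chunks.append(sentence[:max_chars].strip())
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) <= max_chars:
            current += sentence
        else:
            chunks.append(current.strip())
            current = sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each returned chunk could then be synthesized separately and the resulting audio merged afterwards. Packing several short sentences into one chunk keeps the number of synthesis calls low.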

### 🎙️ Voice Characteristics

- **Warm & Friendly** - a warm, natural tone, like talking with a friend
- **Rich Emotion** - preserves the emotional expression in the text
- **Natural Prosody** - natural rises and falls in intonation, comfortable to listen to
- **Clear & Accurate** - standard pronunciation that conveys the meaning accurately

---

## Architecture

### System Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                          Feishu Voice Skill                          │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│   │   Text   │───▶│ ChatTTS  │───▶│   RVC    │───▶│  Feishu  │       │
│   │  Input   │    │  (TTS)   │    │   (VC)   │    │  (Send)  │       │
│   └──────────┘    └──────────┘    └──────────┘    └──────────┘       │
│        │               │               │               │             │
│        ▼               ▼               ▼               ▼             │
│    User Input     Text→Speech    Timbre Conv.    Message Send        │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```

### Technology Stack

| Component | Technology | Version | Description |
|-----------|------------|---------|-------------|
| TTS engine | ChatTTS | latest | Open-source high-quality speech synthesis |
| Voice conversion | RVC | v2 | Retrieval-based voice conversion model |
| AI framework | PyTorch | 2.x | Deep learning framework |
| Messaging platform | Feishu API | v1 | Voice message delivery |
| Runtime | Python | 3.11 | Program runtime environment |
| Audio processing | FFmpeg | latest | Audio format conversion |

---

## Workflow

### Complete Workflow

```
Step 1: User text input
┌────────────────────────────────────────────┐
│  Text typed by the user or generated by AI │
│  e.g. "Hello, nice weather today~"         │
└────────────────────────────────────────────┘
                │
                ▼
Step 2: ChatTTS speech synthesis
┌────────────────────────────────────────────┐
│  ChatTTS converts the text to speech       │
│  - Natural, fluent intonation              │
│  - Preserved emotional expression          │
│  - Natural prosody and rhythm              │
└────────────────────────────────────────────┘
                │
                ▼
Step 3: Audio preprocessing
┌────────────────────────────────────────────┐
│  - Sample-rate conversion                  │
│  - Format conversion (wav/opus)            │
│  - Audio quality optimization              │
└────────────────────────────────────────────┘
                │
                ▼
Step 4: RVC timbre conversion
┌────────────────────────────────────────────┐
│  RVC converts speech to the target timbre  │
│  - Preserves the original emotion          │
│  - Preserves prosodic features             │
│  - Applies the target voice model          │
└────────────────────────────────────────────┘
                │
                ▼
Step 5: Post-processing and sending
┌────────────────────────────────────────────┐
│  - Merge audio segments                    │
│  - Convert to OPUS (required by Feishu)    │
│  - Send to Feishu                          │
└────────────────────────────────────────────┘
                │
                ▼
Step 6: User receives the message
┌────────────────────────────────────────────┐
│  User receives the voice message in Feishu │
│  Plays directly, no download needed        │
└────────────────────────────────────────────┘
```
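Step 5 merges the audio produced for each text segment before conversion to OPUS. The sketch below shows one way this could be done with Python's standard `wave` module. `merge_wav_files` is a hypothetical helper, not a function from this repository, and it assumes every segment shares the same channel count, sample width, and sample rate.

```python
import wave


def merge_wav_files(segment_paths: list[str], output_path: str) -> int:
    """Concatenate same-format WAV files; return the total number of frames."""
    if not segment_paths:
        raise ValueError("no segments to merge")

    reference = None  # (channels, sample width in bytes, sample rate)
    payloads: list[bytes] = []
    for path in segment_paths:
        with wave.open(path, "rb") as seg:
            fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
            if reference is None:
                reference = fmt
            elif fmt != reference:
                raise ValueError(f"{path} has format {fmt}, expected {reference}")
            payloads.append(seg.readframes(seg.getnframes()))

    # Write only after every input has been validated.
    channels, sample_width, rate = reference
    with wave.open(output_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(rate)
        for payload in payloads:
            out.writeframes(payload)

    return sum(len(p) for p in payloads) // (channels * sample_width)
```

Reading and validating all inputs before opening the output avoids leaving a half-written file when one segment has a mismatched format.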

---

## Inference Process

### 1. ChatTTS Inference Process

```
Input text
    │
    ▼
┌────────────────────────────────────────┐
│  1. Text Normalization                 │
│     - Numbers to Chinese numerals      │
│     - Special symbol handling          │
│     - Polyphonic character handling    │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  2. Semantic Analysis                  │
│     - Sentence boundary detection      │
│     - Emotion tagging                  │
│     - Prosody prediction               │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  3. Acoustic Parameter Generation      │
│     - Mel-spectrogram                  │
│     - Pitch contour                    │
│     - Energy curve                     │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  4. Vocoder Synthesis                  │
│     - HiFiGAN / BigVGAN                │
│     - Waveform generation              │
└────────────────────────────────────────┘
    │
    ▼
ChatTTS speech output (raw_audio.wav)
```
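The number-conversion part of text normalization can be sketched as follows. This is an illustrative implementation under stated assumptions, not the skill's actual rule set: `int_to_chinese` and `normalize_numbers` are hypothetical names, integers are read as quantities below 10^8, and longer runs or runs with a leading zero are read digit by digit. Decimals, dates, and phone numbers would need extra rules.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def _section(n: int) -> str:
    """Read a value in 1..9999, inserting a single 零 for interior zeros."""
    out = ""
    zero_pending = False
    for power in (3, 2, 1, 0):
        digit = n // 10**power % 10
        if digit == 0:
            zero_pending = bool(out)
        else:
            if zero_pending:
                out += "零"
                zero_pending = False
            out += DIGITS[digit] + UNITS[power]
    return out


def int_to_chinese(n: int) -> str:
    """Convert 0 <= n < 10**8 to Chinese numerals."""
    if n == 0:
        return "零"
    if not 0 < n < 10**8:
        raise ValueError("only 0 <= n < 10**8 is supported in this sketch")
    high, low = divmod(n, 10**4)
    out = ""
    if high:
        out += _section(high) + "万"
        if 0 < low < 1000:
            out += "零"
    if low:
        out += _section(low)
    # 10..19 are conventionally read 十, 十一, ... rather than 一十.
    return out[1:] if out.startswith("一十") else out


def normalize_numbers(text: str) -> str:
    """Replace each run of ASCII digits with its Chinese reading."""
    def repl(match: re.Match) -> str:
        run = match.group()
        if len(run) > 8 or (len(run) > 1 and run[0] == "0"):
            return "".join(DIGITS[int(ch)] for ch in run)
        return int_to_chinese(int(run))

    return re.sub(r"[0-9]+", repl, text)
```

For example, `normalize_numbers("共105人")` yields `共一百零五人`, which a TTS front end can pronounce without having to interpret digits.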

### 2. RVC Inference Process

```
ChatTTS speech input
    │
    ▼
┌────────────────────────────────────────┐
│  1. Audio Preprocessing                │
│     - Resampling (per model needs)     │
│     - Mono conversion                  │
│     - Normalization                    │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  2. F0 (Pitch) Extraction              │
│     - RMVPE (recommended)              │
│     - Harvest                          │
│     - Crepe                            │
│     - Extracts the F0 contour          │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  3. Feature Extraction                 │
│     - HuBERT feature extraction        │
│     - Audio representation learning    │
│     - 1000 frames/s                    │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  4. Timbre Conversion                  │
│     - Load RVC model weights           │
│     - Feature mapping                  │
│     - Timbre transformation            │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  5. Waveform Reconstruction            │
│     - Inverse transform                │
│     - Output target-timbre speech      │
└────────────────────────────────────────┘
    │
    ▼
RVC-converted speech (voice_converted.wav)
```
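The three preprocessing operations in box 1 can be shown on plain Python lists. This is a teaching sketch only: the function names are hypothetical, and the linear-interpolation resampler has no anti-aliasing filter, so a real pipeline would more likely rely on a dedicated audio library for this step.

```python
def to_mono(frames: list[tuple[float, ...]]) -> list[float]:
    """Average the channels of each frame into one sample."""
    return [sum(frame) / len(frame) for frame in frames]


def peak_normalize(samples: list[float], peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals peak."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # silence: nothing to scale
    scale = peak / loudest
    return [s * scale for s in samples]


def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample by linear interpolation between neighbouring samples."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

A typical order would be mono conversion first, then resampling to whatever rate the model expects, then normalization, so that the peak level is measured on the final signal.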

### 3. Feishu Send Process

```
RVC-converted speech
    │
    ▼
┌────────────────────────────────────────┐
│  1. Format Conversion                  │
│     - WAV → OPUS (FFmpeg)              │
│     - Feishu accepts OPUS audio only   │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  2. Audio Upload                       │
│     - Call the Feishu API              │
│     - Obtain the file_key              │
└────────────────────────────────────────┘
    │
    ▼
┌────────────────────────────────────────┐
│  3. Send Voice Message                 │
│     - Call the message-send API        │
│     - Specify the recipient            │
│     - Group and private chats          │
└────────────────────────────────────────┘
    │
    ▼
Feishu message sent successfully ✓
```
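The three boxes above map onto one FFmpeg invocation and two HTTP requests. The sketch below only builds the command and the request payloads, so it makes no network calls. The helper names are hypothetical, and the endpoint paths and field names reflect my reading of the Feishu Open Platform `im/v1` API, so they should be checked against the current official documentation before use.

```python
import json

# Assumed Feishu Open Platform endpoints (verify against the official docs).
UPLOAD_URL = "https://open.feishu.cn/open-apis/im/v1/files"
SEND_URL = "https://open.feishu.cn/open-apis/im/v1/messages"


def build_opus_command(wav_path: str, opus_path: str) -> list[str]:
    """FFmpeg arguments that encode a WAV file as mono 16 kHz Opus."""
    return [
        "ffmpeg", "-y",
        "-i", wav_path,
        "-c:a", "libopus",
        "-ac", "1",
        "-ar", "16000",
        opus_path,
    ]


def build_upload_fields(file_name: str, duration_ms: int) -> dict[str, str]:
    """Form fields for the upload request; the audio bytes go in a `file` part."""
    return {
        "file_type": "opus",
        "file_name": file_name,
        "duration": str(duration_ms),
    }


def build_audio_message(receive_id: str, file_key: str) -> dict[str, str]:
    """JSON body for the send request; `content` is itself a JSON string."""
    return {
        "receive_id": receive_id,
        "msg_type": "audio",
        "content": json.dumps({"file_key": file_key}),
    }
```

The command list can be passed to `subprocess.run(..., check=True)`. The recipient type (for example `open_id` or `chat_id`) is sent as the `receive_id_type` query parameter on the send request, which is how one call can address either a private chat or a group chat.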

---

## Quick Start

### Requirements

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.11 | 3.11 |
| GPU | GTX 1060 6GB | RTX 3060 12GB+ |
| RAM | 8GB | 16GB+ |
| Disk | 10GB | 20GB+ |
| FFmpeg | ✓ Required | Latest |

### Installation

```bash
# 1. Clone the repository
git clone https://github.com/jiuyou-dev/feishu-voice-skill.git
cd feishu-voice-skill

# 2. Install Python dependencies (the index URL selects the CUDA 12.8 build of PyTorch)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r packages/chatttts/requirements.txt
pip install -r packages/rvc/requirements.txt

# 3. Install FFmpeg (Windows)
# Download from https://ffmpeg.org/download.html
# Or use: winget install ffmpeg

# 4. Configure Feishu API credentials
# Create an app on the Feishu Open Platform and obtain its app_id and app_secret
```
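After installation, a quick check of the environment can catch a missing FFmpeg binary or an unsupported Python version early. The following is a minimal sketch. `check_environment` is a hypothetical helper that is not part of this repository, and it treats a missing PyTorch install the same as having no CUDA device.

```python
import shutil
import sys


def check_environment(min_python: tuple[int, int] = (3, 11)) -> dict[str, bool]:
    """Report whether the interpreter, FFmpeg, and CUDA look usable."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # optional here: only needed for the GPU check

        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```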

---

## Usage

### Basic Usage

```python
# Method 1: use the combined inference pipeline (recommended)
from scripts.chattts_rvc_pipeline import ChatTTSRVCPipeline

pipeline = ChatTTSRVCPipeline()
pipeline.run(
    text="你好，今天天气真不错呀~",  # "Hello, the weather is really nice today~"
    output_path="output.wav"
)

# Method 2: send a Feishu voice message
from scripts.feishu_voice import send_voice_message

send_voice_message(
    text="这是测试语音消息",  # "This is a test voice message"
    receive_id="ou_xxxxx",  # Feishu user open_id
    receive_id_type="open_id"
)
```

... (truncated)