Agent Voice Hub

By gancnmd

Voice interaction plugin for AI agents (Hermes, OpenClaw) - TTS, STT, voice cloning, security scanning

Install

pip install agent-voice-hub

Configuration Example

# Register the skill via YAML configuration
# ~/.hermes/skills/voice-hub.yaml
name: voice-hub
description: AI voice processing skill
commands:
  speak:
    handler: agent_voice_hub.integrations.hermes:speak
    args:
      text: { type: string, required: true }
      voice: { type: string, default: "zh-CN-XiaoxiaoNeural" }
  listen:
    handler: agent_voice_hub.integrations.hermes:listen
    args:
      audio_path: { type: string, required: true }

README

<div align="center">

# 🎙️ Agent Voice Hub

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![GitHub Stars](https://img.shields.io/github/stars/gancnmd/agent-voice-hub?style=social)](https://github.com/gancnmd/agent-voice-hub)

**All-in-one TTS / STT / Voice Cloning / Code Security Scanner for AI Agents**

</div>

---

## ✨ Features

| Status | Module | Description |
|:---:|------|------|
| ✅ | **TTS (text-to-speech)** | Edge-TTS (free, 300+ voices) + Coqui XTTS (voice cloning) |
| ✅ | **STT (speech-to-text)** | faster-whisper, 99 languages, automatic language detection |
| ✅ | **Voice cloning** | Coqui XTTS v2 zero-shot cloning from just 6-30 seconds of reference audio |
| ✅ | **Security scanner** | 46 vulnerability patterns (command injection, SQLi, XSS, hardcoded secrets, and more) |
| ✅ | **Agent integrations** | Hermes Skill / OpenClaw Plugin / generic REST API |
| ✅ | **CLI tool** | Five commands: speak / listen / scan / voices / clone |
| ✅ | **YAML configuration** | Driven by a flexible YAML config file |

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     🎙️ Agent Voice Hub                      │
├──────────┬──────────┬──────────────┬────────────────────────┤
│ CLI      │ API      │ Agent        │ Configuration (YAML)   │
│ speak    │ REST API │ Hermes       │ config.yaml            │
│ listen   │ WebSocket│ OpenClaw     │                        │
│ scan     │          │ Generic      │                        │
│ voices   │          │              │                        │
│ clone    │          │              │                        │
├──────────┴──────────┴──────────────┴────────────────────────┤
│                         Core Engine                         │
├──────────────┬───────────────┬──────────────────────────────┤
│  TTS Engine  │  STT Engine   │   Security Scanner           │
│  ┌─────────┐ │ ┌───────────┐ │   ┌────────────────────┐     │
│  │Edge-TTS │ │ │  faster-  │ │   │ 46 vulnerability   │     │
│  │(free)   │ │ │  whisper  │ │   │ patterns           │     │
│  ├─────────┤ │ │ (99 langs)│ │   │ Python/JS/Shell    │     │
│  │Coqui    │ │ └───────────┘ │   └────────────────────┘     │
│  │XTTS v2  │ │               │                              │
│  │(cloning)│ │               │                              │
│  └─────────┘ │               │                              │
└──────────────┴───────────────┴──────────────────────────────┘
```

---

## 📦 Installation

### Install from PyPI

```bash
pip install agent-voice-hub
```

### Install from source

```bash
git clone https://github.com/gancnmd/agent-voice-hub.git
cd agent-voice-hub
pip install -e .
```

### Install optional dependencies

```bash
# Voice cloning (requires a GPU)
pip install "agent-voice-hub[clone]"

# Full install (all features)
pip install "agent-voice-hub[all]"
```

---

## 🚀 Quick Start

### CLI

```bash
# Text-to-speech: Chinese
agent-voice speak "你好世界" --voice zh-CN-XiaoxiaoNeural --output hello.mp3

# Text-to-speech: English
agent-voice speak "Hello World" --voice en-US-AriaNeural --output hello.mp3

# Speech-to-text
agent-voice listen audio.mp3 --language zh --output transcript.txt

# Detect the language automatically and transcribe
agent-voice listen audio.mp3 --auto-detect

# List available voices
agent-voice voices --language zh-CN

# Code security scan
agent-voice scan ./src --format json --output report.json

# Voice cloning
agent-voice clone --ref-audio speaker.wav --text "这是克隆的声音" --output cloned.mp3
```

### Python API

```python
from agent_voice_hub import TTS, STT, SecurityScanner

# ========== TTS: text-to-speech ==========
tts = TTS(engine="edge-tts")

# Chinese speech synthesis
tts.speak(
    "欢迎使用 Agent Voice Hub",
    voice="zh-CN-XiaoxiaoNeural",
    output="welcome.mp3"
)

# English speech synthesis
tts.speak(
    "Welcome to Agent Voice Hub",
    voice="en-US-AriaNeural",
    output="welcome_en.mp3"
)

# List Chinese voices
voices = tts.list_voices(language="zh-CN")
for v in voices:
    print(f"{v['name']} - {v['gender']}")


# ========== STT: speech-to-text ==========
stt = STT(engine="faster-whisper")

# Detect the language automatically
result = stt.transcribe("audio.mp3", auto_detect=True)
print(f"Language: {result.language}, confidence: {result.confidence}")
print(f"Text: {result.text}")

# Specify the language explicitly
result = stt.transcribe("audio.mp3", language="zh")


# ========== Security scan ==========
scanner = SecurityScanner()

# Scan a directory
report = scanner.scan("./src")
print(f"Found {report.total_issues} security issues")

for issue in report.issues:
    print(f"[{issue.severity}] {issue.file}:{issue.line} - {issue.description}")
```

---

## 🤖 Agent Integrations

### Hermes Skill

```python
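# Use the voice-hub skill inside a Hermes agent

# Option 1: call it directly
from agent_voice_hub import TTS, STT, SecurityScanner
from agent_voice_hub.integrations import hermes_skill

# Engines used by the skill (same constructors as in the Python API section)
tts = TTS(engine="edge-tts")
stt = STT(engine="faster-whisper")
scanner = SecurityScanner()

@hermes_skill
def voice_assistant(text: str, action: str = "speak"):
    """Voice assistant skill: supports TTS, STT, and scanning."""
    # `text` is the text to speak, the audio path to transcribe,
    # or the directory to scan, depending on `action`.
    if action == "speak":
        return tts.speak(text, voice="zh-CN-XiaoxiaoNeural")
    elif action == "listen":
        return stt.transcribe(text)
    elif action == "scan":
        return scanner.scan(text)
    raise ValueError(f"unknown action: {action!r}")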
```

```yaml
# Option 2: register via YAML configuration
# ~/.hermes/skills/voice-hub.yaml
name: voice-hub
description: AI voice processing skill
commands:
  speak:
    handler: agent_voice_hub.integrations.hermes:speak
    args:
      text: { type: string, required: true }
      voice: { type: string, default: "zh-CN-XiaoxiaoNeural" }
  listen:
    handler: agent_voice_hub.integrations.hermes:listen
    args:
      audio_path: { type: string, required: true }
```
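
The `handler` values above use a `module.path:function` form. How Hermes resolves them is not documented here; a common convention, and the assumption in this sketch, is to import the module and look up the attribute. `resolve_handler` is a hypothetical helper, demonstrated against the standard library so that it runs without Agent Voice Hub installed.

```python
import importlib
from typing import Any, Callable


def resolve_handler(spec: str) -> Callable[..., Any]:
    """Resolve a 'package.module:attribute' string to a callable.

    Hypothetical helper: it mirrors the entry-point style used in the
    YAML above, not necessarily how Hermes loads skills.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    target: Any = importlib.import_module(module_name)
    # Allow dotted attributes such as 'module:Class.method'.
    for name in attr_path.split("."):
        target = getattr(target, name)
    if not callable(target):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return target


# Demonstrated with a standard-library function; with the package installed,
# the spec would be "agent_voice_hub.integrations.hermes:speak".
dumps = resolve_handler("json:dumps")
print(dumps({"text": "hello"}))
```

Validating the string up front means a typo in the YAML fails with a clear message rather than an obscure import error.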

### OpenClaw Plugin

```python
# agent_voice_hub/integrations/openclaw_plugin.py
from agent_voice_hub.integrations import OpenClawPlugin

class VoiceHubPlugin(OpenClawPlugin):
    name = "voice-hub"
    version = "1.0.0"

    async def on_message(self, message):
        """Reply with speech automatically."""
        if message.has_audio:
            text = self.stt.transcribe(message.audio)
            response = await self.agent.process(text)
            audio = self.tts.speak(response)
            return self.reply(audio)
```
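
In the plugin above, `self.stt.transcribe` and `self.tts.speak` are called synchronously inside an `async` method. If those calls block (an assumption; the source does not say), they would stall the event loop while they run. One way to avoid that is `asyncio.to_thread`. The sketch below uses stand-in functions (`fake_transcribe`, `fake_speak`) rather than the real engines.

```python
import asyncio
import time


def fake_transcribe(audio: bytes) -> str:
    """Stand-in for a blocking STT call (hypothetical)."""
    time.sleep(0.05)
    return audio.decode("utf-8")


def fake_speak(text: str) -> bytes:
    """Stand-in for a blocking TTS call (hypothetical)."""
    time.sleep(0.05)
    return text.upper().encode("utf-8")


async def handle_audio(audio: bytes) -> bytes:
    # Run each blocking call in a worker thread so the event loop stays free.
    text = await asyncio.to_thread(fake_transcribe, audio)
    return await asyncio.to_thread(fake_speak, text)


async def main() -> list[bytes]:
    # Two messages are processed concurrently instead of back to back.
    return await asyncio.gather(handle_audio(b"hello"), handle_audio(b"world"))


replies = asyncio.run(main())
print(replies)
```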

### Generic REST API

```bash
# Start the API server
agent-voice serve --port 8080

# TTS request
curl -X POST http://localhost:8080/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "你好世界", "voice": "zh-CN-XiaoxiaoNeural"}' \
  --output speech.mp3

# STT request
curl -X POST http://localhost:8080/api/stt \
  -F "[email protected]" \
  -F "language=zh"

# Security scan request
curl -X POST http://localhost:8080/api/scan \
  -H "Content-Type: application/json" \
  -d '{"path": "./src", "format": "json"}'
```
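
The TTS request can also be issued from Python with the standard library alone. This is a minimal sketch that assumes the server is running on `localhost:8080` and that the response body is the audio itself, as the `--output speech.mp3` flag in the curl call suggests. `build_tts_request` and `synthesize` are hypothetical names, not part of the package.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches `agent-voice serve --port 8080`


def build_tts_request(text: str, voice: str = "zh-CN-XiaoxiaoNeural") -> urllib.request.Request:
    """Build the POST request that mirrors the curl TTS call above."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, output_path: str) -> None:
    """Send the request and save the response body (assumed to be audio)."""
    with urllib.request.urlopen(build_tts_request(text), timeout=30) as resp:
        with open(output_path, "wb") as f:
            f.write(resp.read())


# Building the request needs no server; sending it does.
request = build_tts_request("你好世界")
print(request.get_method(), request.full_url)
```

With the server running, `synthesize("你好世界", "speech.mp3")` would write the returned bytes to disk.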

---

## 🔒 Security Scanner

Ships with **46 built-in vulnerability detection patterns** covering mainstream programming languages:

| Category | Checks | Language |
|------|--------|------|
| 🐍 Command injection | `os.system()`, `subprocess` with `shell=True`, `eval()`, `exec()` | Python |
| 💉 SQL injection | String-concatenated SQL, unparameterized queries | Python/JS |
| 🌐 XSS | `innerHTML`, `document.write()`, unescaped output | JavaScript |
| 🔑 Hardcoded secrets | API key, password, and token literals | All |
| 📂 Path traversal | `../` concatenation, unvalidated user-supplied paths | All |
| ⚠️ Deserialization | `pickle.loads()`, `yaml.load()` without SafeLoader | Python |
| 🌐 SSRF | Unvalidated URLs, unfiltered internal addresses | Python/JS |
| 🔓 Weak cryptography | MD5/SHA-1 used for passwords, ECB mode | All |
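
To make the idea of a "vulnerability pattern" concrete, here is a minimal sketch of line-based regex matching for three of the checks in the table. The rules, the names (`Rule`, `Finding`, `scan_text`), and the severities are illustrative assumptions, not the project's 46 patterns or its implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str
    pattern: re.Pattern


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str
    line: int
    snippet: str


# Illustrative rules only; real scanners need far more careful patterns.
RULES = [
    Rule("command-injection", "critical", re.compile(r"\bos\.system\s*\(")),
    Rule("dynamic-eval", "high", re.compile(r"\b(?:eval|exec)\s*\(")),
    Rule(
        "hardcoded-password",
        "critical",
        re.compile(r"\bpassword\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE),
    ),
]


def scan_text(source: str) -> list[Finding]:
    """Return one Finding per (line, rule) match, in line order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append(Finding(rule.name, rule.severity, lineno, line.strip()))
    return findings


sample = 'import os\npassword = "admin123"\nos.system(f"convert {user_input}")\n'
for finding in scan_text(sample):
    print(f"[{finding.severity}] line {finding.line}: {finding.rule}")
```

Plain regexes also match inside comments and string literals, so a scanner built this way trades precision for simplicity.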

### Example scan

```bash
$ agent-voice scan ./my_project

🔒 Security scan report: ./my_project
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Files scanned: 47
  Issues found:  12

  🔴 Critical: 2
  🟠 High:     3
  🟡 Medium:   4
  🔵 Low:      3

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔴 CRITICAL - auth.py:23
   Hardcoded password: password = "admin123"
   Fix: use environment variables or a secrets management service

🔴 CRITICAL - utils.py:45
   Command injection: os.system(f"convert {user_input}")
   Fix: use subprocess.run() with a list of arguments

🟠 HIGH - db.py:67
   SQL injection: cursor.execute(f"SELECT * FROM users WHERE id={uid}")
   Fix: use a parameterized query: cursor.execute("SELECT * FROM users WHERE id=?", (uid,))

🟡 MEDIUM - app.js:12
   XSS risk: element.innerHTML = userInput
   Fix: use textContent or sanitize with DOMPurify

✅ Scan complete. See the report details for remediation advice.
```
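
The remediation hints for `db.py` and `utils.py` can be tried directly. This sketch uses the standard-library `sqlite3` and `subprocess` modules; the table, the hostile inputs, and the use of the Python interpreter as the child process are made up for illustration.

```python
import sqlite3
import subprocess
import sys

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

uid = "1 OR 1=1"  # hostile input

# Vulnerable: the input is spliced into the SQL text and changes the query.
leaked = conn.execute(f"SELECT name FROM users WHERE id={uid}").fetchall()

# Safe: the driver binds the value, so it is compared as data, not parsed as SQL.
safe = conn.execute("SELECT name FROM users WHERE id=?", (uid,)).fetchall()

print(len(leaked), len(safe))

# Safe command execution: arguments go in a list and no shell is involved,
# so shell metacharacters in user_input are passed through literally.
user_input = "photo.png; echo pwned"
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The interpolated query returns every row, while the parameterized one returns none because the whole string is treated as a single value.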

---

## 🎤 Voice Cloning

Zero-shot voice cloning built on **Coqui XTTS v2**: 6-30 seconds of reference audio is enough to generate speech that closely resembles the speaker.

### Usage

```python
from agent_voice_hub import VoiceCloner

cloner = VoiceCloner(model="xtts_v2")

# Clone a voice from reference audio
cloner.clone(
    reference_audio="speaker_sample.wav",   # 6-30 seconds of reference audio
    text="这是用克隆声音说出的内容",           # text to synthesize
    output="cloned_output.wav",              # output file
    language="zh"                            # language
)
```

### CLI usage

```bash
# Basic cloning
agent-voice clone \
  --ref-audio ./samples/speaker.wav \
  --text "你好,这是我克隆的声音" \
  --output cloned.wav

# Specify the language and temperature
agent-voice clone \
  --ref-audio ./samples/speaker.wav \
  --text "Hello, this is my cloned voice" \
  --language en \
  --temperature 0.7 \
  --output cloned_en.wav
```

### Reference audio requirements

| Parameter | Requirement |
|------|------|
| Duration | 6-30 seconds (10-15 seconds recommended) |
| Format | WAV / MP3 / FLAC / OGG |
| Sample rate | ≥ 16 kHz |
| Quality | Clear, noise-free, single speaker |
| Environment | GPU (CUDA) recommended; CPU also works but is slower |
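
The duration and sample-rate rows can be checked before a clip is handed to the cloner. This sketch uses only the standard-library `wave` module, so it covers WAV files only. `check_reference_wav` and `make_silent_wav` are hypothetical helpers, not part of Agent Voice Hub.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 6.0, 30.0
MIN_SAMPLE_RATE = 16_000


def check_reference_wav(source) -> list[str]:
    """Return a list of problems; an empty list means the clip passes.

    `source` is a path or a binary file object, as accepted by wave.open.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.1f}s is outside 6-30s")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below 16 kHz")
    return problems


def make_silent_wav(seconds: float, rate: int) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


print(check_reference_wav(make_silent_wav(10, 22_050)))
print(check_reference_wav(make_silent_wav(3, 8_000)))
```

The quality row (clear, single speaker) cannot be verified this way and still needs a listen.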

---

## ⚙️ Configuration

Default config file path: `~/.agent-voice-hub/config.yaml`

```yaml
# config.yaml
tts:
  engine: edge-tts          # edge-tts | xtts
  default_voice: zh-CN-XiaoxiaoNeural
  output_format: mp3        # mp3 | wav | ogg
  speed: 1.0                # speaking rate (0.5 - 2.0)
  pitch: 0                  # pitch (-50Hz to +50Hz)

stt:
  engine: faster-whisper
  model_size: large-v3      # tiny | base | small | medium | large-v3
  device: auto              # auto | cpu | cuda
  compute_type: float16     # float16 | int8 | float32
  auto_detect: true         # detect the language automatically

voice_clone:
  model: xtts_v2
  temperature: 0.65
  top_p: 0.85
  repetition_penalty: 5.0

security:
  severity_threshold: low   # low | medium | high | critical
  exclude_patterns:
    - "**/test/**"
    - "**/node_modules/**"
    - "**/.venv/**"
  languages:
    - python
    - javascript
    - shell

server:
  host: 0.0.0.0
  port: 8080
  cors_origins: ["*"]

logging:
  level: INFO               # DEBUG | INFO | WARNING | ERROR
  file: ~/.agent-voice-hub/logs/app.log
```
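
Two of the settings above are easy to get wrong: `tts.speed` has a stated range of 0.5 to 2.0, and `security.severity_threshold` names a severity level. The sketch below shows one plausible reading of both. The helper names and the "report everything at or above the threshold" semantics are assumptions; the source does not define them.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def validate_speed(speed: float) -> float:
    """Reject speeds outside the 0.5-2.0 range given in config.yaml."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be within 0.5-2.0, got {speed}")
    return speed


def meets_threshold(severity: str, threshold: str) -> bool:
    """True if `severity` is at or above `threshold` (assumed semantics)."""
    return SEVERITY_ORDER.index(severity.lower()) >= SEVERITY_ORDER.index(threshold.lower())


# Severity counts borrowed from the example scan summary:
# 2 critical, 3 high, 4 medium, 3 low.
issues = ["critical"] * 2 + ["high"] * 3 + ["medium"] * 4 + ["low"] * 3
reported = [s for s in issues if meets_threshold(s, "high")]
print(len(reported))
```

Under this reading, the default `low` threshold reports all twelve issues and `high` narrows the list to the five most severe.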

---

## 🎵 Supported Voices

Edge-TTS provides **300+ voices**. The most commonly used Chinese voices are:

| Voice ID | Language | Gender | Description |
|---------|------|:---:|------|
| **zh-CN-XiaoxiaoNeural** | Chinese (Mandarin) | Female | ⭐ Recommended default; natural and fluent |
| **zh-CN-YunxiNeural** | Chinese (Mandarin) | Male | Bright, youthful voice |
| **zh-CN-YunjianNeural** | Chinese (Mandarin) | Male | Calm news-anchor delivery |
| **zh-CN-XiaoyiNeural** | Chinese (Mandarin) | Female | Sweet customer-service voice |
| **zh-CN-YunyangNeural** | Chinese (Mandarin) | Male | Professional presenter voice |
| **zh-CN-XiaochenNeural** | Chinese (Mandarin) | Female | Warm and thoughtful |
| **zh-CN-XiaohanNeural** | Chinese (Mandarin) | Female | Dignified and poised |
| **z

... (truncated)