Agent Voice Hub
Voice interaction plugin for AI agents (Hermes, OpenClaw) - TTS, STT, voice cloning, security scanning
Install
pip install agent-voice-hub
<div align="center">
# Agent Voice Hub
[MIT License](https://opensource.org/licenses/MIT)
[Python](https://www.python.org/downloads/)
[GitHub](https://github.com/gancnmd/agent-voice-hub)
**All-in-one TTS / STT / Voice Cloning / Code Security Scanner for AI Agents**
</div>
---
## Features
| Status | Module | Description |
|:---:|------|------|
| ✅ | **TTS (text-to-speech)** | Edge-TTS (free, 300+ voices) + Coqui XTTS (voice cloning) |
| ✅ | **STT (speech-to-text)** | faster-whisper, supports 99 languages with automatic language detection |
| ✅ | **Voice cloning** | Coqui XTTS v2 zero-shot cloning from only 6-30 seconds of reference audio |
| ✅ | **Security scanner** | 46 vulnerability patterns (command injection, SQLi, XSS, hardcoded secrets, and more) |
| ✅ | **Agent integrations** | Hermes Skill / OpenClaw Plugin / generic REST API |
| ✅ | **CLI tool** | Five main commands: speak / listen / scan / voices / clone |
| ✅ | **YAML configuration** | Flexible, driven by a YAML config file |
---
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                         Agent Voice Hub                          │
├───────────┬───────────┬────────────────────┬─────────────────────┤
│ CLI layer │ API layer │ Agent integrations │ Config layer (YAML) │
│ speak     │ REST API  │ Hermes             │ config.yaml         │
│ listen    │ WebSocket │ OpenClaw           │                     │
│ scan      │           │ Generic            │                     │
│ voices    │           │                    │                     │
│ clone     │           │                    │                     │
├───────────┴───────────┴────────────────────┴─────────────────────┤
│                           Core Engine                            │
├──────────────┬────────────────────┬──────────────────────────────┤
│ TTS Engine   │ STT Engine         │ Security Scanner             │
│ ┌──────────┐ │ ┌────────────────┐ │ ┌──────────────────────────┐ │
│ │ Edge-TTS │ │ │ faster-whisper │ │ │ 46 vulnerability         │ │
│ │ (free)   │ │ │ (99 languages) │ │ │ patterns                 │ │
│ ├──────────┤ │ └────────────────┘ │ │ Python/JS/Shell          │ │
│ │ Coqui    │ │                    │ └──────────────────────────┘ │
│ │ XTTS v2  │ │                    │                              │
│ │ (clone)  │ │                    │                              │
│ └──────────┘ │                    │                              │
└──────────────┴────────────────────┴──────────────────────────────┘
```
---
## Installation
### Install from PyPI
```bash
pip install agent-voice-hub
```
### Install from source
```bash
git clone https://github.com/gancnmd/agent-voice-hub.git
cd agent-voice-hub
pip install -e .
```
### Install optional dependencies
```bash
# Voice cloning (GPU recommended; CPU works but is slower)
pip install "agent-voice-hub[clone]"
# Full install (all features)
pip install "agent-voice-hub[all]"
```
---
## Quick Start
### CLI
```bash
# Text to speech: Chinese
agent-voice speak "你好世界" --voice zh-CN-XiaoxiaoNeural --output hello.mp3
# Text to speech: English
agent-voice speak "Hello World" --voice en-US-AriaNeural --output hello_en.mp3
# Speech to text
agent-voice listen audio.mp3 --language zh --output transcript.txt
# Detect the language automatically and transcribe
agent-voice listen audio.mp3 --auto-detect
# List available voices
agent-voice voices --language zh-CN
# Scan code for security issues
agent-voice scan ./src --format json --output report.json
# Voice cloning
agent-voice clone --ref-audio speaker.wav --text "这是克隆的声音" --output cloned.mp3
```
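When the CLI is driven from a script, building the argument list explicitly avoids shell-quoting problems with spaces or non-ASCII text. The sketch below only assembles the `agent-voice speak` command shown above; `build_speak_command` is a hypothetical helper and not part of the package.

```python
import shlex


def build_speak_command(text: str, voice: str, output: str) -> list[str]:
    """Assemble the argv for `agent-voice speak` using the flags shown above."""
    return ["agent-voice", "speak", text, "--voice", voice, "--output", output]


cmd = build_speak_command("Hello World", "en-US-AriaNeural", "hello_en.mp3")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
# To actually run it (requires agent-voice-hub to be installed):
# subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` without `shell=True` keeps the text argument intact regardless of its content.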
### Python API
```python
from agent_voice_hub import TTS, STT, SecurityScanner

# ========== TTS: text to speech ==========
tts = TTS(engine="edge-tts")

# Chinese speech synthesis
tts.speak(
    "欢迎使用 Agent Voice Hub",
    voice="zh-CN-XiaoxiaoNeural",
    output="welcome.mp3",
)

# English speech synthesis
tts.speak(
    "Welcome to Agent Voice Hub",
    voice="en-US-AriaNeural",
    output="welcome_en.mp3",
)

# List Chinese voices
voices = tts.list_voices(language="zh-CN")
for v in voices:
    print(f"{v['name']} - {v['gender']}")

# ========== STT: speech to text ==========
stt = STT(engine="faster-whisper")

# Detect the language automatically
result = stt.transcribe("audio.mp3", auto_detect=True)
print(f"Language: {result.language}, confidence: {result.confidence}")
print(f"Text: {result.text}")

# Specify the language explicitly
result = stt.transcribe("audio.mp3", language="zh")

# ========== Security scanning ==========
scanner = SecurityScanner()

# Scan a directory
report = scanner.scan("./src")
print(f"Found {report.total_issues} security issues")
for issue in report.issues:
    print(f"[{issue.severity}] {issue.file}:{issue.line} - {issue.description}")
```
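A report like the one iterated above is often condensed into per-severity counts before it is displayed. The following sketch uses a stand-in `Issue` dataclass with the same four fields read in the example (`severity`, `file`, `line`, `description`); the classes actually returned by `SecurityScanner.scan` may differ, and the lowercase severity strings are an assumption.

```python
from collections import Counter
from dataclasses import dataclass

# Ordered from most to least severe; the string values are assumed.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


@dataclass
class Issue:
    """Stand-in for a scanner finding; not the package's own class."""
    severity: str
    file: str
    line: int
    description: str


def summarize(issues: list[Issue]) -> list[tuple[str, int]]:
    """Return (severity, count) pairs, most severe first, skipping zero counts."""
    counts = Counter(issue.severity for issue in issues)
    return [(sev, counts[sev]) for sev in SEVERITY_ORDER if counts[sev]]


issues = [
    Issue("high", "db.py", 67, "SQL injection"),
    Issue("critical", "auth.py", 23, "Hardcoded password"),
    Issue("high", "api.py", 10, "SSRF"),
]
for severity, count in summarize(issues):
    print(f"{severity}: {count}")
```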
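---
## Agent Integrations
### Hermes Skill
```python
# Using the voice-hub skill inside a Hermes agent
# Option 1: call it directly
from agent_voice_hub import TTS, STT, SecurityScanner
from agent_voice_hub.integrations import hermes_skill

# Engines used by the skill (same constructors as in the Python API section)
tts = TTS(engine="edge-tts")
stt = STT(engine="faster-whisper")
scanner = SecurityScanner()


@hermes_skill
def voice_assistant(text: str, action: str = "speak"):
    """Voice assistant skill: supports TTS, STT and scanning."""
    if action == "speak":
        return tts.speak(text, voice="zh-CN-XiaoxiaoNeural")
    elif action == "listen":
        return stt.transcribe(text)
    elif action == "scan":
        return scanner.scan(text)
    raise ValueError(f"unknown action: {action!r}")
```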
```yaml
# Option 2: register via YAML configuration
# ~/.hermes/skills/voice-hub.yaml
name: voice-hub
description: AI voice processing skill
commands:
  speak:
    handler: agent_voice_hub.integrations.hermes:speak
    args:
      text: { type: string, required: true }
      voice: { type: string, default: "zh-CN-XiaoxiaoNeural" }
  listen:
    handler: agent_voice_hub.integrations.hermes:listen
    args:
      audio_path: { type: string, required: true }
```
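The `handler` values above use a `module.path:function` form. How Hermes resolves them is not documented here; a common way to resolve such references in Python is `importlib`, sketched below against a standard-library target so the example runs without the plugin installed. `resolve_handler` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_handler(reference: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, sep, attr_path = reference.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    target = importlib.import_module(module_name)
    # Walk dotted attributes so 'pkg.mod:Class.method' also works.
    for name in attr_path.split("."):
        target = getattr(target, name)
    return target


join = resolve_handler("os.path:join")
print(join("skills", "voice-hub.yaml"))
```

Validating the reference format first gives a clear error for a misconfigured skill file instead of an obscure import failure.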
### OpenClaw Plugin
```python
# agent_voice_hub/integrations/openclaw_plugin.py
from agent_voice_hub.integrations import OpenClawPlugin


class VoiceHubPlugin(OpenClawPlugin):
    name = "voice-hub"
    version = "1.0.0"

    async def on_message(self, message):
        """Reply with speech automatically when a message contains audio."""
        if message.has_audio:
            text = self.stt.transcribe(message.audio)
            response = await self.agent.process(text)
            audio = self.tts.speak(response)
            return self.reply(audio)
```
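The `on_message` flow above is a three-stage pipeline: transcribe, let the agent respond, synthesize. The sketch below reproduces that control flow with stub classes so it can run on its own; `StubSTT`, `StubTTS`, `StubAgent` and `handle_audio` are hypothetical stand-ins, not OpenClaw or Agent Voice Hub APIs.

```python
import asyncio


class StubSTT:
    def transcribe(self, audio: bytes) -> str:
        # Pretend the audio bytes are UTF-8 text.
        return audio.decode("utf-8")


class StubTTS:
    def speak(self, text: str) -> bytes:
        # Pretend synthesized audio is just the encoded text.
        return text.encode("utf-8")


class StubAgent:
    async def process(self, text: str) -> str:
        await asyncio.sleep(0)  # yield control like a real async call would
        return f"You said: {text}"


async def handle_audio(audio: bytes, stt: StubSTT, agent: StubAgent, tts: StubTTS) -> bytes:
    """Speech in, speech out: STT -> agent -> TTS."""
    text = stt.transcribe(audio)
    response = await agent.process(text)
    return tts.speak(response)


reply = asyncio.run(handle_audio(b"hello", StubSTT(), StubAgent(), StubTTS()))
print(reply)
```

If the real engines block while they run, wrapping those calls in `asyncio.to_thread` is one way to keep the event loop responsive.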
### Generic REST API
```bash
# Start the API server
agent-voice serve --port 8080
# TTS request
curl -X POST http://localhost:8080/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "你好世界", "voice": "zh-CN-XiaoxiaoNeural"}' \
  --output speech.mp3
# STT request (the upload field name "audio" and the file name are assumed)
curl -X POST http://localhost:8080/api/stt \
  -F "audio=@audio.mp3" \
  -F "language=zh"
# Security scan request
curl -X POST http://localhost:8080/api/scan \
  -H "Content-Type: application/json" \
  -d '{"path": "./src", "format": "json"}'
```
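The same TTS request can be issued from Python. This sketch builds the request with the standard library only and mirrors the `curl` call above (endpoint `/api/tts`, JSON body with `text` and `voice`). `build_tts_request` is a hypothetical helper, and the response is assumed to be raw audio bytes because the `curl` example writes it straight to `speech.mp3`.

```python
import json
import urllib.request


def build_tts_request(text: str, voice: str,
                      base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/tts endpoint."""
    body = json.dumps({"text": text, "voice": voice}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("你好世界", "zh-CN-XiaoxiaoNeural")
print(req.get_method(), req.full_url)

# Sending requires a running server (`agent-voice serve --port 8080`):
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
#     f.write(resp.read())
```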
---
## Security Scanner
Ships with **46 built-in vulnerability detection patterns** covering Python, JavaScript and shell code:
| Category | Checks | Languages |
|------|--------|------|
| Command injection | `os.system()`, `subprocess` calls with `shell=True`, `eval()`, `exec()` | Python |
| SQL injection | String-concatenated SQL, unparameterized queries | Python/JS |
| XSS | `innerHTML`, `document.write()`, unescaped output | JavaScript |
| Hardcoded secrets | API keys, passwords, token literals | All languages |
| Path traversal | `../` concatenation, unvalidated user-supplied paths | All languages |
| Deserialization | `pickle.loads()`, `yaml.load()` without SafeLoader | Python |
| SSRF | Unvalidated URLs, unfiltered internal network addresses | Python/JS |
| Weak cryptography | MD5/SHA1 used for passwords, ECB mode | All languages |
### Scan example
```bash
$ agent-voice scan ./my_project
Security scan report: ./my_project
──────────────────────────────────────────────
Files scanned: 47
Issues found: 12
  Critical: 2
  High: 3
  Medium: 4
  Low: 3
──────────────────────────────────────────────
CRITICAL - auth.py:23
  Hardcoded password: password = "admin123"
  Suggestion: use environment variables or a secret management service
CRITICAL - utils.py:45
  Command injection: os.system(f"convert {user_input}")
  Suggestion: use subprocess.run() and pass the arguments as a list
HIGH - db.py:67
  SQL injection: cursor.execute(f"SELECT * FROM users WHERE id={uid}")
  Suggestion: use a parameterized query: cursor.execute("SELECT * FROM users WHERE id=?", (uid,))
MEDIUM - app.js:12
  XSS risk: element.innerHTML = userInput
  Suggestion: use textContent or sanitize with DOMPurify
✅ Scan complete. See the report details for suggested fixes.
```
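Pattern-based scanners of this kind typically match regular expressions line by line. The sketch below is a minimal illustration covering a few of the Python checks listed above. It is not the project's implementation; the rule names, severities and regexes are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str
    severity: str
    line: int
    snippet: str


# (rule name, severity, compiled pattern); all illustrative, not the real rule set.
RULES = [
    ("command-injection", "critical", re.compile(r"\bos\.system\s*\(")),
    ("dangerous-eval", "high", re.compile(r"\b(?:eval|exec)\s*\(")),
    ("unsafe-pickle", "high", re.compile(r"\bpickle\.loads?\s*\(")),
    ("hardcoded-password", "critical",
     re.compile(r"\bpassword\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE)),
]


def scan_source(source: str) -> list[Finding]:
    """Return one Finding per (line, rule) match in the given source text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, severity, pattern in RULES:
            if pattern.search(line):
                findings.append(Finding(rule, severity, lineno, line.strip()))
    return findings


sample = '''import os
password = "admin123"
os.system(f"convert {user_input}")
result = eval(expr)
'''
for finding in scan_source(sample):
    print(f"[{finding.severity}] line {finding.line}: {finding.rule}")
```

Line-level regexes cannot follow data flow, so a scanner built this way will report some false positives and miss indirect uses. That is the usual trade-off for speed and simplicity.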
---
## Voice Cloning
Zero-shot voice cloning based on **Coqui XTTS v2**: only 6-30 seconds of reference audio are needed to generate highly similar speech.
### Usage
```python
from agent_voice_hub import VoiceCloner

cloner = VoiceCloner(model="xtts_v2")

# Clone a voice from reference audio
cloner.clone(
    reference_audio="speaker_sample.wav",  # 6-30 seconds of reference audio
    text="这是用克隆声音说出的内容",  # text to synthesize
    output="cloned_output.wav",  # output file
    language="zh",  # language
)
```
### CLI usage
```bash
# Basic cloning
agent-voice clone \
  --ref-audio ./samples/speaker.wav \
  --text "你好，这是我克隆的声音" \
  --output cloned.wav
# Specify the language and temperature
agent-voice clone \
  --ref-audio ./samples/speaker.wav \
  --text "Hello, this is my cloned voice" \
  --language en \
  --temperature 0.7 \
  --output cloned_en.wav
```
### Reference audio requirements
| Parameter | Requirement |
|------|------|
| Duration | 6-30 seconds (10-15 seconds recommended) |
| Format | WAV / MP3 / FLAC / OGG |
| Sample rate | ≥ 16 kHz |
| Quality | Clear and noise-free, single speaker |
| Hardware | GPU (CUDA) recommended; CPU also works (slower) |
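The duration and sample-rate limits above can be verified before a clip is passed to the cloner. This sketch handles uncompressed WAV files only, using the standard-library `wave` module. `check_reference_wav` and `write_silence` are hypothetical helpers, and the thresholds are copied from the table.

```python
import os
import tempfile
import wave


def check_reference_wav(path: str, min_s: float = 6.0, max_s: float = 30.0,
                        min_rate: int = 16_000) -> list[str]:
    """Return a list of problems with a WAV reference clip (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if not min_s <= duration <= max_s:
        problems.append(f"duration {duration:.1f}s outside {min_s:g}-{max_s:g}s")
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz below {min_rate} Hz")
    return problems


def write_silence(path: str, seconds: float, rate: int) -> None:
    """Write a mono 16-bit silent WAV, used here only to exercise the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    clip = os.path.join(tmp, "speaker.wav")
    write_silence(clip, seconds=10, rate=16_000)
    print(check_reference_wav(clip))
```

MP3, FLAC and OGG inputs would need a decoder outside the standard library, so they are left out of this sketch.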
---
## Configuration
Default config file path: `~/.agent-voice-hub/config.yaml`
```yaml
# config.yaml
tts:
  engine: edge-tts              # edge-tts | xtts
  default_voice: zh-CN-XiaoxiaoNeural
  output_format: mp3            # mp3 | wav | ogg
  speed: 1.0                    # speaking rate (0.5 - 2.0)
  pitch: 0                      # pitch (-50Hz to +50Hz)
stt:
  engine: faster-whisper
  model_size: large-v3          # tiny | base | small | medium | large-v3
  device: auto                  # auto | cpu | cuda
  compute_type: float16         # float16 | int8 | float32
  auto_detect: true             # detect the language automatically
voice_clone:
  model: xtts_v2
  temperature: 0.65
  top_p: 0.85
  repetition_penalty: 5.0
security:
  severity_threshold: low       # low | medium | high | critical
  exclude_patterns:
    - "**/test/**"
    - "**/node_modules/**"
    - "**/.venv/**"
  languages:
    - python
    - javascript
    - shell
server:
  host: 0.0.0.0
  port: 8080
  cors_origins: ["*"]
logging:
  level: INFO                   # DEBUG | INFO | WARNING | ERROR
  file: ~/.agent-voice-hub/logs/app.log
```
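Because several fields accept only the values or ranges noted in the comments, a loader may want to validate them up front. The sketch below checks a few of those constraints on an already-parsed dictionary (for example the output of a YAML parser). `validate_config` is a hypothetical helper, and whether the package performs similar validation is not stated here.

```python
ALLOWED = {
    ("tts", "engine"): {"edge-tts", "xtts"},
    ("tts", "output_format"): {"mp3", "wav", "ogg"},
    ("stt", "device"): {"auto", "cpu", "cuda"},
    ("stt", "compute_type"): {"float16", "int8", "float32"},
    ("security", "severity_threshold"): {"low", "medium", "high", "critical"},
}


def validate_config(cfg: dict) -> list[str]:
    """Return human-readable errors for values outside the documented options."""
    errors = []
    for (section, key), allowed in ALLOWED.items():
        value = cfg.get(section, {}).get(key)
        # Missing keys are skipped; only values that are present get checked.
        if value is not None and value not in allowed:
            errors.append(f"{section}.{key}: {value!r} not in {sorted(allowed)}")
    speed = cfg.get("tts", {}).get("speed")
    if speed is not None and not 0.5 <= speed <= 2.0:
        errors.append(f"tts.speed: {speed} outside 0.5-2.0")
    return errors


cfg = {
    "tts": {"engine": "edge-tts", "speed": 3.0},
    "stt": {"device": "tpu"},
}
for error in validate_config(cfg):
    print(error)
```

Returning a list of messages rather than raising on the first problem lets a user fix every invalid field in one pass.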
---
## Supported Voices
Edge-TTS provides **300+ voices**; the commonly used Chinese voices are listed below:
| Voice ID | Language | Gender | Description |
|---------|------|:---:|------|
| **zh-CN-XiaoxiaoNeural** | Chinese (Mandarin) | Female | ⭐ Recommended default; natural and fluent |
| **zh-CN-YunxiNeural** | Chinese (Mandarin) | Male | Bright, youthful voice |
| **zh-CN-YunjianNeural** | Chinese (Mandarin) | Male | Steady news-broadcast voice |
| **zh-CN-XiaoyiNeural** | Chinese (Mandarin) | Female | Sweet customer-service voice |
| **zh-CN-YunyangNeural** | Chinese (Mandarin) | Male | Professional presenter voice |
| **zh-CN-XiaochenNeural** | Chinese (Mandarin) | Female | Gentle and thoughtful |
| **zh-CN-XiaohanNeural** | Chinese (Mandarin) | Female | Dignified and poised |
| **z
... (truncated)