SoulX-FlashTalk digital human plugin for OpenClaw and Claude Code -- audio-driven avatar with sub-second latency
# SoulX-DigitalHuman-Plugin
Audio-driven digital human plugin for [OpenClaw](https://github.com/openclaw/openclaw) and [Claude Code](https://docs.anthropic.com/en/docs/claude-code) — powered by [SoulX-FlashTalk](https://github.com/Soul-AILab/SoulX-FlashTalk).
## Overview
SoulX-FlashTalk is a 14B-parameter model that achieves **0.87s first-frame latency** and **32 FPS real-time throughput** on 8×H800 GPUs. It generates an audio-driven animated avatar: a lip-synced talking head from any speech input.
This plugin integrates that capability into your AI agent workflow:
- **Claude Code hook** — avatar video appears when a task completes
- **OpenClaw skill** — agent-side avatar rendering hook
- **Claude wrapper** (`soulx-claude.py`) — stream avatar alongside Claude's output
- **Multi-engine TTS** — MiniMax / MeloTTS / Edge-TTS feeding audio into SoulX
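The Claude Code hook can be thought of as a `Stop` hook. Claude Code runs a `Stop` hook command when the agent finishes responding and passes event details to it as JSON on stdin. The sketch below is a minimal illustration, not the plugin's actual hook script. `handle_stop_event` and the `render_avatar` callback are hypothetical names. Only the `stop_hook_active` and `transcript_path` fields come from Claude Code's documented Stop-hook input.

```python
import json
import sys
from typing import Callable, Optional


def handle_stop_event(raw: str, render_avatar: Callable[[str], None]) -> Optional[str]:
    """Parse a Claude Code Stop-hook payload and trigger avatar rendering.

    `render_avatar` is a hypothetical stand-in for the plugin's TTS + SoulX
    pipeline; it receives the path of the session transcript.
    Returns the transcript path that was handed off, or None if skipped.
    """
    event = json.loads(raw)
    # stop_hook_active is true when Claude is already continuing because a
    # Stop hook blocked the previous stop; skip so we don't render twice.
    if event.get("stop_hook_active"):
        return None
    transcript = event.get("transcript_path")
    if not transcript:
        return None
    render_avatar(transcript)
    return transcript


def main() -> int:
    """Entry point for a hook command: read stdin, exit 0 on success."""
    handle_stop_event(
        sys.stdin.read(),
        lambda path: print(f"would render avatar for {path}", file=sys.stderr),
    )
    return 0
```

A real hook script would call `sys.exit(main())`. Here the entry point is kept as a function so the handler can be tested without piping JSON into stdin.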
## Key Features
| Feature | Description |
|---|---|
| Sub-second avatar | 0.87s first-frame latency on 8×H800 |
| Audio-driven lip-sync | Any TTS output feeds directly into SoulX |
| Stream + Final modes | Real-time avatar or single render on completion |
| Multi-GPU support | 8×H800 for full real-time; single GPU with CPU offload |
| Fallback chain | SoulX → SoulX-FlashHead → static avatar → TTS only |
| OpenClaw + Claude | Same hook pattern as [javis-tts-plugin](https://github.com/Nasal/javis-tts-plugin) |
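The fallback chain in the table above works as an ordered list of renderers. Each renderer is tried in turn, and any failure (no GPU, missing weights, out of memory) drops to the next one. Below is a minimal sketch of that pattern. The stage names and the `render_with_fallback` helper are hypothetical and do not come from the plugin's code.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

# A stage takes the path of a speech audio file and returns the path of its
# rendered output (video or still image), or raises if it cannot run.
Stage = Tuple[str, Callable[[str], str]]


def render_with_fallback(audio_path: str, stages: Sequence[Stage]) -> Tuple[str, Optional[str]]:
    """Try each stage in order and return (stage name, output) of the first success.

    If every stage fails, return ("tts_only", None) so the caller can still
    play the audio without any visual.
    """
    for name, stage in stages:
        try:
            return name, stage(audio_path)
        except Exception as exc:  # broad on purpose: any failure moves down the chain
            logging.warning("avatar stage %s failed: %s", name, exc)
    return "tts_only", None
```

Wiring it up could look like `[("soulx", soulx_render), ("flashhead", flashhead_render), ("static", lambda _: "avatar.png")]`, where both render functions are hypothetical. The static-avatar stage simply returns a fixed image path, so it only fails if that file is missing.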
## Quick Start
```bash
# Clone
git clone https://github.com/fuleinist/soulx-digitalhuman-plugin.git
cd soulx-digitalhuman-plugin
# Install dependencies
pip install -r requirements.txt
# Download models (~80GB disk)
bash models/download_models.sh
# Install Claude Code hook
python claude_hooks/install.py
# Run with avatar
python examples/soulx-claude.py "write a hello world function"
```
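Claude Code reads hooks from its settings files, for example `~/.claude/settings.json`. Assume the installer registers a `Stop` hook there. An idempotent registration could then look like the sketch below. It is not the contents of `claude_hooks/install.py`. The `add_stop_hook` and `install` helpers are hypothetical, and the hook command string is a placeholder you would replace with the plugin's real hook script.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_stop_hook(settings: Dict[str, Any], command: str) -> Dict[str, Any]:
    """Add a Stop hook that runs `command`, leaving other settings untouched."""
    hooks = settings.setdefault("hooks", {})
    stop_groups = hooks.setdefault("Stop", [])
    for group in stop_groups:
        for hook in group.get("hooks", []):
            if hook.get("type") == "command" and hook.get("command") == command:
                return settings  # already installed; don't add a duplicate
    # Stop events don't use matchers, so the group only lists its hooks.
    stop_groups.append({"hooks": [{"type": "command", "command": command}]})
    return settings


def install(command: str, settings_path: Path = Path.home() / ".claude" / "settings.json") -> None:
    """Read, update, and write back a Claude Code settings file."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    add_stop_hook(settings, command)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
```

Merging into the existing file, rather than overwriting it, preserves any hooks and options the user already has. The duplicate check makes re-running the installer safe.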
## Documentation
See [PLAN.md](PLAN.md) for the full architecture, module breakdown, and integration guide.
## Requirements
- **GPU**: 40–64GB VRAM on a single GPU (with CPU offload), or 8×H800 for real-time generation
- **Python**: 3.10+
- **Audio tools**: ffmpeg
- **Models**: SoulX-FlashTalk-14B + chinese-wav2vec2-base (via HuggingFace)
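wav2vec2-family encoders, including chinese-wav2vec2-base, are trained on 16 kHz audio. TTS engines often emit MP3 or higher-rate WAV, so audio likely needs resampling before it reaches SoulX, which may be one reason ffmpeg is required. The sketch below assumes SoulX accepts a 16 kHz mono 16-bit PCM WAV. Check the SoulX-FlashTalk repository for its actual input format. `ffmpeg_resample_cmd` and `to_soulx_wav` are hypothetical helpers. The ffmpeg flags themselves are standard.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_resample_cmd(src: str, dst: str, rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that converts any audio file to mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input: MP3/WAV/etc. from any TTS engine
        "-ac", "1",           # downmix to a single channel
        "-ar", str(rate),     # resample (16 kHz assumed for wav2vec2)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM samples
        dst,
    ]


def to_soulx_wav(src: str, dst: str) -> str:
    """Run the conversion; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_resample_cmd(src, dst), check=True, capture_output=True)
    return dst
```

Building the argument list separately from running it keeps the command easy to inspect and test. It also avoids shell quoting issues with file paths.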
## License
MIT