Soulx Digitalhuman

Rating: 3.5 (1 review)
Author: fuleinist


SoulX-FlashTalk digital human plugin for OpenClaw and Claude Code -- audio-driven avatar with sub-second latency


Install

pip install -r

README

# SoulX-DigitalHuman-Plugin

Audio-driven digital human plugin for [OpenClaw](https://github.com/openclaw/openclaw) and [Claude Code](https://docs.anthropic.com/en/docs/claude-code) — powered by [SoulX-FlashTalk](https://github.com/Soul-AILab/SoulX-FlashTalk).

## Overview

SoulX-FlashTalk is a 14B-parameter model that reaches **0.87s first-frame latency** and **32 FPS real-time throughput** on 8×H800. Driven by audio, it generates an animated talking-head avatar lip-synced to any speech input.
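
To put those two figures in context, here is a back-of-the-envelope sketch in Python. The latency and throughput constants come from the paragraph above; the 25 FPS playback rate is an assumption made only for illustration and is not stated by the project.

```python
# Figures quoted in this README for the 8xH800 configuration.
FIRST_FRAME_LATENCY_S = 0.87
THROUGHPUT_FPS = 32.0

# ASSUMPTION (not from the README): the avatar video plays back at 25 FPS.
PLAYBACK_FPS = 25.0


def frame_budget_ms(throughput_fps: float) -> float:
    """Average generation time per frame, in milliseconds."""
    return 1000.0 / throughput_fps


def rough_generation_time_s(audio_seconds: float) -> float:
    """Rough time to generate every frame of a clip: first-frame latency
    plus the steady-state time for all frames at the quoted throughput."""
    frames = audio_seconds * PLAYBACK_FPS
    return FIRST_FRAME_LATENCY_S + frames / THROUGHPUT_FPS


def keeps_up_with_playback() -> bool:
    """Generation is real-time if frames are produced at least as fast as they play."""
    return THROUGHPUT_FPS >= PLAYBACK_FPS


if __name__ == "__main__":
    print(f"per-frame budget: {frame_budget_ms(THROUGHPUT_FPS):.2f} ms")
    print(f"10 s clip, rough generation time: {rough_generation_time_s(10.0):.2f} s")
    print(f"keeps up with playback: {keeps_up_with_playback()}")
```

Under that assumed playback rate, generation outpaces playback, so only the first-frame latency would be visible to the viewer.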

This plugin integrates that capability into your AI agent workflow:

- **Claude Code hook** — avatar video appears when a task completes
- **OpenClaw skill** — agent-side avatar rendering hook
- **Claude wrapper** (`soulx-claude.py`) — stream avatar alongside Claude's output
- **Multi-engine TTS** — MiniMax / MeloTTS / Edge-TTS feeding audio into SoulX

## Key Features

| Feature | Description |
|---|---|
| Sub-second avatar | 0.87s first-frame latency on 8×H800 |
| Audio-driven lip-sync | Any TTS output feeds directly into SoulX |
| Stream + Final modes | Real-time avatar or single render on completion |
| Multi-GPU support | 8×H800 for full real-time; single GPU with CPU offload |
| Fallback chain | SoulX → SoulX-FlashHead → static avatar → TTS only |
| OpenClaw + Claude | Same hook pattern as [javis-tts-plugin](https://github.com/Nasal/javis-tts-plugin) |
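
The fallback chain in the table can be pictured as an ordered list of renderers that are tried one after another. The following is a minimal sketch of that idea, not the plugin's actual code: every function name, the stage names, and the returned file names are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (name, render) pair. render takes an audio file path and
# returns an output path, or raises if the stage cannot run.
Stage = Tuple[str, Callable[[str], str]]


def render_with_fallback(audio_path: str, stages: Sequence[Stage]) -> Tuple[str, str]:
    """Try each stage in order; return (stage_name, output_path) of the first success."""
    errors: List[str] = []
    for name, render in stages:
        try:
            return name, render(audio_path)
        except Exception as exc:  # broad on purpose: any failure moves to the next stage
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all stages failed: " + "; ".join(errors))


# Hypothetical stand-ins for the four stages named in the table.
def soulx_flashtalk(audio_path: str) -> str:
    raise RuntimeError("no suitable GPU")  # simulate the full model being unavailable


def soulx_flashhead(audio_path: str) -> str:
    raise RuntimeError("weights not downloaded")  # simulate the lighter model failing too


def static_avatar(audio_path: str) -> str:
    return "static_avatar.png"  # a still image shown while the audio plays


def tts_only(audio_path: str) -> str:
    return audio_path  # last resort: just play the speech


STAGES: List[Stage] = [
    ("SoulX", soulx_flashtalk),
    ("SoulX-FlashHead", soulx_flashhead),
    ("static avatar", static_avatar),
    ("TTS only", tts_only),
]

if __name__ == "__main__":
    print(render_with_fallback("speech.wav", STAGES))
```

Keeping the chain as data makes the order easy to change and lets each stage be tested in isolation.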

## Quick Start

```bash
# Clone
git clone https://github.com/fuleinist/soulx-digitalhuman-plugin.git
cd soulx-digitalhuman-plugin

# Install dependencies
pip install -r requirements.txt

# Download models (~80GB disk)
bash models/download_models.sh

# Install Claude Code hook
python claude_hooks/install.py

# Run with avatar
python examples/soulx-claude.py "write a hello world function"
```

## Documentation

See [PLAN.md](PLAN.md) for full architecture, module breakdown, and integration guide.

## Requirements

- **GPU**: a single GPU with 40–64GB VRAM (with CPU offload), or 8×H800 for real-time
- **Python**: 3.10+
- **Audio libs**: ffmpeg
- **Models**: SoulX-FlashTalk-14B + chinese-wav2vec2-base (via HuggingFace)
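
Before downloading the models, it can help to check the non-GPU requirements up front. The script below is a hypothetical preflight sketch that is not part of the repository; it uses only the Python standard library, takes the ~80GB disk figure from the Quick Start comment, and skips the GPU check because that would need an extra dependency.

```python
import shutil
import sys
from typing import List

MIN_PYTHON = (3, 10)   # "Python: 3.10+" from the requirements list
MODEL_DISK_GB = 80     # approximate figure from the Quick Start comment


def check_environment(path: str = ".") -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []

    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.10+ required, found {found}")

    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Free space on the filesystem that contains `path`, in decimal gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < MODEL_DISK_GB:
        problems.append(f"about {MODEL_DISK_GB} GB free disk needed, found {free_gb:.0f} GB")

    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Basic requirements look satisfied (GPU not checked).")
```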

## License

MIT
