# OpenClaw RTC Plugin
WebRTC voice communication plugin for [OpenClaw](https://github.com/nicepkg/openclaw). Enables browser-based real-time voice calls through the OpenClaw Gateway.
## Architecture
```
Browser (WebUI)                   OpenClaw Gateway                 RTC Node (Go)
┌──────────────┐                  ┌──────────────┐                 ┌───────────────┐
│ getUserMedia │─── WS ──────────▶│ Plugin (TS)  │─── WS ─────────▶│ pion/webrtc   │
│ RTCPeer      │  rtc.call.start  │ Orchestrator │   node.invoke   │ PeerManager   │
│              │◀── answer+ICE ───│ Call routing │◀─── result ─────│ Echo/Loopback │
│ Audio I/O    │◀──── RTP/DTLS ───────────────────────────────────▶│ Audio pipe    │
└──────────────┘     (direct)     └──────────────┘                 └───────────────┘
```
**Three components:**
1. **Plugin** (`plugin/`): TypeScript extension loaded by the OpenClaw Gateway. Registers `rtc.call.*` gateway methods, manages call orchestration, and routes commands to the RTC Node via `node.invoke`.
2. **RTC Node** (`rtc-node/`): Standalone Go program using [pion/webrtc](https://github.com/pion/webrtc). Connects to the Gateway as `role: "node"` with an Ed25519 device identity. Handles the WebRTC PeerConnection lifecycle: SDP negotiation, ICE exchange, and audio processing.
3. **WebUI** (`plugin/ui/`): Self-contained HTML/JS frontend. Captures microphone audio, establishes the WebRTC connection, and displays call state.
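The Ed25519 device identity used by the RTC Node can be illustrated with the Go standard library alone. The following is a minimal sketch, not the contents of `identity.go`: the `Identity` type, the helper names, and the hex-encoded seed format are all assumptions made for illustration, and the actual challenge format of the Gateway handshake is not described in this README.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"errors"
)

// Identity is a hypothetical holder for the node's Ed25519 key pair.
type Identity struct {
	Public  ed25519.PublicKey
	private ed25519.PrivateKey
}

// NewIdentity generates a fresh key pair.
func NewIdentity() (*Identity, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand.Reader
	if err != nil {
		return nil, err
	}
	return &Identity{Public: pub, private: priv}, nil
}

// SeedHex returns the 32-byte private seed as hex, a form that could be
// written to disk to persist the identity (assumed format).
func (id *Identity) SeedHex() string {
	return hex.EncodeToString(id.private.Seed())
}

// IdentityFromSeedHex restores an identity from a previously saved seed.
func IdentityFromSeedHex(s string) (*Identity, error) {
	seed, err := hex.DecodeString(s)
	if err != nil {
		return nil, err
	}
	if len(seed) != ed25519.SeedSize {
		return nil, errors.New("seed must be 32 bytes")
	}
	priv := ed25519.NewKeyFromSeed(seed)
	return &Identity{Public: priv.Public().(ed25519.PublicKey), private: priv}, nil
}

// Sign signs an arbitrary message, for example a handshake challenge.
func (id *Identity) Sign(msg []byte) []byte {
	return ed25519.Sign(id.private, msg)
}

// Verify checks a signature the way a verifying peer would.
func Verify(pub ed25519.PublicKey, msg, sig []byte) bool {
	return ed25519.Verify(pub, msg, sig)
}
```

Because Ed25519 signing is deterministic, an identity restored from its seed produces the same signature for the same message, which makes a persisted identity easy to check after a restart.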
## Directory Structure
```
├── plugin/                      # OpenClaw Gateway plugin (TypeScript)
│   ├── index.ts                 # Plugin entry: registers gateway methods
│   ├── package.json             # Plugin dependencies (werift, ws, zod)
│   ├── openclaw.plugin.json     # Plugin manifest and config schema
│   ├── src/                     # Orchestration layer
│   │   ├── config.ts            # Configuration parsing and validation
│   │   ├── orchestrator.ts      # Call lifecycle management
│   │   ├── store.ts             # State store
│   │   └── types.ts             # Shared type definitions
│   ├── node/                    # TypeScript RTC Node (legacy, replaced by Go)
│   │   ├── index.ts             # Node entry point
│   │   ├── gateway-client.ts    # Gateway WS client + Ed25519 auth
│   │   ├── peer-manager.ts      # werift PeerConnection manager
│   │   ├── audio-pipeline.ts    # Audio processing pipeline
│   │   └── volc-client.ts       # Volcano Engine TTS integration
│   └── ui/                      # Browser WebUI
│       ├── index.html           # Single-page app entry
│       ├── app.ts / app.js      # App logic (WS connection, call UI)
│       ├── rtc-peer.ts / .js    # WebRTC peer wrapper
│       └── styles.css           # Styling
│
├── rtc-node/                    # Go RTC Node (pion/webrtc)
│   ├── go.mod / go.sum          # Go module (openclaw-rtc-node)
│   ├── main.go                  # CLI entry (--gateway, --token, --loopback)
│   ├── gateway.go               # Gateway WS client, auth handshake, message dispatch
│   ├── identity.go              # Ed25519 device identity (generate, persist, sign)
│   ├── peer.go                  # PeerManager: PeerConnection, SDP, ICE, echo/loopback
│   └── loopback.go              # Audio loopback: immediate echo + delayed replay
│
└── docs/                        # Development documentation
    ├── DEVLOG.md                # Detailed development log (5 phases)
    └── go-rtc-node-plan.md      # Go rewrite technical design
```
## Quick Start
### 1. Start Gateway
```bash
# Ensure OpenClaw gateway is running with the plugin loaded
openclaw gateway run --port 18789
```
### 2. Build & Run RTC Node (Go)
```bash
cd rtc-node
go build -o rtc-node .
./rtc-node \
  --gateway ws://127.0.0.1:18789 \
  --token YOUR_GATEWAY_TOKEN \
  --no-loopback   # immediate echo mode
```
**CLI flags:**
| Flag | Default | Description |
|------|---------|-------------|
| `--gateway` | `ws://127.0.0.1:18789` | Gateway WebSocket URL |
| `--token` | (empty) | Gateway auth token |
| `--loopback N` | `5` | Record N seconds then loop playback |
| `--no-loopback` | false | Immediate echo (loopback=0) |
**Environment variables:** `OPENCLAW_GATEWAY_URL`, `OPENCLAW_GATEWAY_TOKEN`
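One way the flags and environment variables above could fit together is sketched below with Go's standard `flag` package. This is not the code in `main.go`: the `Config` type, `ParseConfig`, and `envOr` are hypothetical names, and the precedence (an explicit flag overrides the environment variable, which overrides the built-in default) is an assumption, since the README does not state it.

```go
package main

import (
	"flag"
	"io"
)

// Config mirrors the CLI flags from the table above.
type Config struct {
	GatewayURL      string
	Token           string
	LoopbackSeconds int // 0 means immediate echo
}

// envOr returns the environment value for key, or def when it is unset or empty.
func envOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// ParseConfig parses args (without the program name). getenv is injected so
// the function is testable; pass os.Getenv in real use.
func ParseConfig(args []string, getenv func(string) string) (Config, error) {
	fs := flag.NewFlagSet("rtc-node", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch

	var cfg Config
	// Assumed precedence: environment values become the flag defaults,
	// so a flag given on the command line wins.
	fs.StringVar(&cfg.GatewayURL, "gateway",
		envOr(getenv, "OPENCLAW_GATEWAY_URL", "ws://127.0.0.1:18789"), "Gateway WebSocket URL")
	fs.StringVar(&cfg.Token, "token",
		envOr(getenv, "OPENCLAW_GATEWAY_TOKEN", ""), "Gateway auth token")
	fs.IntVar(&cfg.LoopbackSeconds, "loopback", 5, "Record N seconds then loop playback")
	noLoopback := fs.Bool("no-loopback", false, "Immediate echo (loopback=0)")

	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	if *noLoopback {
		cfg.LoopbackSeconds = 0
	}
	return cfg, nil
}
```

In a real entry point this would be called as `ParseConfig(os.Args[1:], os.Getenv)`. Go's `flag` package accepts both `-gateway` and `--gateway`, so the double-dash spelling used in this README works unchanged.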
### 3. Open WebUI
```bash
cd plugin/ui
python3 -m http.server 8766
# Open in browser:
# http://localhost:8766/#gateway=ws://127.0.0.1:18789&token=YOUR_TOKEN
```
Click the call button. You should hear your voice echoed back.
## Signaling Flow
```
Browser                    Gateway                    RTC Node
   │                          │                          │
   │── rtc.call.start ───────▶│                          │
   │   (SDP offer)            │── node.invoke ──────────▶│
   │                          │   rtc.call.accept        │
   │                          │   (offer SDP)            │
   │                          │                          │── SetRemoteDescription
   │                          │                          │── CreateAnswer
   │                          │                          │── GatheringComplete
   │                          │◀── invoke.result ────────│
   │◀── answer + candidates ──│   (answer SDP +          │
   │                          │    ICE candidates)       │
   │── rtc.call.ice ─────────▶│── node.invoke ──────────▶│── AddICECandidate
   │                          │                          │
   │◀──────── RTP/DTLS (direct P2P) ────────────────────▶│
   │                          │                          │
   │── rtc.call.hangup ──────▶│── node.invoke ──────────▶│── pc.Close()
```
## Key Technical Decisions
### DTLS Role: Server (Passive)
This was the most critical fix. As the answerer, pion/webrtc defaults to `DTLSRoleClient` (active), so it sends `ClientHello` immediately after ICE connects, before the browser has received the SDP answer. This caused intermittent (~80%) DTLS handshake failures.
**Fix:** `SettingEngine.SetAnsweringDTLSRole(DTLSRoleServer)` makes pion passive. The browser initiates DTLS only after receiving the answer, guaranteeing correct timing.
See [docs/DEVLOG.md, Phase 5, Problem 2](docs/DEVLOG.md) for the full root cause analysis.
### Default MediaEngine (No Custom Codec Registration)
The node uses `webrtc.NewPeerConnection()` with pion's default MediaEngine instead of custom codec registration. The default registers Opus with `Channels: 2` and `SDPFmtpLine: "minptime=10;useinbandfec=1"`, matching browser expectations exactly.
### AddTrack Before SetRemoteDescription
Following pion's [reflect example](https://github.com/pion/webrtc/tree/master/examples/reflect): create the output track and call `AddTrack` before `SetRemoteDescription`. This ensures the answer SDP includes a send direction for the audio track.
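The three decisions in this section can be sketched together in one answer path. This is an illustrative outline, not the code in `peer.go`: `answerOffer` is a hypothetical helper, the track and stream IDs are placeholders, and the sketch assumes a custom `webrtc.API` is built so that the SettingEngine can be applied, with pion's default codecs registered explicitly on its MediaEngine. Interceptors and ICE server configuration are left out for brevity. It requires the `github.com/pion/webrtc/v4` module.

```go
package main

import (
	"github.com/pion/webrtc/v4"
)

// answerOffer (hypothetical) builds a PeerConnection for a browser offer and
// returns it together with the answer SDP after ICE gathering has finished.
func answerOffer(offerSDP string) (*webrtc.PeerConnection, string, error) {
	// Decision 2: rely on pion's default codec set (includes Opus).
	m := &webrtc.MediaEngine{}
	if err := m.RegisterDefaultCodecs(); err != nil {
		return nil, "", err
	}

	// Decision 1: as answerer, take the passive DTLS role so the browser
	// sends ClientHello once it has the answer.
	s := webrtc.SettingEngine{}
	if err := s.SetAnsweringDTLSRole(webrtc.DTLSRoleServer); err != nil {
		return nil, "", err
	}

	api := webrtc.NewAPI(webrtc.WithMediaEngine(m), webrtc.WithSettingEngine(s))
	pc, err := api.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		return nil, "", err
	}
	// fail closes the connection before reporting an error.
	fail := func(err error) (*webrtc.PeerConnection, string, error) {
		pc.Close()
		return nil, "", err
	}

	// Decision 3: add the outgoing audio track BEFORE applying the offer,
	// so the answer advertises a send direction.
	track, err := webrtc.NewTrackLocalStaticRTP(
		webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeOpus},
		"audio", "rtc-node") // placeholder track and stream IDs
	if err != nil {
		return fail(err)
	}
	if _, err := pc.AddTrack(track); err != nil {
		return fail(err)
	}

	offer := webrtc.SessionDescription{Type: webrtc.SDPTypeOffer, SDP: offerSDP}
	if err := pc.SetRemoteDescription(offer); err != nil {
		return fail(err)
	}
	answer, err := pc.CreateAnswer(nil)
	if err != nil {
		return fail(err)
	}

	// Wait for ICE gathering so the returned SDP carries the candidates.
	gatherComplete := webrtc.GatheringCompletePromise(pc)
	if err := pc.SetLocalDescription(answer); err != nil {
		return fail(err)
	}
	<-gatherComplete

	return pc, pc.LocalDescription().SDP, nil
}
```

The order of calls mirrors the signaling flow: `SetRemoteDescription`, `CreateAnswer`, then waiting for gathering to complete before the answer is returned in the invoke result.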
## Audio Modes
### Immediate Echo (`--no-loopback`)
Reads RTP packets from the browser's audio track and writes them back immediately. You hear your own voice with ~20ms round-trip delay.
### Delayed Loopback (`--loopback N`)
1. First N seconds: echo audio back while recording all RTP packets to a buffer
2. After N seconds: stop echoing, continuously replay the buffered packets (20ms per packet)
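The two phases above can be modelled as a small state machine that is independent of any WebRTC library. The sketch below is not the code in `loopback.go`: the `Loopback` type and its methods are hypothetical, and it assumes the recording window is counted in packets at 50 packets per second (20ms each) rather than measured by wall clock.

```go
package main

// packetsPerSecond assumes each RTP packet carries 20 ms of audio.
const packetsPerSecond = 50

// Loopback is a hypothetical recorder/replayer for raw RTP payloads.
type Loopback struct {
	limit int      // packets to record; 0 means immediate echo forever
	buf   [][]byte // recorded packets in arrival order
	next  int      // index of the next packet to replay
}

// NewLoopback creates a loopback that records for the given number of seconds.
func NewLoopback(seconds int) *Loopback {
	return &Loopback{limit: seconds * packetsPerSecond}
}

// Recording reports whether incoming packets are still being echoed.
func (l *Loopback) Recording() bool {
	return l.limit == 0 || len(l.buf) < l.limit
}

// Push handles one incoming packet. It returns true when the caller should
// echo the packet back. During the recording phase a copy is stored; once
// the buffer is full the packet is ignored and false is returned.
func (l *Loopback) Push(pkt []byte) bool {
	if l.limit == 0 {
		return true // immediate echo mode: nothing is recorded
	}
	if len(l.buf) >= l.limit {
		return false
	}
	// Copy, because callers usually reuse their read buffer.
	l.buf = append(l.buf, append([]byte(nil), pkt...))
	return true
}

// NextReplay returns the next buffered packet, wrapping around at the end.
// ok is false while recording is still in progress or nothing was recorded.
func (l *Loopback) NextReplay() (pkt []byte, ok bool) {
	if l.Recording() || len(l.buf) == 0 {
		return nil, false
	}
	pkt = l.buf[l.next]
	l.next = (l.next + 1) % len(l.buf)
	return pkt, true
}
```

A caller would invoke `Push` for every packet read from the browser's track and, once `Recording` turns false, call `NextReplay` from a 20ms ticker. A real replay loop would probably also need to rewrite RTP sequence numbers and timestamps so the browser does not see repeated packets; that part is omitted here.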
## Development
### Plugin Development
The plugin requires OpenClaw's plugin infrastructure. Install it as an extension:
```bash
# Symlink into OpenClaw extensions directory
ln -sfn /path/to/openclaw-rtc-plugin/plugin ~/.openclaw/extensions/webrtc
# Or configure in openclaw.json
{
"plugins": {
"allow": ["webrtc"],
"load": { "paths": ["/path/to/openclaw-rtc-plugin/plugin"] },
"entries": { "webrtc": { "enabled": true } }
}
}
```
### Go RTC Node Development
```bash
cd rtc-node
go build -o rtc-node .
# For local pion development, add replace directive in go.mod:
# replace github.com/pion/webrtc/v4 => /path/to/pion/webrtc
```
### Dependencies
**Go RTC Node:**
- [pion/webrtc v4](https://github.com/pion/webrtc): WebRTC implementation
- [gorilla/websocket](https://github.com/gorilla/websocket): WebSocket client
- [google/uuid](https://github.com/google/uuid): UUID generation
- Go stdlib `crypto/ed25519`: Device identity
**Plugin (TypeScript):**
- [werift](https://github.com/nicepkg/werift): WebRTC (legacy TS node, being replaced)
- ws: WebSocket
- zod: Schema validation
## Documentation
- [docs/DEVLOG.md](docs/DEVLOG.md): Detailed development log covering all 5 phases (plugin loading, gateway auth, signaling, WebUI, Go rewrite)
- [docs/go-rtc-node-plan.md](docs/go-rtc-node-plan.md): Technical design for the Go RTC Node rewrite
## Status
- [x] Gateway plugin: call orchestration, method routing
- [x] Go RTC Node: Gateway auth, SDP negotiation, ICE exchange
- [x] Immediate echo mode (verified stable)
- [x] DTLS reliability fix (DTLSRoleServer)
- [ ] Delayed loopback mode (implemented, needs testing)
- [ ] Volcano Engine TTS integration (Go)
- [ ] Audio file playback
- [ ] TURN server support for NAT traversal
## License
MIT