# openclaw-plugin-voximplant
VoxImplant telephony plugin for [OpenClaw](https://openclaw.ai) -- AI-powered voice calls via VoxEngine + OpenAI Realtime API.
## Why VoxImplant
VoxImplant is a cloud telephony platform that sells **Russian phone numbers** (+7) through a legal, transparent process with proper documentation and business registration. This makes it possible to run OpenClaw AI voice agents on legitimate Russian PSTN numbers, which most international telephony providers (Twilio, Telnyx, Plivo, etc.) do not offer.
Key advantages:
- **Russian phone numbers (+7)**: legal purchase with proper KYC/documentation
- **No geo-restrictions**: works from any location, including behind NAT/VPN
- **Sub-second latency**: OpenAI Realtime API with native G.711 mu-law -- no codec conversion
- **No tunnel required**: WS client mode -- the gateway connects to VoxEngine, not the other way around
- **OpenClaw integration**: agent tools, memory, skills -- all available during the call
## Architecture
```
Subscriber (PSTN / SIP)
        |
VoxImplant Cloud (VoxEngine thin bridge scenario)
        |  media_session_access_secure_url (WSS)
        |
OpenClaw Gateway  <-- WS client connects to VoxEngine session
        |
        +--[realtime mode]--> OpenAI Realtime API (gpt-realtime)
        |                     native audio-in / audio-out
        |                     G.711 mu-law 8 kHz
        |
        +--[legacy mode]----> STT -> LLM -> TTS pipeline
                              via OpenClaw runtime APIs
```
**WS client mode**: the gateway connects *to* VoxEngine's session URL. No public URL, no tunnel, no port forwarding required.
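A minimal sketch of what dialing out looks like, assuming a WebSocket constructor with the standard `onmessage` interface (for example the global `WebSocket` in recent Node.js releases, or the `ws` package). The helper name `connectToSession` and its callback shape are hypothetical and are not taken from the plugin's source.

```javascript
// Hypothetical helper: dial out to the VoxEngine session URL as a client.
// `WebSocketImpl` is injected so the sketch works with a global WebSocket,
// the `ws` package, or a test double.
function connectToSession(sessionUrl, WebSocketImpl, onFrame) {
  const url = new URL(sessionUrl);
  if (url.protocol !== "wss:") {
    throw new Error(`expected a wss:// session URL, got ${url.protocol}//`);
  }
  const socket = new WebSocketImpl(url.href);
  // VoxEngine sends JSON text frames; parse each one and hand it to the caller.
  socket.onmessage = (event) => onFrame(JSON.parse(event.data));
  return socket;
}
```

Because the connection is outbound, the only network requirement on the gateway side is that it can reach the secure session URL over WSS.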
## Prerequisites
- [OpenClaw](https://openclaw.ai) >= 2026.4.8
- Node.js >= 18
- VoxImplant account with:
  - An Application
  - A deployed Scenario (use `scenario/voxengine-bridge.js`)
  - A Rule linking the scenario to a phone number
  - An API key (from account settings)
  - A purchased phone number (used as Caller ID)
- OpenAI API key (for Realtime mode)
## Installation
```bash
# From local directory
openclaw plugins install ./path/to/openclaw-plugin-voximplant
# Then restart the gateway
openclaw gateway restart
```
Alternatively, copy the plugin into `~/.openclaw/extensions/voximplant/` manually and run `npm install` inside that directory.
## VoxImplant Dashboard Setup
### 1. Create an Application
In the [VoxImplant dashboard](https://manage.voximplant.com/), go to **Applications** and create a new application (e.g. `openclaw-voice`).
### 2. Deploy the Bridge Scenario
Go to **Scenarios** within your application. Create a new scenario and paste the full contents of [`scenario/voxengine-bridge.js`](scenario/voxengine-bridge.js).
This is a thin bridge -- it does NOT contain any AI logic. It:
- Allows incoming WebSocket connections from the gateway
- Dials the target PSTN number
- Binds duplex mu-law audio between the WebSocket and the phone call
### 3. Create a Rule
Go to **Routing** within your application. Create a new rule that links your scenario to your phone number pattern. Note the **Rule ID** -- you'll need it for config.
### 4. Get Your Credentials
From your application, note down:
- **Application ID** (visible in the app settings)
- **Scenario ID** (visible in the scenario list)
- **Rule ID** (visible in the routing rules list)
From [account settings](https://manage.voximplant.com/settings), get:
- **Account ID**
- **API Key** (create one if needed)
From **Numbers**, note your purchased phone number (this becomes your **Caller ID**).
## OpenClaw Configuration
Add to your `openclaw.json`:
```json
{
  "plugins": {
    "allow": ["voximplant"],
    "entries": {
      "voximplant": {
        "enabled": true,
        "config": {
          "apiKey": "your-voximplant-api-key",
          "accountId": "12345678",
          "applicationId": "99887766",
          "scenarioId": "11223344",
          "ruleId": "55667788",
          "callerId": "+74951234567",
          "maxConcurrentCalls": 10,
          "maxDurationSeconds": 300
        }
      }
    }
  }
}
```
### Config Reference
| Key | Required | Default | Description |
|-----|----------|---------|-------------|
| `apiKey` | Yes | -- | VoxImplant API key |
| `accountId` | Yes | -- | VoxImplant account ID |
| `applicationId` | Yes | -- | Application ID containing the bridge scenario |
| `scenarioId` | Yes | -- | ID of the deployed bridge scenario |
| `callerId` | Yes | -- | Outbound caller ID (purchased phone number, E.164) |
| `ruleId` | No | -- | Rule ID for call routing |
| `applicationName` | No | -- | Application name (alternative to applicationId) |
| `wsPort` | No | 8085 | Port for the HTTP health/RPC server |
| `inboundEnabled` | No | false | Enable inbound call handling |
| `inboundGreeting` | No | "Hello! How can I help you today?" | Greeting for inbound calls |
| `maxConcurrentCalls` | No | 10 | Maximum concurrent call sessions |
| `maxDurationSeconds` | No | 300 | Auto-hangup after this many seconds |
If required keys are missing, the plugin loads but stays inactive and logs a warning.
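The behavior described above can be sketched as follows. This is an illustration only: the helper name `resolveConfig` is hypothetical, the required keys and defaults are copied from the table, and the sketch assumes required values are non-empty strings. It does not model `applicationName` as an alternative to `applicationId`.

```javascript
// Required keys and defaults copied from the Config Reference table.
const REQUIRED_KEYS = ["apiKey", "accountId", "applicationId", "scenarioId", "callerId"];
const DEFAULTS = {
  wsPort: 8085,
  inboundEnabled: false,
  inboundGreeting: "Hello! How can I help you today?",
  maxConcurrentCalls: 10,
  maxDurationSeconds: 300,
};

// Hypothetical helper: merge defaults and report which required keys are absent.
function resolveConfig(userConfig = {}) {
  const config = { ...DEFAULTS, ...userConfig };
  const missing = REQUIRED_KEYS.filter(
    (key) => typeof config[key] !== "string" || config[key].trim() === ""
  );
  return { config, missing, active: missing.length === 0 };
}

// Example: an incomplete config stays inactive and produces a warning.
const result = resolveConfig({ apiKey: "your-voximplant-api-key", accountId: "12345678" });
if (!result.active) {
  console.warn(`voximplant: inactive, missing config keys: ${result.missing.join(", ")}`);
}
```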
## Phone Agent Setup (Realtime Mode)
In Realtime mode, the gateway reads instructions from a dedicated OpenClaw agent workspace.
### 1. Create the agent workspace
```
~/.openclaw/workspace/phone/
  AGENTS.md    # Agent instructions (system prompt for voice conversations)
  TOOLS.md     # Tool descriptions available to the agent
```
### 2. Register the agent in `openclaw.json`
```json
{
  "agents": {
    "list": [
      {
        "id": "phone-agent",
        "name": "Phone Agent",
        "workspace": "C:\\Users\\YourUser\\.openclaw\\workspace\\phone"
      }
    ]
  }
}
```
### 3. Write `AGENTS.md`
Example for a Russian-language phone assistant:
```markdown
# Phone Agent
You are a voice assistant for phone conversations. Speak Russian.
Be concise and helpful. Keep responses short (1-3 sentences).
Do not use markdown or formatting -- you are speaking, not writing.
```
### 4. Set `OPENAI_API_KEY`
The Realtime API uses your OpenAI API key. Set it in the environment:
```bash
export OPENAI_API_KEY=sk-your-key
```
Or add it to the `.env` file in your OpenClaw home directory.
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | -- | OpenAI API key (required for Realtime mode) |
| `OPENCLAW_VOXI_MODE` | `realtime` | Operating mode: `realtime` or `legacy` |
| `OPENCLAW_VOXI_AGENT_ID` | `phone-agent` | OpenClaw agent ID for voice conversations |
| `OPENCLAW_VOXI_REALTIME_MODEL` | `gpt-realtime` | OpenAI Realtime model name |
| `OPENCLAW_VOXI_REALTIME_VOICE` | `alloy` | Voice for the Realtime API (e.g. `alloy`, `echo`, `shimmer`; the supported set depends on the model) |
| `OPENCLAW_VOXI_TTS_SPEED` | `1` | TTS playback speed, legacy mode |
| `OPENCLAW_VOXI_TTS_STREAMING` | `1` | Enable streaming TTS via OpenAI API (legacy mode) |
| `OPENCLAW_VOXI_VOICE_STREAM_AGENT` | `1` | Enable streaming agent with onBlockReply (legacy mode) |
| `OPENCLAW_VOXI_AGENT_MODEL` | -- | Override agent LLM model (legacy mode) |
| `OPENCLAW_VOXI_AGENT_PROVIDER` | -- | Override agent LLM provider (legacy mode) |
| `OPENCLAW_VOXI_DOWLINK_SILENCE_KEEPALIVE` | `1` | Send silence frames during LLM wait |
| `OPENCLAW_VOXI_DOWLINK_PACE_MS` | -- | Downlink audio pacing interval (ms) |
| `OPENCLAW_VOXI_BARGE_IN_ON_SPEECH_START` | `0` | 1 = barge-in on VAD start, 0 = on transcript |
| `OPENCLAW_VOXI_PCM16_WIRE_ENDIAN` | -- | PCM16 wire byte order (`le` or `be`) |
See `.env.example` for a template.
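As an illustration of how these variables and their defaults fit together, here is a small sketch. The helper `readVoiceSettings` is hypothetical, and throwing on an invalid mode or a missing key is an assumption made for the example, not documented plugin behavior.

```javascript
// Defaults copied from the Environment Variables table; variables without a
// documented default are left out of this sketch.
const ENV_DEFAULTS = {
  OPENCLAW_VOXI_MODE: "realtime",
  OPENCLAW_VOXI_AGENT_ID: "phone-agent",
  OPENCLAW_VOXI_REALTIME_MODEL: "gpt-realtime",
  OPENCLAW_VOXI_REALTIME_VOICE: "alloy",
  OPENCLAW_VOXI_BARGE_IN_ON_SPEECH_START: "0",
};

// Hypothetical helper: resolve voice settings from an environment object.
function readVoiceSettings(env = process.env) {
  const get = (name) => env[name] || ENV_DEFAULTS[name];
  const mode = get("OPENCLAW_VOXI_MODE");
  if (mode !== "realtime" && mode !== "legacy") {
    throw new Error(`OPENCLAW_VOXI_MODE must be "realtime" or "legacy", got "${mode}"`);
  }
  // Assumption: fail fast when Realtime mode has no OpenAI key.
  if (mode === "realtime" && !env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY is required for Realtime mode");
  }
  return {
    mode,
    agentId: get("OPENCLAW_VOXI_AGENT_ID"),
    model: get("OPENCLAW_VOXI_REALTIME_MODEL"),
    voice: get("OPENCLAW_VOXI_REALTIME_VOICE"),
    bargeInOnSpeechStart: get("OPENCLAW_VOXI_BARGE_IN_ON_SPEECH_START") === "1",
  };
}
```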
## Operating Modes
### Realtime Mode (default)
Uses the OpenAI Realtime API (`gpt-realtime`) for native speech-to-speech:
- Audio flows directly between the caller and the Realtime API
- G.711 mu-law 8 kHz -- no codec conversion needed
- Server-side VAD for turn detection
- Sub-second latency for voice responses
- Function calling supported via Realtime API events
Set `OPENCLAW_VOXI_MODE=realtime` (or leave unset -- it's the default).
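To make the audio settings above concrete, here is a rough sketch of a `session.update` event that requests mu-law audio in both directions with server-side VAD. The helper `buildSessionUpdate` is hypothetical, and the field names follow the flat session shape from the beta-era Realtime API; newer API versions may nest these settings differently, so treat this as an assumption and check the current OpenAI reference. It is not taken from the plugin's source.

```javascript
// Hypothetical helper: build a `session.update` client event.
// Assumption: flat beta-era field names (`input_audio_format`,
// `output_audio_format`, `turn_detection`).
function buildSessionUpdate({ instructions, voice = "alloy" }) {
  return {
    type: "session.update",
    session: {
      instructions,
      voice,
      // G.711 mu-law in both directions, so telephony audio passes through as-is.
      input_audio_format: "g711_ulaw",
      output_audio_format: "g711_ulaw",
      // Let the server detect when the caller starts and stops speaking.
      turn_detection: { type: "server_vad" },
    },
  };
}

// The event would be sent as a JSON text frame on the Realtime connection.
const sessionUpdateJson = JSON.stringify(
  buildSessionUpdate({ instructions: "You are a voice assistant. Speak Russian." })
);
```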
### Legacy Mode
Uses OpenClaw's STT -> LLM -> TTS pipeline:
- Audio is transcribed via STT, sent to the LLM agent, and the response is synthesized via TTS
- Higher latency (typically 5-15 s without streaming), but works with any LLM
- With the streaming agent and streaming TTS enabled, latency drops to 2-5 s
Set `OPENCLAW_VOXI_MODE=legacy` to use this mode.
## Agent Tool
The plugin registers a `voximplant_call` tool that OpenClaw agents can invoke.
### Initiate a call
```json
{
  "action": "initiate_call",
  "to": "+74951234567",
  "message": "Hello! I'm calling from OpenClaw."
}
```
### End a call
```json
{
  "action": "end_call",
  "callId": "uuid-..."
}
```
### Get call status
```json
{
  "action": "get_status",
  "callId": "uuid-..."
}
```
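The three payload shapes above can be checked with a small validator. This is a sketch under assumptions: the function `validateToolCall` is hypothetical, and requiring `to` in E.164 format is inferred from the examples rather than stated by the plugin.

```javascript
// E.164: a plus sign followed by up to 15 digits, the first of which is not zero.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical validator for `voximplant_call` arguments. Returns an error
// string, or null when the arguments look well-formed.
function validateToolCall(args) {
  switch (args.action) {
    case "initiate_call":
      if (typeof args.to !== "string" || !E164.test(args.to)) {
        return "initiate_call requires `to` in E.164 format, e.g. +74951234567";
      }
      return null;
    case "end_call":
    case "get_status":
      if (typeof args.callId !== "string" || args.callId === "") {
        return `${args.action} requires a callId`;
      }
      return null;
    default:
      return `unknown action: ${String(args.action)}`;
  }
}
```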
## Gateway RPC Methods
- `voximplant.status` -- returns plugin status, active calls, config
- `voximplant.call` -- initiate an outbound call (params: `to`, `message`)
## WebSocket Protocol
The plugin uses VoxEngine's native media stream protocol: JSON text frames with base64 mu-law audio.
| Event | Direction | Description |
|-------|-----------|-------------|
| `start` | VoxEngine -> Plugin | Session start with mediaFormat and customParameters |
| `media` | Bidirectional | Audio chunk: base64 mu-law payload |
| `stop` | VoxEngine -> Plugin | Session end with mediaInfo |
Control messages (Plugin -> VoxEngine):
| Op | Description |
|----|-------------|
| `hello_ack` | Confirm session setup |
| `call_end` | AI-side hangup |
| `barge_in` | User interrupted |
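A sketch of handling these frames in Node.js follows. The JSON field names (`event`, `media.payload`) and the helper names are assumptions for illustration; the source only states that frames are JSON text carrying base64 mu-law audio. G.711 mu-law uses one byte per sample at 8 kHz, so a chunk's duration follows directly from its byte length.

```javascript
const SAMPLE_RATE_HZ = 8000; // G.711 mu-law: one byte per sample at 8 kHz

// Wrap raw mu-law bytes in a JSON text frame.
// Assumption: `event` and `media.payload` are illustrative field names.
function encodeMediaFrame(mulawBytes) {
  return JSON.stringify({
    event: "media",
    media: { payload: Buffer.from(mulawBytes).toString("base64") },
  });
}

// Parse an incoming text frame; for `media` events, decode the audio bytes.
function decodeFrame(text) {
  const frame = JSON.parse(text);
  if (frame.event !== "media") return { event: frame.event, audio: null };
  return { event: "media", audio: Buffer.from(frame.media.payload, "base64") };
}

// Playback duration of a mu-law chunk in milliseconds.
function chunkDurationMs(byteLength) {
  return (byteLength * 1000) / SAMPLE_RATE_HZ;
}
```

For example, a 160-byte chunk corresponds to 20 ms of audio.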
## Project Structure
```
openclaw-plugin-voximplant/
  index.js                 # Plugin entry point (tools, services, gateway methods)
  openclaw.plugin.json     # Manifest + config schema
  package.json
  src/
    gateway.js             # WS client + audio bridge + Realtime/legacy pipeline
    voximplant-api.js      # VoxImplant HTTP API client (StartScenarios)
    call-state.js          # Call lifecycle management
    file-log.js            # Dual logger (console + JSONL file)
  scenario/
    voxe
... (truncated)
```