Plugin Voximplant

Name: Plugin Voximplant
Rating: 3.5 (1 review)
Author: Bitgrohot


VoxImplant telephony plugin for OpenClaw — AI voice calls via VoxEngine + OpenAI Realtime API


README

# openclaw-plugin-voximplant

VoxImplant telephony plugin for [OpenClaw](https://openclaw.ai) -- AI-powered voice calls via VoxEngine + OpenAI Realtime API.

## Why VoxImplant

VoxImplant is a cloud telephony platform that lets you purchase **Russian phone numbers** (+7) through a legal, transparent process with proper documentation and business registration. This makes it possible to run OpenClaw AI voice agents on legitimate Russian PSTN numbers, which most international telephony providers (Twilio, Telnyx, Plivo, etc.) do not offer.

Key advantages:
- **Russian phone numbers (+7)**: legal purchase with proper KYC/documentation
- **No geo-restrictions**: works from any location, including behind NAT/VPN
- **Sub-second latency**: OpenAI Realtime API with native G.711 mu-law -- no codec conversion
- **No tunnel required**: WS client mode -- the gateway connects to VoxEngine, not the other way around
- **OpenClaw integration**: agent tools, memory, skills -- all available during the call

## Architecture

```
Subscriber (PSTN / SIP)
       |
VoxImplant Cloud (VoxEngine thin bridge scenario)
       |  media_session_access_secure_url (WSS)
       |
OpenClaw Gateway  <--  WS client connects to VoxEngine session
       |
       +--[realtime mode]--> OpenAI Realtime API (gpt-realtime)
       |                     native audio-in / audio-out
       |                     G.711 mu-law 8 kHz
       |
       +--[legacy mode]----> STT -> LLM -> TTS pipeline
                             via OpenClaw runtime APIs
```

**WS client mode**: the gateway connects *to* VoxEngine's session URL. No public URL, no tunnel, no port forwarding required.

## Prerequisites

- [OpenClaw](https://openclaw.ai) >= 2026.4.8
- Node.js >= 18
- VoxImplant account with:
  - An Application
  - A deployed Scenario (use `scenario/voxengine-bridge.js`)
  - A Rule linking the scenario to a phone number
  - An API key (from account settings)
  - A purchased phone number (used as Caller ID)
- OpenAI API key (for Realtime mode)

## Installation

```bash
# From local directory
openclaw plugins install ./path/to/openclaw-plugin-voximplant

# Then restart the gateway
openclaw gateway restart
```

Alternatively, copy the plugin into `~/.openclaw/extensions/voximplant/` by hand and run `npm install` inside that directory.

## VoxImplant Dashboard Setup

### 1. Create an Application

In the [VoxImplant dashboard](https://manage.voximplant.com/), go to **Applications** and create a new application (e.g. `openclaw-voice`).

### 2. Deploy the Bridge Scenario

Go to **Scenarios** within your application. Create a new scenario and paste the full contents of [`scenario/voxengine-bridge.js`](scenario/voxengine-bridge.js).

This is a thin bridge -- it does NOT contain any AI logic. It:
- Allows incoming WebSocket connections from the gateway
- Dials the target PSTN number
- Binds duplex mu-law audio between the WebSocket and the phone call

### 3. Create a Rule

Go to **Routing** within your application. Create a new rule that links your scenario to your phone number pattern. Note the **Rule ID** -- you'll need it for config.

### 4. Get Your Credentials

From your application, note down:
- **Application ID** (visible in the app settings)
- **Scenario ID** (visible in the scenario list)
- **Rule ID** (visible in the routing rules list)

From [account settings](https://manage.voximplant.com/settings), get:
- **Account ID**
- **API Key** (create one if needed)

From **Numbers**, note your purchased phone number (this becomes your **Caller ID**).

## OpenClaw Configuration

Add the following to your `openclaw.json`:

```json
{
  "plugins": {
    "allow": ["voximplant"],
    "entries": {
      "voximplant": {
        "enabled": true,
        "config": {
          "apiKey": "your-voximplant-api-key",
          "accountId": "12345678",
          "applicationId": "99887766",
          "scenarioId": "11223344",
          "ruleId": "55667788",
          "callerId": "+74951234567",
          "maxConcurrentCalls": 10,
          "maxDurationSeconds": 300
        }
      }
    }
  }
}
```

### Config Reference

| Key | Required | Default | Description |
|-----|----------|---------|-------------|
| `apiKey` | Yes | -- | VoxImplant API key |
| `accountId` | Yes | -- | VoxImplant account ID |
| `applicationId` | Yes | -- | Application ID containing the bridge scenario |
| `scenarioId` | Yes | -- | ID of the deployed bridge scenario |
| `callerId` | Yes | -- | Outbound caller ID (purchased phone number, E.164) |
| `ruleId` | No | -- | Rule ID for call routing |
| `applicationName` | No | -- | Application name (alternative to applicationId) |
| `wsPort` | No | 8085 | Port for the HTTP health/RPC server |
| `inboundEnabled` | No | false | Enable inbound call handling |
| `inboundGreeting` | No | "Hello! How can I help you today?" | Greeting for inbound calls |
| `maxConcurrentCalls` | No | 10 | Maximum concurrent call sessions |
| `maxDurationSeconds` | No | 300 | Auto-hangup after this many seconds |

If required keys are missing, the plugin loads but stays inactive and logs a warning.
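
The check described above can be sketched as follows. This is an illustration built only from the Config Reference table, not the plugin's actual code: `resolveConfig` is a hypothetical helper, and treating `applicationName` as a substitute for `applicationId` is an assumption based on the table's "alternative to applicationId" note.

```javascript
// Required keys and defaults, copied from the Config Reference table.
const REQUIRED_KEYS = ["apiKey", "accountId", "applicationId", "scenarioId", "callerId"];
const DEFAULTS = {
  wsPort: 8085,
  inboundEnabled: false,
  inboundGreeting: "Hello! How can I help you today?",
  maxConcurrentCalls: 10,
  maxDurationSeconds: 300,
};

// Hypothetical helper: merges defaults into the user config and reports
// which required keys are absent or empty.
function resolveConfig(raw = {}) {
  const config = { ...DEFAULTS, ...raw };
  const missing = REQUIRED_KEYS.filter((key) => {
    // Assumption: applicationName may stand in for applicationId.
    if (key === "applicationId" && config.applicationName) return false;
    const value = config[key];
    return value === undefined || value === null || value === "";
  });
  // "active" mirrors the documented behaviour: inactive when keys are missing.
  return { config, missing, active: missing.length === 0 };
}
```

A caller could log `missing.join(", ")` as the warning and skip starting the call service when `active` is `false`.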

## Phone Agent Setup (Realtime Mode)

In Realtime mode, the gateway reads instructions from a dedicated OpenClaw agent workspace.

### 1. Create the agent workspace

```
~/.openclaw/workspace/phone/
  AGENTS.md     # Agent instructions (system prompt for voice conversations)
  TOOLS.md      # Tool descriptions available to the agent
```

### 2. Register the agent in `openclaw.json`

```json
{
  "agents": {
    "list": [
      {
        "id": "phone-agent",
        "name": "Phone Agent",
        "workspace": "C:\\Users\\YourUser\\.openclaw\\workspace\\phone"
      }
    ]
  }
}
```

### 3. Write `AGENTS.md`

Example for a Russian-language phone assistant:

```markdown
# Phone Agent

You are a voice assistant for phone conversations. Speak Russian.
Be concise and helpful. Keep responses short (1-3 sentences).
Do not use markdown or formatting -- you are speaking, not writing.
```

### 4. Set `OPENAI_API_KEY`

The Realtime API uses your OpenAI API key. Set it in the environment:

```bash
export OPENAI_API_KEY=sk-your-key
```

Or add it to the `.env` file in your OpenClaw home directory.

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | -- | OpenAI API key (required for Realtime mode) |
| `OPENCLAW_VOXI_MODE` | `realtime` | Operating mode: `realtime` or `legacy` |
| `OPENCLAW_VOXI_AGENT_ID` | `phone-agent` | OpenClaw agent ID for voice conversations |
| `OPENCLAW_VOXI_REALTIME_MODEL` | `gpt-realtime` | OpenAI Realtime model name |
| `OPENCLAW_VOXI_REALTIME_VOICE` | `alloy` | Voice for the Realtime API (e.g. alloy, echo, shimmer; the supported set depends on the model) |
| `OPENCLAW_VOXI_TTS_SPEED` | `1` | TTS playback speed, legacy mode |
| `OPENCLAW_VOXI_TTS_STREAMING` | `1` | Enable streaming TTS via OpenAI API (legacy mode) |
| `OPENCLAW_VOXI_VOICE_STREAM_AGENT` | `1` | Enable streaming agent with onBlockReply (legacy mode) |
| `OPENCLAW_VOXI_AGENT_MODEL` | -- | Override agent LLM model (legacy mode) |
| `OPENCLAW_VOXI_AGENT_PROVIDER` | -- | Override agent LLM provider (legacy mode) |
| `OPENCLAW_VOXI_DOWLINK_SILENCE_KEEPALIVE` | `1` | Send silence frames during LLM wait |
| `OPENCLAW_VOXI_DOWLINK_PACE_MS` | -- | Downlink audio pacing interval (ms) |
| `OPENCLAW_VOXI_BARGE_IN_ON_SPEECH_START` | `0` | 1 = barge-in on VAD start, 0 = on transcript |
| `OPENCLAW_VOXI_PCM16_WIRE_ENDIAN` | -- | PCM16 wire byte order (`le` or `be`) |

See `.env.example` for a template.
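
As a rough illustration of how the core variables and their defaults fit together, the sketch below reads them from an environment object. `readVoxiEnv` is a hypothetical name, and rejecting an unknown mode with an exception is an assumption; the plugin may handle invalid values differently.

```javascript
// Hypothetical helper: resolves the main OPENCLAW_VOXI_* settings using the
// defaults listed in the table above.
function readVoxiEnv(env = process.env) {
  const mode = env.OPENCLAW_VOXI_MODE || "realtime";
  if (mode !== "realtime" && mode !== "legacy") {
    // Assumption: unknown modes are treated as a configuration error.
    throw new Error(`Unsupported OPENCLAW_VOXI_MODE: ${mode}`);
  }
  return {
    mode,
    agentId: env.OPENCLAW_VOXI_AGENT_ID || "phone-agent",
    realtimeModel: env.OPENCLAW_VOXI_REALTIME_MODEL || "gpt-realtime",
    realtimeVoice: env.OPENCLAW_VOXI_REALTIME_VOICE || "alloy",
    // "1" = barge-in on VAD start, "0" (default) = on transcript.
    bargeInOnSpeechStart: (env.OPENCLAW_VOXI_BARGE_IN_ON_SPEECH_START || "0") === "1",
    // Realtime mode needs an OpenAI key; only its presence is recorded here.
    hasOpenAiKey: Boolean(env.OPENAI_API_KEY),
  };
}
```

Passing the environment in as a parameter keeps the function easy to test with a plain object instead of the real `process.env`.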

## Operating Modes

### Realtime Mode (default)

Uses the OpenAI Realtime API (`gpt-realtime`) for native speech-to-speech:

- Audio flows directly between the caller and the Realtime API
- G.711 mu-law 8 kHz -- no codec conversion needed
- Server-side VAD for turn detection
- Sub-second latency for voice responses
- Function calling supported via Realtime API events

Set `OPENCLAW_VOXI_MODE=realtime` (or leave unset -- it's the default).

### Legacy Mode

Uses OpenClaw's STT -> LLM -> TTS pipeline:

- Audio is transcribed via STT, sent to the LLM agent, and the response is synthesized via TTS
- Higher latency (typically 5-15 s), but works with any LLM
- Enabling the streaming agent (`OPENCLAW_VOXI_VOICE_STREAM_AGENT`) and streaming TTS (`OPENCLAW_VOXI_TTS_STREAMING`) reduces latency to 2-5 s

Set `OPENCLAW_VOXI_MODE=legacy` to use this mode.
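
The `OPENCLAW_VOXI_PCM16_WIRE_ENDIAN` variable suggests that 16-bit linear PCM is involved somewhere in the pipeline. Where and whether the plugin converts audio is not documented here, so the following is only a sketch of the standard G.711 mu-law expansion, with hypothetical function names, for readers who need to inspect or convert the 8 kHz mu-law stream themselves.

```javascript
// Expand one G.711 mu-law byte to a signed 16-bit linear sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84; // 0x84 is the bias
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decode a Buffer of mu-law bytes into PCM16 with the requested byte order
// ("le" or "be", matching the values of OPENCLAW_VOXI_PCM16_WIRE_ENDIAN).
function decodeMuLaw(muLawBuffer, endian = "le") {
  const out = Buffer.alloc(muLawBuffer.length * 2);
  for (let i = 0; i < muLawBuffer.length; i++) {
    const sample = muLawToPcm16(muLawBuffer[i]);
    if (endian === "be") out.writeInt16BE(sample, i * 2);
    else out.writeInt16LE(sample, i * 2);
  }
  return out;
}
```

At 8 kHz with one byte per sample, one second of mu-law audio is 8000 bytes, and the decoded PCM16 is twice that size.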

## Agent Tool

The plugin registers a `voximplant_call` tool that OpenClaw agents can invoke.

### Initiate a call

```json
{
  "action": "initiate_call",
  "to": "+74951234567",
  "message": "Hello! I'm calling from OpenClaw."
}
```

### End a call

```json
{
  "action": "end_call",
  "callId": "uuid-..."
}
```

### Get call status

```json
{
  "action": "get_status",
  "callId": "uuid-..."
}
```

## Gateway RPC Methods

- `voximplant.status` -- returns plugin status, active calls, config
- `voximplant.call` -- initiate an outbound call (params: `to`, `message`)

## WebSocket Protocol

The plugin uses VoxEngine's native media stream protocol: JSON text frames with base64 mu-law audio.

| Event | Direction | Description |
|-------|-----------|-------------|
| `start` | VoxEngine -> Plugin | Session start with mediaFormat and customParameters |
| `media` | Bidirectional | Audio chunk: base64 mu-law payload |
| `stop` | VoxEngine -> Plugin | Session end with mediaInfo |
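
To make the frame handling concrete, here is a minimal sketch of parsing and building these JSON text frames. The event names come from the table above, but the exact nesting (`frame.start`, `frame.media.payload`, `frame.stop`) is an assumption about the wire format and should be checked against the actual frames VoxEngine sends; `parseFrame` and `buildMediaFrame` are hypothetical names.

```javascript
// Parse one JSON text frame into a small tagged object.
// Assumption: each event carries its data under a key named after the event.
function parseFrame(text) {
  const frame = JSON.parse(text);
  switch (frame.event) {
    case "start":
      return {
        type: "start",
        mediaFormat: frame.start?.mediaFormat,
        customParameters: frame.start?.customParameters,
      };
    case "media":
      // The payload is base64-encoded mu-law audio.
      return { type: "media", audio: Buffer.from(frame.media?.payload ?? "", "base64") };
    case "stop":
      return { type: "stop", mediaInfo: frame.stop?.mediaInfo };
    default:
      return { type: "unknown", raw: frame };
  }
}

// Wrap raw mu-law bytes in a media frame for the Plugin -> VoxEngine direction.
function buildMediaFrame(muLawBytes) {
  return JSON.stringify({
    event: "media",
    media: { payload: Buffer.from(muLawBytes).toString("base64") },
  });
}
```

Returning an `unknown` type instead of throwing lets the caller log and ignore frames it does not recognise.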

Control messages (Plugin -> VoxEngine):

| Op | Description |
|----|-------------|
| `hello_ack` | Confirm session setup |
| `call_end` | AI-side hangup |
| `barge_in` | User interrupted |

## Project Structure

```
openclaw-plugin-voximplant/
  index.js                    # Plugin entry point (tools, services, gateway methods)
  openclaw.plugin.json        # Manifest + config schema
  package.json
  src/
    gateway.js                # WS client + audio bridge + Realtime/legacy pipeline
    voximplant-api.js          # VoxImplant HTTP API client (StartScenarios)
    call-state.js              # Call lifecycle management
    file-log.js                # Dual logger (console + JSONL file)
  scenario/
    voxe

... (truncated)
