# @kesor/openclaw-cloudflare-plugin
OpenClaw plugin for **Cloudflare Workers AI** - provides media understanding capabilities (speech-to-text, image description, video description, text-to-speech) using Cloudflare's AI models.
## Features
- **Speech-to-Text (Audio Transcription)**: Transcribe audio using Cloudflare Whisper models
- **Image Understanding**: Describe images using vision models (Llama Vision, Llava)
- **Video Understanding**: Describe video frames using vision models
- **Text-to-Speech (TTS)**: Generate speech using Deepgram Aura models
- **Automatic Integration**: Works seamlessly with OpenClaw's media-understanding pipeline
- **Multiple Models**: Configurable models for audio, vision, and TTS tasks
## Requirements
- OpenClaw gateway running
- Cloudflare account with Workers AI enabled
- Cloudflare API token with AI read permissions
### Getting Cloudflare Credentials
1. **Account ID**: Found in your Cloudflare Dashboard URL (e.g., `https://dash.cloudflare.com/your-account-id`)
2. **API Token**: Create one at https://dash.cloudflare.com/profile/api-tokens
- Click "Create Token"
- Use the "Edit Workers AI" template or create a custom token with `AI:Read` permission
- Copy the token (it won't be shown again)
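Before you wire a new token into OpenClaw, you can check it against Cloudflare's token verification endpoint (`GET https://api.cloudflare.com/client/v4/user/tokens/verify`). The TypeScript sketch below is an illustration and is not part of the plugin. `verifyCloudflareToken` and `isTokenActive` are hypothetical helper names, and the sketch assumes Node.js 18+ for the global `fetch`.

```typescript
// Subset of the Cloudflare API envelope returned by the token-verify endpoint.
interface VerifyResponse {
  success: boolean;
  result?: { id: string; status: string };
  errors?: { code: number; message: string }[];
}

// True only when the call succeeded and Cloudflare reports the token as active.
function isTokenActive(body: VerifyResponse): boolean {
  return body.success === true && body.result?.status === "active";
}

// Hypothetical helper: asks Cloudflare whether `token` is valid (Node.js 18+ for fetch).
async function verifyCloudflareToken(token: string): Promise<boolean> {
  const res = await fetch("https://api.cloudflare.com/client/v4/user/tokens/verify", {
    headers: { Authorization: `Bearer ${token}` },
  });
  const body = (await res.json()) as VerifyResponse;
  return isTokenActive(body);
}
```

For example, `await verifyCloudflareToken(process.env.CLOUDFLARE_API_TOKEN ?? "")` should resolve to `true` for an active token. This check only proves that the token is valid. It does not confirm that the token carries the Workers AI permission.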
## Installation
**Note**: The `openclaw plugins install` command does not support remote URLs. Use one of the options below.
### Option 1: From npm (recommended)
```bash
openclaw plugins install @kesor/openclaw-cloudflare-plugin
```
### Option 2: From a local path
```bash
# Clone the plugin repo or use local path
openclaw plugins install /path/to/plugin-cloudflare
```
Or use `--link` to create a symlink instead of copying:
```bash
openclaw plugins install /path/to/plugin-cloudflare --link
```
### Option 3: From archive file
```bash
# Build a tarball and install
cd /path/to/plugin-cloudflare
npm pack
# Produces: kesor-openclaw-cloudflare-plugin-0.1.0.tgz (npm prefixes the scope name)
openclaw plugins install ./kesor-openclaw-cloudflare-plugin-0.1.0.tgz
```
### Option 4: Built into OpenClaw (custom build)
If you're compiling OpenClaw yourself:
```bash
# From your OpenClaw source directory
cp -r /path/to/plugin-cloudflare extensions/cloudflare-ai
# Rebuild OpenClaw
pnpm build
```
After installation, configure the plugin and restart the Gateway.
## Configuration
You can configure the plugin via OpenClaw config **or** environment variables.
### Option 1: OpenClaw Config
Add to your OpenClaw config under `plugins.entries.cloudflare-ai.config`:
```json5
{
  "apiToken": "your_cloudflare_api_token",
  "accountId": "your_cloudflare_account_id",
  "audioModel": "@cf/openai/whisper-large-v3-turbo",
  "imageModel": "@cf/meta/llama-3.2-11b-vision-instruct-fp8",
  "ttsModel": "@cf/deepgram/aura-2-en",
  "defaultLanguage": "en",
  "timeout": 60000
}
```
### Option 2: Environment Variables
The plugin also supports environment variables as an alternative to config:
```bash
# Required
export CLOUDFLARE_API_TOKEN="your_api_token"
export CLOUDFLARE_ACCOUNT_ID="your_account_id"
# Optional (defaults shown)
export CLOUDFLARE_AI_AUDIO_MODEL="@cf/openai/whisper-large-v3-turbo"
export CLOUDFLARE_AI_IMAGE_MODEL="@cf/meta/llama-3.2-11b-vision-instruct-fp8"
export CLOUDFLARE_AI_TTS_MODEL="@cf/deepgram/aura-2-en"
```
Environment variables take precedence over config values.
### Configuration Options
| Option | Type | Default | Environment Variable | Description |
|--------|------|---------|---------------------|-------------|
| `apiToken` | string | (required) | `CLOUDFLARE_API_TOKEN` | Cloudflare API token with AI read permissions |
| `accountId` | string | (required) | `CLOUDFLARE_ACCOUNT_ID` | Cloudflare account ID |
| `audioModel` | string | `@cf/openai/whisper-large-v3-turbo` | `CLOUDFLARE_AI_AUDIO_MODEL` | Whisper model for STT |
| `imageModel` | string | `@cf/meta/llama-3.2-11b-vision-instruct-fp8` | `CLOUDFLARE_AI_IMAGE_MODEL` | Vision model for image/video |
| `ttsModel` | string | `@cf/deepgram/aura-2-en` | `CLOUDFLARE_AI_TTS_MODEL` | TTS model for speech synthesis |
| `defaultLanguage` | string | `""` (auto-detect) | - | Language code (e.g., "en", "es") |
| `timeout` | number | `60000` | - | Request timeout in milliseconds |
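The precedence rule and the defaults in this table can be summarized as a small resolver. This is a sketch of the documented behavior, not the plugin's source. `resolveConfig`, `DEFAULTS`, and `ENV_VARS` are hypothetical names, and treating an empty environment variable as unset is an assumption.

```typescript
// Option names and defaults follow the configuration table.
interface CloudflareAiConfig {
  apiToken: string;
  accountId: string;
  audioModel: string;
  imageModel: string;
  ttsModel: string;
  defaultLanguage: string;
  timeout: number;
}

const DEFAULTS = {
  audioModel: "@cf/openai/whisper-large-v3-turbo",
  imageModel: "@cf/meta/llama-3.2-11b-vision-instruct-fp8",
  ttsModel: "@cf/deepgram/aura-2-en",
  defaultLanguage: "", // empty string means "let the model auto-detect"
  timeout: 60000, // milliseconds
};

// Options that can be overridden from the environment, and the variable for each.
const ENV_VARS = {
  apiToken: "CLOUDFLARE_API_TOKEN",
  accountId: "CLOUDFLARE_ACCOUNT_ID",
  audioModel: "CLOUDFLARE_AI_AUDIO_MODEL",
  imageModel: "CLOUDFLARE_AI_IMAGE_MODEL",
  ttsModel: "CLOUDFLARE_AI_TTS_MODEL",
} as const;

type Env = Record<string, string | undefined>;

// Precedence: environment variable > plugin config > built-in default.
function resolveConfig(pluginConfig: Partial<CloudflareAiConfig>, env: Env): CloudflareAiConfig {
  // An unset or empty environment variable falls back to the config value.
  const pick = (key: keyof typeof ENV_VARS): string | undefined =>
    env[ENV_VARS[key]] || pluginConfig[key];

  const apiToken = pick("apiToken");
  const accountId = pick("accountId");
  if (!apiToken || !accountId) {
    throw new Error("cloudflare-ai: apiToken and accountId are required");
  }
  return {
    apiToken,
    accountId,
    audioModel: pick("audioModel") ?? DEFAULTS.audioModel,
    imageModel: pick("imageModel") ?? DEFAULTS.imageModel,
    ttsModel: pick("ttsModel") ?? DEFAULTS.ttsModel,
    defaultLanguage: pluginConfig.defaultLanguage ?? DEFAULTS.defaultLanguage,
    timeout: pluginConfig.timeout ?? DEFAULTS.timeout,
  };
}
```

Only the five options that have an environment variable in the table can be overridden from the environment. `defaultLanguage` and `timeout` come from the config or fall back to their defaults.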
## Available Models
### Speech-to-Text (Audio)
| Model ID | Description | Pricing |
|----------|-------------|---------|
| `@cf/openai/whisper-large-v3-turbo` | Faster large model (recommended) | $0.00051/min |
| `@cf/openai/whisper` | General-purpose Whisper | $0.00045/min |
| `@cf/openai/whisper-tiny-en` | English-only tiny | Free |
### Vision (Image/Video)
| Model ID | Description | Context |
|----------|-------------|---------|
| `@cf/meta/llama-3.2-11b-vision-instruct-fp8` | Llama 3.2 11B Vision (recommended) | 128K |
| `@cf/meta/llama-3.2-90b-vision-instruct-fp8` | Llama 3.2 90B Vision | 128K |
| `@cf/llava-1.5-7b-vision-fp8` | Llava 1.5 7B | 4K |
### Text-to-Speech (TTS)
| Model ID | Description | Pricing |
|----------|-------------|---------|
| `@cf/deepgram/aura-2-en` | Deepgram Aura 2 (English, recommended) | $0.03/1K chars |
| `@cf/deepgram/aura-2-es` | Deepgram Aura 2 (Spanish) | $0.03/1K chars |
| `@cf/deepgram/aura-1` | Deepgram Aura 1 | $0.015/1K chars |
| `@cf/myshellai/melotts` | MeloTTS (multi-lingual) | Free |
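For budgeting, the per-unit prices in these tables reduce to simple arithmetic: minutes × rate for Whisper, and characters ÷ 1,000 × rate for TTS. The sketch below hard-codes the prices listed above. Those prices may change, so check Cloudflare's pricing page. The sketch also ignores rounding and any free-tier allowance. `estimateSttCost` and `estimateTtsCost` are hypothetical helpers.

```typescript
// USD prices copied from the tables above; re-check against Cloudflare's pricing page.
const STT_PRICE_PER_MINUTE: Record<string, number> = {
  "@cf/openai/whisper-large-v3-turbo": 0.00051,
  "@cf/openai/whisper": 0.00045,
  "@cf/openai/whisper-tiny-en": 0,
};

const TTS_PRICE_PER_1K_CHARS: Record<string, number> = {
  "@cf/deepgram/aura-2-en": 0.03,
  "@cf/deepgram/aura-2-es": 0.03,
  "@cf/deepgram/aura-1": 0.015,
  "@cf/myshellai/melotts": 0,
};

// Rough transcription cost: audio minutes x per-minute rate.
function estimateSttCost(model: string, minutes: number): number {
  const rate = STT_PRICE_PER_MINUTE[model];
  if (rate === undefined) throw new Error(`no price listed for ${model}`);
  return minutes * rate;
}

// Rough TTS cost: (characters / 1000) x per-1K-character rate.
function estimateTtsCost(model: string, text: string): number {
  const rate = TTS_PRICE_PER_1K_CHARS[model];
  if (rate === undefined) throw new Error(`no price listed for ${model}`);
  return (text.length / 1000) * rate;
}
```

For example, 90 minutes of audio with `@cf/openai/whisper-large-v3-turbo` comes to 90 × $0.00051 = $0.0459. Similarly, 2,000 characters of speech with `@cf/deepgram/aura-2-en` comes to 2 × $0.03 = $0.06. `text.length` counts UTF-16 code units, so treat the TTS figure as an approximation for non-ASCII text.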
**Note**: To use Llama 3.2 Vision models, you must first accept the Meta License and Acceptable Use Policy by sending a request:
```bash
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.2-11b-vision-instruct-fp8 \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{ "prompt": "agree" }'
```
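The curl call above uses the REST endpoint that serves every model in these tables: `POST /client/v4/accounts/{account_id}/ai/run/{model}`. JSON responses are wrapped in a `{ success, result, errors }` envelope. A minimal TypeScript wrapper is sketched below. `runModel` and `buildRunUrl` are hypothetical names, and the sketch assumes Node.js 18+ for `fetch` and `AbortSignal.timeout`.

```typescript
const CF_API_BASE = "https://api.cloudflare.com/client/v4";

// Standard Cloudflare API response envelope.
interface CfEnvelope<T> {
  success: boolean;
  result: T;
  errors: { code: number; message: string }[];
}

// Model IDs such as "@cf/openai/whisper" go into the path verbatim, as in the curl example.
function buildRunUrl(accountId: string, model: string): string {
  return `${CF_API_BASE}/accounts/${encodeURIComponent(accountId)}/ai/run/${model}`;
}

// POST a JSON input to a model, unwrap the envelope, and abort after `timeoutMs`.
async function runModel<T>(
  accountId: string,
  apiToken: string,
  model: string,
  input: unknown,
  timeoutMs = 60000,
): Promise<T> {
  const res = await fetch(buildRunUrl(accountId, model), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
    signal: AbortSignal.timeout(timeoutMs),
  });
  const envelope = (await res.json()) as CfEnvelope<T>;
  if (!res.ok || !envelope.success) {
    const detail = envelope.errors?.map((e) => `${e.code}: ${e.message}`).join("; ");
    throw new Error(`Workers AI ${model} failed (HTTP ${res.status}): ${detail || "unknown error"}`);
  }
  return envelope.result;
}
```

Calling `runModel(accountId, apiToken, "@cf/meta/llama-3.2-11b-vision-instruct-fp8", { prompt: "agree" })` sends the same request as the curl command, and `timeoutMs` plays the role of the `timeout` option. Some models may answer with raw audio instead of JSON (possibly the TTS models). Those would need a variant that reads `res.arrayBuffer()` instead of `res.json()`.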
## Usage
Once configured, the plugin automatically integrates with OpenClaw's media-understanding pipeline. When users send audio, images, or videos:
1. OpenClaw detects the media attachment
2. The media is processed through the configured providers
3. Cloudflare Workers AI handles the request when `cloudflare-ai` is the configured provider
### Setting as Default Provider
To use Cloudflare as your default media provider, configure `tools.media` in your OpenClaw config:
```json5
{
  "tools": {
    "media": {
      "audio": {
        "defaultProvider": "cloudflare-ai"
      },
      "image": {
        "defaultProvider": "cloudflare-ai"
      },
      "video": {
        "defaultProvider": "cloudflare-ai"
      },
      "tts": {
        "defaultProvider": "cloudflare-ai"
      }
    }
  }
}
```
### Verifying Installation
After configuration, restart the Gateway and check logs for:
```
[cloudflare-ai] Plugin started
```
You can also verify that the plugin is registered by running `openclaw plugins list`.
### How It Works
The plugin uses OpenClaw's `registerMediaProvider` API to register as a media provider. When media is attached to incoming messages:
```
User sends audio/image/video
        │
        ▼
OpenClaw media-understanding pipeline
        │
        ▼
Cloudflare Workers AI provider (this plugin)
        │
        ▼
Transcription / Description returned to agent
```
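In the flow above, the audio step ends in a Whisper request, and the request body differs between model variants. This sketch rests on an assumption drawn from Cloudflare's model pages, which are worth re-checking: `@cf/openai/whisper-large-v3-turbo` takes base64-encoded audio plus an optional `language`, while `@cf/openai/whisper` and `@cf/openai/whisper-tiny-en` take an array of byte values. `buildWhisperInput` is a hypothetical helper, not the plugin's API.

```typescript
// Build the JSON body for a Whisper transcription request (Node.js, uses Buffer).
// Assumption: the turbo model expects base64 audio; older Whisper models expect byte arrays.
function buildWhisperInput(
  model: string,
  audio: Uint8Array,
  language = "", // mirrors `defaultLanguage`; "" means auto-detect
): Record<string, unknown> {
  if (model === "@cf/openai/whisper-large-v3-turbo") {
    const input: Record<string, unknown> = { audio: Buffer.from(audio).toString("base64") };
    // Only send a language hint when one is configured.
    if (language) input.language = language;
    return input;
  }
  return { audio: Array.from(audio) };
}
```

The returned object is the JSON body for `POST .../ai/run/{model}`, and the transcript is expected in the response's `result.text`. Omitting `language` when `defaultLanguage` is empty keeps the table's auto-detect default.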
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ OpenClaw Gateway                                        │
├─────────────────────────────────────────────────────────┤
│ cloudflare-ai plugin                                    │
│  ┌───────────────────────────────────────────────────┐  │
│  │ MediaProvider: cloudflare-ai                      │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │ transcribeAudio()                           │  │  │
│  │  │  - Receives audio buffer                    │  │  │
│  │  │  - Calls Cloudflare Whisper API             │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │ describeImage()                             │  │  │
│  │  │  - Receives image buffer                    │  │  │
│  │  │  - Calls Cloudflare Vision API              │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │ describeVideo()                             │  │  │
│  │  │  - Receives video frame buffer              │  │  │
│  │  │  - Calls Cloudflare Vision API              │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │ textToSpeech()                              │  │  │
│  │  │  - Receives text                            │  │  │
│  │  │  - Calls Cloudflare TTS API                 │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
               ┌─────────────────────────┐
               │  Cloudflare Workers AI  │
               │  • Whisper (STT)        │
               │  • Llama Vision         │
               │  • Deepgram Aura (TTS)  │
               └─────────────────────────┘
```
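In the diagram, `describeImage()` and `describeVideo()` both end in a Vision API call, and `textToSpeech()` ends in a TTS call. The sketch below shows the request bodies those methods might build. It assumes a chat-style `messages` array plus an `image` byte array for Llama 3.2 Vision, `{ text }` for Deepgram Aura, and `{ prompt, lang }` for MeloTTS. These field names, the default prompt, and the helper names `buildVisionRequest` and `buildTtsRequest` are assumptions, not the plugin's confirmed code.

```typescript
// Hypothetical request shape: which model to run and the JSON input to send it.
type ModelRequest = { model: string; input: Record<string, unknown> };

interface ModelChoice {
  imageModel: string;
  ttsModel: string;
}

// Hypothetical default prompt; the plugin's actual prompt is not documented here.
const DEFAULT_VISION_PROMPT = "Describe this image in detail.";

// describeImage()/describeVideo(): one image or video frame plus a text prompt.
function buildVisionRequest(cfg: ModelChoice, frame: Uint8Array, prompt = DEFAULT_VISION_PROMPT): ModelRequest {
  return {
    model: cfg.imageModel,
    input: {
      messages: [{ role: "user", content: prompt }],
      image: Array.from(frame), // assumption: image bytes sent as an array of integers
    },
  };
}

// textToSpeech(): Aura models assumed to take { text }, MeloTTS { prompt, lang }.
function buildTtsRequest(cfg: ModelChoice, text: string, lang = "en"): ModelRequest {
  if (cfg.ttsModel === "@cf/myshellai/melotts") {
    return { model: cfg.ttsModel, input: { prompt: text, lang } };
  }
  return { model: cfg.ttsModel, input: { text } };
}
```

Because the diagram has `describeVideo()` receive a single frame buffer, the sketch reuses the image builder for video frames instead of defining a separate video builder.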
## Development
### Prerequisites
- Node.js 22+
- pnpm
### Setup
```bash
cd plugin-cloudflare
pnpm install
```
### Running Tests
```bash
# From OpenClaw workspace
pnpm vitest run --config vitest.extensions.config.ts extensions/cloudflare-ai/src/index.test.ts
```
### Type Checking
```bash
pnpm tsc --noEmit
```
### Building
```bash
pnpm build
```
## Troubleshooting
### Plugin Not Loading
Check the Gateway logs for errors, then run OpenClaw's diagnostics:
```bash
openclaw doctor
```
### API Errors
- Verify your `accountId` is correct (not the zone ID)
- Ensure the API token has the `AI:Read` permission
- Check that Workers AI is enabled on your account (the free tier works)
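When a request fails, the HTTP status can often narrow down which item on this checklist applies. The mapping below is a rough, assumed heuristic rather than documented Cloudflare behavior, and `troubleshootingHint` is a hypothetical helper. It appends any `errors` entries from the API response so that the original message is not lost.

```typescript
// Error entry as found in Cloudflare's API response envelope.
interface CfError {
  code: number;
  message: string;
}

// Heuristic (assumed) mapping from an HTTP status to the checks listed above.
function troubleshootingHint(status: number, errors: CfError[] = []): string {
  const detail = errors.map((e) => `${e.code}: ${e.message}`).join("; ");
  let hint: string;
  if (status === 401 || status === 403) {
    hint = "check the API token and its Workers AI permission";
  } else if (status === 404) {
    hint = "check accountId (the account ID, not the zone ID) and the model ID";
  } else if (status === 429) {
    hint = "rate limited; retry later";
  } else {
    hint = "check that Workers AI is enabled and the model is available";
  }
  // Keep Cloudflare's own error text next to the hint.
  return detail ? `${hint} (${detail})` : hint;
}
```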
### Model Not Available
Some models require acceptance of terms:
- Llama 3.2 Vision: Accept Meta License via API (see above)
- Check [Cloudflare Workers AI Models](https://developers.cloudflare.com/workers-ai/models/) for availability
## License
MIT