Crawl4ai

By escapingcarrot

OpenClaw plugin that packages Crawl4AI self-host as local MCP tools with proxy-aware Docker and release verification.


# OpenClaw Crawl4AI Plugin

`crawl4ai` packages a local [Crawl4AI](https://github.com/unclecode/crawl4ai) self-host deployment as an OpenClaw plugin without replacing `web_fetch`.

This repository is the standalone source for the plugin. It is meant to be cloned, packed into a `.tgz`, and installed into OpenClaw as a trusted local plugin.

## What You Get

- `openclaw crawl4ai up`
- `openclaw crawl4ai down`
- `openclaw crawl4ai status`
- `openclaw crawl4ai doctor`
- Automatic `mcp.servers.crawl4ai` wiring through a local `stdio` bridge
- Proxy-first Docker defaults for WSL/Linux environments in mainland China

## Supported Scope

- WSL/Linux
- Docker on `PATH`
- A host proxy that can reach the internet outside mainland China
- OpenClaw 2026.4.15 or newer (recommended)

This plugin does not modify the OpenClaw installation tree and does not replace `web_fetch`.

## Quick Start

From the repository root, run the one-command installer:

```bash
bash ./scripts/install-and-verify.sh
```

That command:

- packs the current repository into a fresh `.tgz`
- installs it into OpenClaw as a trusted local plugin
- runs `openclaw crawl4ai up`
- runs `openclaw crawl4ai doctor`

If you already have a built archive, you can point the script at it:

```bash
bash ./scripts/install-and-verify.sh ./openclaw-crawl4ai-plugin-<version>.tgz
```

If you prefer manual steps:

```bash
npm install
npm test
npm pack
openclaw plugins install --dangerously-force-unsafe-install ./openclaw-crawl4ai-plugin-<version>.tgz
openclaw crawl4ai up
openclaw crawl4ai doctor
```
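The `<version>` placeholder in the commands above has to be replaced with the real archive name. As a minimal sketch, assuming the archive sits in the current directory and follows the `openclaw-crawl4ai-plugin-<version>.tgz` naming shown above, a small helper can pick the most recently modified match. `latest_tarball` is a hypothetical name for illustration and is not part of the plugin.

```bash
# Sketch only: latest_tarball is a hypothetical helper, not shipped by the plugin.
# Prints the most recently modified openclaw-crawl4ai-plugin-*.tgz in a directory.
latest_tarball() {
  local dir newest candidate
  dir="${1:-.}"
  newest=""
  for candidate in "$dir"/openclaw-crawl4ai-plugin-*.tgz; do
    # An unmatched glob stays literal, so skip paths that do not exist.
    [ -e "$candidate" ] || continue
    if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
      newest="$candidate"
    fi
  done
  # Fail (non-zero status) when no archive was found.
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example use after `npm pack` (not executed here):
#   openclaw plugins install --dangerously-force-unsafe-install "$(latest_tarball)"
```

Selecting by modification time rather than by version string avoids having to parse semver in shell, at the cost of picking a stale archive if an older file was touched more recently.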

## Why Trusted Install Is Required

OpenClaw's plugin installer scans packaged code for dangerous patterns before install.

This plugin is expected to trigger that scan because it:

- shells out to Docker and OpenClaw CLI helpers
- reads environment variables
- performs live network probes for `status` and `doctor`

The unsafe-install flag is an explicit trust acknowledgement for a local plugin whose job is to manage a container runtime and verify live network access.

## Runtime Defaults

- Crawl4AI image: `unclecode/crawl4ai:latest`
- Container name: `crawl4ai`
- Upstream service: `http://127.0.0.1:11235`
- Bridge mode: local `stdio`
- Default proxy: `http://127.0.0.1:7897`
- Default `NO_PROXY`: `127.0.0.1,localhost`

When the proxy host is loopback, the plugin starts Crawl4AI with Docker host networking so the container can reach the host proxy directly.
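The loopback rule can be illustrated with a short sketch. The helper names below (`proxy_host`, `is_loopback_host`, `docker_network_args`) are hypothetical and are not taken from the plugin source. The non-loopback branch, which publishes port 11235 on the host's loopback interface, is an assumption made for illustration; the README only documents the host-networking case.

```bash
# Sketch only: helper names are hypothetical, not the plugin's actual code.

# Extract the host part from a proxy URL such as http://127.0.0.1:7897.
proxy_host() {
  local rest
  rest="${1#*://}"     # drop the scheme, if any
  rest="${rest%%/*}"   # drop any path
  rest="${rest##*@}"   # drop user:pass@, if any
  case "$rest" in
    \[*) rest="${rest#\[}"; printf '%s\n' "${rest%%\]*}" ;;  # bracketed IPv6
    *)   printf '%s\n' "${rest%%:*}" ;;                      # strip :port
  esac
}

# Loose loopback check: localhost, ::1, or anything shaped like 127.x.x.x.
is_loopback_host() {
  case "$1" in
    localhost|::1|127.*.*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Choose Docker networking flags for a given proxy URL.
docker_network_args() {
  if is_loopback_host "$(proxy_host "$1")"; then
    # A loopback proxy is only reachable from the host network namespace.
    printf '%s\n' "--network=host"
  else
    # Assumption: otherwise publish the service port on host loopback.
    printf '%s\n' "--publish=127.0.0.1:11235:11235"
  fi
}

docker_network_args "http://127.0.0.1:7897"
```

Host networking matters here because `127.0.0.1` inside a bridged container refers to the container itself, so a proxy listening on the host's loopback address would otherwise be unreachable.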

## Docs

- [Installation guide](./docs/INSTALL.md)
- [Troubleshooting guide](./docs/TROUBLESHOOTING.md)
- [One-shot AI deployment prompt](./docs/AI_DEPLOY_PROMPT.md)
- [Suggested upstream outreach notes](./docs/UPSTREAM_OUTREACH.md)

## Repository Layout

- `src/`: CLI, bridge, and runtime logic
- `tests/`: unit and integration-style tests
- `docs/`: install, troubleshooting, and deployment assets
- `openclaw.plugin.json`: OpenClaw plugin manifest

## Release Flow

```bash
npm run pack:release
npm run release:verify
```

`pack:release` runs the unit test suite and emits a new `openclaw-crawl4ai-plugin-<version>.tgz`.

`release:verify` is the pre-publish smoke test. It runs:

- `npm ci`
- `npm test`
- `npm pack`
- isolated tgz install into a temporary OpenClaw state
- `openclaw crawl4ai up|status|doctor|down`
- live `md`, `html`, and `crawl` calls through the installed bridge
- post-cleanup rechecks of the isolated config, extension directory, Docker container, and `/tmp/openclaw-crawl4ai-*`

By default it uses balanced cleanup: it deletes temporary state and runtime residue, but keeps the `.tgz`, a JSON summary, and a command log under `artifacts/release-verification/`.
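A manual version of the residue recheck might look like the sketch below. The function names are hypothetical and the verifier's real checks may differ. Only the `/tmp/openclaw-crawl4ai-*` pattern and the idea of rechecking the Docker container come from this README.

```bash
# Sketch only: function names are hypothetical, not the verifier's own.

# Print any leftover paths matching the documented temp pattern.
leftover_tmp_paths() {
  local base path
  base="${1:-/tmp}"
  for path in "$base"/openclaw-crawl4ai-*; do
    # An unmatched glob stays literal, so confirm the path really exists.
    [ -e "$path" ] && printf '%s\n' "$path"
  done
  return 0
}

# Succeed when Docker reports no container with the given name.
# Note: this also succeeds when the docker CLI itself is unavailable.
container_absent() {
  ! docker container inspect "$1" >/dev/null 2>&1
}

# Example use (not executed here):
#   [ -z "$(leftover_tmp_paths)" ] && echo "no temp residue"
#   container_absent <name> && echo "container removed"
```

Pass the name of the isolated test container to `container_absent`; a shared runtime that the verifier chose to reuse is expected to remain.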

If you are doing failure forensics and want to preserve the temporary state directory, use:

```bash
npm run release:verify:keep-evidence
```

If a shared host-network Crawl4AI runtime is already active on the machine, the verifier reuses it for non-destructive checks and skips the destructive `down` step for that shared container.

## License

[MIT](./LICENSE)