← Back to Skills
Browser Automation

actionbook

Senpeng Chai By Senpeng Chai 👁 208 views ▲ 1 votes

Actionbook activates when an AI agent needs to operate on a website β€” providing structured action manuals, verified selectors, and controlled browser automation for safe and deterministic web interaction.

GitHub
---
name: actionbook
description: Activate when the user needs to interact with any website β€” browser automation, web scraping, screenshots, form filling, UI testing, monitoring, or building AI agents. Provides verified action manuals with step-by-step instructions and pre-tested selectors.
---

## When to Use This Skill

**Activate this skill when the user's request involves interacting with a website or web page**

Activate when the user:
- Needs to do anything on a website ("Send a LinkedIn message", "Book an Airbnb", "Search Google for...")
- Asks how to interact with a site ("How do I post a tweet?", "How to apply on LinkedIn?")
- Wants to fill out forms, click buttons, navigate, search, filter, or browse on a specific site
- Wants to take a screenshot of a web page or monitor changes
- Builds browser-based AI agents, web scrapers, or E2E tests for external websites
- Automates repetitive web tasks (data entry, form submission, content posting)
- Wants to control their existing Chrome browser (Extension mode)

## Browser Modes

Actionbook supports two browser control modes:

| Mode | Flag | Use Case |
|------|------|----------|
| **CDP** (default) | (none) | Launches a dedicated browser instance via Chrome DevTools Protocol |
| **Extension** | `--extension` | Controls the user's existing Chrome browser via a Chrome Extension + WebSocket bridge |

**Extension mode:** Use when the user wants to operate their already-open Chrome (existing logins, cookies, tabs), or when the task requires the user's real session state.

**CDP mode (default):** Use for clean environments, headless automation, CI/CD, or profile-based session isolation.

All commands work identically in both modes β€” the only difference is the `--extension` flag (or `ACTIONBOOK_EXTENSION=1`).

## How to Use

> **CRITICAL RULE β€” Action Manual First (Per-Page-Type):**
> Before executing ANY `actionbook browser` command on a page, complete Phase 1 (`actionbook search` β†’ `actionbook get`). This includes ALL browser commands: `click`, `fill`, `text`, `eval`, `snapshot`, `screenshot`, and any other interaction.
>
> **This rule applies per page type.** Every time you navigate to a page with a different URL pattern, repeat Phase 1 before any interaction:
>
> 1. `actionbook search` β€” query by task description for the new page type
> 2. `actionbook get` β€” if a manual exists, retrieve selectors
> 3. **Only then** execute browser commands, using Action Manual selectors first
> 4. If no manual exists β†’ `actionbook browser snapshot` as fallback
>
> **What counts as a "different page type":**
> - Different URL path structure (e.g., `x.com/home` β†’ `x.com/:user/status/:id`)
> - Different functional purpose (e.g., search results page β†’ item detail page)
> - Different domain or subdomain
> - Note: Pagination, sorting, or refreshing within the same page type does NOT count
>
> **Common violation:** Having prior knowledge of a site's DOM does NOT exempt you from this rule. Action Manual selectors are pre-verified and maintained; selectors from memory may be outdated.

### Phase 1: Get Action Manual

```bash
# Step 1: Search for action manuals (always do this first)
actionbook search "arxiv search papers"
# Returns: area IDs with descriptions

# Step 2: Get the full manual (use area_id from search results)
actionbook get "arxiv.org:/search/advanced:default"
# Returns: Page structure, UI Elements with CSS/XPath selectors

# If you navigate to a NEW page type, repeat Steps 1-2 for that page.
# Example: after landing on a paper detail page:
#   actionbook search "arxiv paper abstract page"
#   actionbook get "arxiv.org:/abs/1910.06709:default"
```

### Phase 2: Execute with Browser

After opening a page, choose your path:
- **Have Action Manual selectors?** β†’ Use them directly. Do not run `snapshot`.
- **Manual selector fails at runtime?** β†’ `snapshot` β†’ retry with snapshot selectors (see Fallback Strategy).
- **No Action Manual at all?** β†’ `snapshot` as primary source.

```bash
# Step 3: Open browser
actionbook browser open "https://arxiv.org/search/advanced"

# Step 4: Use CSS selectors from Action Manual directly
actionbook browser fill "#terms-0-term" "Neural Network"
actionbook browser select "#terms-0-field" "title"
actionbook browser click "#date-filter_by-2"
actionbook browser fill "#date-year" "2025"
actionbook browser click "form[action='/search/advanced'] button.is-link"

# Step 5: Wait for results
actionbook browser wait-nav

# Step 6: Extract data
actionbook browser text

# Step 7: Close browser
actionbook browser close
```

### Phase 2 (alt): Execute with Extension mode

Extension mode uses identical browser commands β€” just add `--extension`. But you **must** follow the full lifecycle below.

```bash
# Step 3: Open URL in user's Chrome
actionbook --extension browser open "https://arxiv.org/search/advanced"

# Step 4-7: Same commands, just add --extension
actionbook --extension browser fill "#terms-0-term" "Neural Network"
actionbook --extension browser select "#terms-0-field" "title"
actionbook --extension browser click "#date-filter_by-2"
actionbook --extension browser fill "#date-year" "2025"
actionbook --extension browser click "form[action='/search/advanced'] button.is-link"
actionbook --extension browser wait-nav
actionbook --extension browser text

# Step 8: Cleanup (CRITICAL β€” see Extension Mode Lifecycle below)
actionbook --extension browser close    # release debug connection FIRST
actionbook extension stop               # then stop bridge server
```

> **Extension mode tabs:** `browser close` closes the current tab. Close tabs opened via `browser open` when the task is done. Do not close pre-existing tabs.

## Action Manual Format

Action manuals return page URL, page structure (DOM hierarchy), and UI elements with selectors:

```yaml
  ### button_advanced_search
  - ID: button_advanced_search
  - Description: Advanced search navigation button
  - Type: link
  - Allow Methods: click
  - Selectors:
    - css: button.button.is-small.is-cul-darker (confidence: 0.65)
    - xpath: //button[contains(@class, 'button')] (confidence: 0.55)
    - role: getByRole('link', { name: 'Advanced Search' }) (confidence: 0.9)
```

## Action Search Commands

```bash
actionbook search "<query>"                    # Basic search
actionbook search "<query>" --domain site.com  # Filter by domain
actionbook search "<query>" --url <url>        # Filter by URL
actionbook search "<query>" -p 2 -s 20         # Page 2, 20 results

actionbook get "<area_id>"                     # Full details with selectors
# area_id format: "site.com:/path:area_name"

actionbook sources list                        # List available sources
actionbook sources search "<query>"            # Search sources by keyword
```

## Extension Setup & Management

Commands for managing the Chrome Extension bridge:

```bash
actionbook extension install              # Install extension files to local config dir
actionbook extension path                 # Show extension directory (for Chrome "Load unpacked")
actionbook extension serve                # Start WebSocket bridge (keep running in background)
actionbook extension stop                 # Stop the running bridge server (sends SIGTERM)
actionbook extension status               # Check bridge and extension connection status
actionbook extension ping                 # Ping the extension to verify link is alive
```

**Setup flow (one-time):**
1. `actionbook extension install` β€” extract extension files and register native messaging host
2. Open `chrome://extensions` β†’ enable Developer mode β†’ Load unpacked β†’ select the path from `actionbook extension path`
3. `actionbook extension serve` β€” start bridge (keep running)
4. Extension auto-connects via native messaging (no manual token needed in most cases). If auto-pairing fails: copy token from `serve` output β†’ paste in extension popup β†’ Save

**Connection check before automation:**
```bash
actionbook extension status    # should show "running"
actionbook extension ping      # should show "responded"
```

## Browser Commands

All browser commands work in both CDP and Extension mode. For Extension mode, add `--extension` flag or set `ACTIONBOOK_EXTENSION=1`.

### Navigation

```bash
actionbook browser open <url>                  # Open URL in new tab
actionbook browser goto <url>                  # Navigate current page
actionbook browser back                        # Go back
actionbook browser forward                     # Go forward
actionbook browser reload                      # Reload page
actionbook browser pages                       # List open tabs
actionbook browser switch <page_id>            # Switch tab
actionbook browser close                       # Close browser
actionbook browser restart                     # Restart browser
actionbook browser connect <endpoint>          # Connect to existing browser (CDP port or URL)
```

### Interactions

Every selector you pass to these commands must come from an Action Manual (`actionbook get`) or a `snapshot` taken in this session. Do not use selectors from memory or training data.

```bash
actionbook browser click "<selector>"                  # Click element
actionbook browser click "<selector>" --wait 1000      # Wait then click
actionbook browser fill "<selector>" "text"            # Clear and type
actionbook browser type "<selector>" "text"            # Append text
actionbook browser select "<selector>" "value"         # Select dropdown
actionbook browser hover "<selector>"                  # Hover
actionbook browser focus "<selector>"                  # Focus
actionbook browser press Enter                         # Press key
actionbook browser upload <file> [<file2> ...]         # Upload file(s)
actionbook browser upload <file> -s "<selector>"       # Upload with selector
actionbook browser upload <file> --ref-id e0           # Upload with snapshot ref
```

### Get Information

```bash
actionbook browser text                        # Full page text
actionbook browser text "<selector>"           # Element text
actionbook browser html                        # Full page HTML
actionbook browser html "<selector>"           # Element HTML
actionbook browser snapshot                    # Accessibility tree
actionbook browser snapshot --filter interactive --max-tokens 500  # Focused snapshot
actionbook browser snapshot --diff             # Show changes since last snapshot
actionbook browser viewport                    # Viewport dimensions
actionbook browser status                      # Browser detection info
actionbook browser info "<selector>"           # Element details (visibility, position, styles)
actionbook browser console                     # Capture console logs
actionbook browser console --level error       # Filter by level (log/info/warn/error)
actionbook browser console --duration 5000     # Listen for new messages (ms)
actionbook browser fetch "<url>"               # One-shot fetch (navigate+wait+extract+close)
actionbook browser fetch "<url>" --format text --json  # With format and JSON output
actionbook browser fetch "<url>" --lite        # HTTP-first, skip browser for static pages
```

### Wait

```bash
actionbook browser wait "<selector>"                   # Wait for element
actionbook browser wait "<selector>" --timeout 5000    # Custom timeout
actionbook browser wait-nav                            # Wait for navigation
actionbook browser wait-idle                           # Wait for network idle
actionbook browser wait-idle --timeout 15000           # Custom timeout
actionbook browser wait-fn "expression"                # Wait for JS condition
actionbook browser wait-fn "document.querySelector('.results')" --timeout 5000
```

### Screenshots & Export

```bash
# Ensure target directory exists before saving screenshots
actionbook browser screenshot                  # Save screenshot.png
actionbook browser screenshot output.png       # Custom path
actionbook browser screenshot --full-page      # Full page
actionbook browser pdf output.pdf              # Export as PDF
```

### JavaScript & Inspection

> **`eval` is last-resort only.** Before using `browser eval` with `querySelector`, you must have already run `snapshot` on this page. Base selectors on snapshot/inspect output, never on memorized DOM knowledge.

```bash
actionbook browser eval "document.title"               # Execute JS
actionbook browser inspect 100 200                     # Inspect at coordinates
actionbook browser inspect 100 200 --desc "login btn"  # With description
```

### Cookies

```bash
actionbook browser cookies list/get/set/delete/clear
actionbook browser cookies set "name" "value" --domain ".example.com"
```

### Storage

```bash
actionbook browser storage list                        # List all localStorage keys
actionbook browser storage get "key"                   # Get value
actionbook browser storage set "key" "value"           # Set value
actionbook browser storage remove "key"                # Remove key
actionbook browser storage clear                       # Clear all
actionbook browser storage list --session              # sessionStorage operations
actionbook browser storage get "key" --session
```

### Device Emulation

```bash
actionbook browser emulate iphone-14                   # Preset device
actionbook browser emulate ipad                        # Tablet
actionbook browser emulate desktop-hd                  # 1920x1080
actionbook browser emulate 1280x720                    # Custom resolution
```

### Advanced Operations

```bash
# Batch execution (run multiple actions in one command)
actionbook browser batch --file actions.json
actionbook browser batch --file actions.json --delay 100  # Custom delay (ms)
cat actions.json | actionbook browser batch            # From stdin

# Fingerprint rotation (stealth mode)
actionbook browser fingerprint rotate
actionbook browser fingerprint rotate --os windows --screen 1920x1080
```

## Global Flags

```bash
# Output & Logging
actionbook --json <command>                      # JSON output
actionbook --verbose <command>                   # Verbose logging
actionbook --session-tag <tag> <command>         # Tag operations for log correlation

# Browser Mode & Connection
actionbook --headless <command>                  # Headless mode (CDP only)
actionbook -P <profile> <command>                # Use specific profile (CDP only)
actionbook --cdp <port|url> <command>            # CDP connection
actionbook --extension <command>                 # Use Chrome Extension mode
# or: ACTIONBOOK_EXTENSION=1 actionbook <command>

# Performance & Resource Control
actionbook --block-images <command>              # Block image downloads (2-5x faster)
actionbook --block-media <command>               # Block images, fonts, CSS, media

# Stealth & Anti-Detection
actionbook --stealth <command>                   # Enable anti-bot detection
actionbook --stealth-os windows <command>        # Stealth with OS override
actionbook --stealth-gpu rtx4080 <command>       # Stealth with GPU override

# Page Behavior Control
actionbook --no-animations <command>             # Disable CSS animations/transitions
actionbook --auto-dismiss-dialogs <command>      # Auto-handle alert/confirm/prompt

# Advanced Features
actionbook --rewrite-urls <command>              # Rewrite anti-scrape URLs (x.com→xcancel.com)
actionbook --wait-hint <hint> <command>          # Domain-aware wait (instant/fast/normal/slow/heavy)
```

## Guidelines

### Selector Priority
- Search by task description, not element name ("arxiv search papers" not "search button")
- Prefer Action Manual selectors β€” they are pre-verified and don't require snapshot
- Prefer CSS ID selectors (`#id`) over XPath when both are provided
- Fall back to snapshot when selectors fail

### Prohibited Patterns
- Do not run `snapshot` when you already have Action Manual selectors β€” snapshot is a fallback, not a discovery step
- Do not use `browser eval` with hardcoded/memorized selectors to bypass the workflow
- Do not skip search because a page "looks similar" to one you already searched β€” different URL patterns require separate searches
- Do not use `browser text` or `browser eval` as the first command on a new page type without completing Phase 1

### Extension Mode
- Follow the full lifecycle β€” pre-flight β†’ connect β†’ execute β†’ cleanup (see [Extension Mode Lifecycle](#extension-mode-lifecycle-critical))
- Verify extension is installed before starting bridge; prefer auto-pair over manual token
- Always run `browser close` before stopping the bridge to release the debug connection
- The user's real browser is being controlled β€” avoid destructive actions (clearing all cookies, closing all tabs) without confirmation
- L3 operations (some cookie/storage modifications) may require manual approval in the extension popup

### Login Page Handling
When you hit a login/auth wall (sign-in page, password prompt, MFA/OTP, CAPTCHA, account chooser):

1. **Pause automation and keep the current browser session open** (same tab/profile/cookies).
2. **Ask the user to complete login manually** in that same browser window.
3. After user confirms login is done, **continue in the same session**.
4. If login lands on a different page type, rerun Phase 1 (`search` β†’ `get`) for that new page type before further commands.

Do not switch tools just because a login page appears.

### Browser Lifecycle
Always clean up when the task is complete:
- **CDP mode:** Run `actionbook browser close` as the final step
- **Extension mode:** `browser close` to release debug connection β†’ `extension stop` to stop bridge
- **Exception:** Only skip cleanup if the user explicitly asks to keep the tab open

## Fallback Strategy

### When Fallback is Needed

Actionbook stores pre-computed page data captured at indexing time. This data may become outdated as websites evolve:

- **Selector execution failure** - The returned CSS/XPath selector does not match any element
- **Element mismatch** - The selector matches an element with unexpected type or behavior
- **Multiple selector failures** - Several selectors from the same action fail consecutively

### Fallback Chain

When Action Manual selectors don't work, follow this ordered fallback chain:

1. **Snapshot the page** β€” `actionbook browser snapshot` to get the current accessibility tree; use selectors from the snapshot output
2. **Inspect visually** β€” `actionbook browser screenshot` to see the current state
3. **Inspect by coordinates** β€” `actionbook browser inspect <x> <y>` to find elements at specific positions
4. **Execute JS (last resort)** β€” Before using `eval`, verify:
   1. Have I run `snapshot` on this page? (If no β†’ snapshot first)
   2. Is my selector from snapshot/inspect output in this session? (If no β†’ stop, you're using memorized selectors)
   3. Did snapshot selectors already fail? (If no β†’ use snapshot selectors instead)
   Only proceed with `eval` if all checks pass.

### When to Exit

If actionbook search returns no results or action fails unexpectedly, use other available tools to continue the task.

## Examples

### End-to-end with Action Manual

```bash
# 1. Find selectors
actionbook search "airbnb search" --domain airbnb.com

# 2. Get detailed selectors (area_id from search results)
actionbook get "airbnb.com:/:default"

# 3. Automate using pre-verified selectors
actionbook browser open "https://www.airbnb.com"
actionbook browser fill "input[data-testid='structured-search-input-field-query']" "Tokyo"
actionbook browser click "button[data-testid='structured-search-input-search-button']"
actionbook browser wait-nav
actionbook browser text
actionbook browser close
```

### Extension mode: Operate on user's Chrome

```bash
# Verify bridge is running
actionbook extension status

# Use the user's existing logged-in session
actionbook --extension browser open "https://github.com/notifications"
actionbook --extension browser wait-nav
actionbook --extension browser text ".notifications-list"
actionbook --extension browser screenshot notifications.png
actionbook --extension browser close
```

### Extension Mode Lifecycle (CRITICAL)

When using Extension mode, **always** follow this complete lifecycle: pre-flight β†’ connect β†’ execute β†’ cleanup.

#### 1. Pre-flight: Ask user about extension installation

Before any technical checks, **ask the user** whether they have the Actionbook Chrome Extension installed.

- **User confirms installed** β†’ proceed to Step 2 (Connect).
- **User says not installed** β†’ run the installation flow:

```bash
# Install extension files locally
actionbook extension install
actionbook extension path
# β†’ On macOS, copy to visible dir if needed:
#    cp -r "$(actionbook extension path)" ~/Document/actionbook-extension
```

Then guide the user to load it in Chrome:
1. Open `chrome://extensions` β†’ enable Developer mode
2. Click "Load unpacked" β†’ select the extension directory
3. After user confirms loaded β†’ proceed to Step 2

> **Limitation:** The CLI can only verify that extension files exist locally. There is no way to detect whether Chrome has actually loaded the extension until a connection is attempted in Step 2.

#### 2. Connect: Start bridge, auto-pair with retry

Start the bridge server and attempt auto-pairing. **Retry up to 3 times** before considering manual fallback.

```bash
# Start bridge (run in background)
actionbook extension serve

# Attempt 1: Wait for auto-pairing via Native Messaging
sleep 3
actionbook extension ping
# β†’ If ping succeeds β†’ proceed to Step 3 (Execute)

# Attempt 2: If ping fails, wait longer and retry
sleep 5
actionbook extension ping
# β†’ If ping succeeds β†’ proceed to Step 3 (Execute)

# Attempt 3: Final retry
sleep 5
actionbook extension ping
# β†’ If ping succeeds β†’ proceed to Step 3 (Execute)
```

**Only after all 3 auto-pair attempts fail**, escalate based on the error:

- **"Extension not connected"** β†’ Ask user to verify the extension is enabled in `chrome://extensions` and retry
- **"Invalid token"** β†’ Only now provide the token from `serve` output for manual paste in the extension popup
- **Other errors** β†’ Check `actionbook extension status` for diagnostics

**IMPORTANT:** Do NOT expose the session token prematurely. The token is a last-resort fallback β€” most users will connect successfully via auto-pair within 3 attempts.

#### 3. Execute: Browser automation

```bash
actionbook --extension browser open "https://example.com"
# ... perform browser operations ...
```

#### 4. Cleanup: Release debug connection, THEN stop bridge

```bash
# Step 1: Release the debugging connection (MUST come first)
actionbook --extension browser close

# Step 2: Stop the bridge server
actionbook extension stop

# Step 3: Verify Chrome no longer shows "debugging" banner
actionbook extension status    # should show "not running"
```

**WARNING:** Skipping Step 1 and directly killing the bridge process will leave Chrome showing "Actionbook is debugging this browser". Always release the debug connection before stopping the bridge.

### Extension mode: Troubleshooting

```bash
# Bridge not running?
actionbook extension serve              # Start it

# Extension not responding?
actionbook extension ping               # Check connectivity

# Token expired? (idle > 30 min)
# Restart serve and re-pair in extension popup
actionbook extension serve              # Prints new token
```

### Multi-page-type workflow (re-search on navigation)

```bash
# === Page Type 1: x.com/home (timeline) ===
actionbook search "twitter timeline" --domain x.com
actionbook get "x.com:/:default"
actionbook --extension browser open "https://x.com/home"
# Action Manual returned selectors β†’ use them directly (no snapshot)
actionbook --extension browser text "<selector-from-manual>"

# === Navigate to Page Type 2: x.com/:user/status/:id ===
# Different page type β†’ re-search before any interaction
actionbook search "twitter tweet detail" --domain x.com
# No results? Use snapshot fallback:
actionbook --extension browser snapshot
actionbook --extension browser text "<selector-from-snapshot>"

# === Back to Page Type 1 ===
actionbook --extension browser back
# Same page type as before β€” Action Manual selectors are still valid (page structure is stable).
# No need to re-search or re-snapshot. However, dynamic content (e.g., new tweets loaded)
# may differ β€” if you need fresh content, use `browser text` with the same Manual selectors.

# === Done β€” close the tab we opened ===
actionbook --extension browser close
```
browser-automation resilience browser-automation web-scraping

Comments

Sign in to leave a comment

Loading comments...