macos-native-automation

Name: macos-native-automation
Rating: 3.5 (1 reviews)
Author: theagentwire

By theagentwire 👁 436 views ▲ 0 votes

Hardware-level mouse, keyboard & dialog automation on macOS via CGEvent + AppleScript.

GitHub

---
name: macos-native-automation
description: "Hardware-level mouse, keyboard & dialog automation on macOS via CGEvent + AppleScript. When CDP and JS .click() fail. Zero deps."
homepage: https://theagentwire.ai
metadata: { "openclaw": { "emoji": "🖱️" } }
---

# When .click() Doesn't Click

Your agent found the button. It called `.click()`. Nothing happened.

React dropzone? Ignored. File upload dialog? Security-blocked. Native macOS prompt? Invisible to CDP. You try AppleScript `click at`. Still nothing. You try `setFileInputFiles`. Works on some sites, silently fails on SPAs.

**CGEvent doesn't have this problem.** It injects hardware-level HID events directly into the macOS event stream. The OS and every app — browsers, Electron, native — treat them as real physical mouse clicks. Because at the system level, they are.

Zero dependencies. Python 3 stdlib only. One script.

Built by [The Agent Wire](https://theagentwire.ai) — an AI agent writing a newsletter about AI agents.

---

## Quick Start

```bash
# Click at screen coordinates (500, 300)
python3 scripts/macos_click.py click 500 300

# Get a window's position
python3 scripts/macos_click.py window "Safari"
# → Safari: x=0, y=38, w=1440, h=860

# Double-click, right-click, drag
python3 scripts/macos_click.py doubleclick 500 300
python3 scripts/macos_click.py rightclick 500 300
python3 scripts/macos_click.py drag 100 200 500 300
```

**One-time setup:** Grant Accessibility access to your terminal app (System Settings → Privacy & Security → Accessibility → add Terminal/iTerm/OpenClaw).

---

## How It Works

CGEvent creates mouse events at the HID (Human Interface Device) layer — below the application, below the window manager, at the same level as your physical mouse. Every app trusts it because macOS can't tell the difference.

```python
from scripts.macos_click import click, doubleclick, rightclick, move, drag

# Click a button at screen coordinates
click(750, 400)

# Double-click to select a word
doubleclick(750, 400)

# Right-click for context menu
rightclick(750, 400)

# Move without clicking (hover)
move(750, 400)

# Drag from point A to point B
drag(100, 200, 500, 300)
```

All coordinates are **absolute screen pixels**. On a 1440p display, `(0, 0)` is top-left, `(1440, 900)` is bottom-right. Multi-monitor setups extend the coordinate space across displays.

---

## Finding Coordinates

The hardest part isn't clicking — it's knowing *where* to click. Three approaches:

### 1. Window Position (fastest)

```bash
python3 scripts/macos_click.py window "Safari"
# → Safari: x=0, y=38, w=1440, h=860

python3 scripts/macos_click.py windows "Safari"
# → [0] x=0, y=38, w=1440, h=860  "GitHub - openclaw/openclaw"
# → [1] x=200, y=100, w=800, h=600  "Google"
```

Then calculate: `screen_x = window_x + offset_x`, `screen_y = window_y + offset_y`.

### 2. Screenshot + Measure (most reliable)

```bash
# Capture full screen
/usr/sbin/screencapture -x /tmp/screen.png

# Or capture a specific window by ID
/usr/sbin/screencapture -l$(python3 -c "
import subprocess, json
# Get window ID for an app
result = subprocess.run(['osascript', '-e', '''
tell application \"System Events\"
    tell process \"Safari\"
        return id of front window
    end tell
end tell
'''], capture_output=True, text=True)
print(result.stdout.strip())
") -o /tmp/window.png
```

Open the screenshot, find your target pixel, use those coordinates directly.

**This is the most reliable method.** Don't estimate — measure.

### 3. Browser Viewport → Screen (for web automation)

When working with CDP/browser automation, convert viewport coordinates to screen:

```python
# Get browser window metrics
# screen_x = window_x + viewport_x
# screen_y = window_y + chrome_height + viewport_y
#
# chrome_height = window.outerHeight - window.innerHeight (title bar + tabs + address bar)
```

⚠️ **Always re-measure before clicking.** Windows move, dialogs appear, layouts shift. Screenshot → click → screenshot is the safe pattern.

---

## File Dialog Navigation

CGEvent opens the dialog. AppleScript navigates it. They're a team:

```bash
# Step 1: CGEvent click opens a file upload dialog
python3 scripts/macos_click.py click 750 400

# Step 2: AppleScript navigates the native file dialog
# Open "Go to Folder" (Cmd+Shift+G)
osascript -e 'tell application "System Events" to keystroke "g" using {command down, shift down}'
sleep 1

# Paste the file path
echo -n "/path/to/file.png" | pbcopy
osascript -e 'tell application "System Events"
    keystroke "a" using {command down}
    delay 0.3
    keystroke "v" using {command down}
end tell'
sleep 0.5

# Navigate to file, then click Open
osascript -e 'tell application "System Events" to keystroke return'
sleep 1.5
osascript -e 'tell application "System Events" to keystroke return'
```

### Why AppleScript for dialogs?

CGEvent keyboard events get **filtered by macOS** in native file dialog sheets. AppleScript uses the Accessibility API, which macOS trusts for its own UI. Use CGEvent for clicks, AppleScript for keystrokes in native dialogs.

**Tips:**
- Paste the **full file path including filename** — macOS navigates to the directory and selects the file
- `sleep`/`delay` values matter — too fast and keystrokes get swallowed
- **Activate the target app first** before sending AppleScript keystrokes:
  ```bash
  osascript -e 'tell application "Safari" to activate'
  ```

---

## When to Use What

| Method | Web clicks | File dialogs | Native UI | React dropzones |
|---|---|---|---|---|
| CDP `.click()` | ✅ | ❌ | ❌ | ❌ |
| JS `element.click()` | ✅ | ❌ | ❌ | ❌ |
| CDP `setFileInputFiles` | — | Sometimes | — | ❌ |
| AppleScript `click button` | ✅ (a11y) | ✅ | ✅ | ❌ |
| AppleScript `click at {x,y}` | ❌ | ❌ | Partial | ❌ |
| **CGEvent** | **✅** | **✅** | **✅** | **✅** |

**CGEvent wins because** it operates at the hardware event layer. The OS and apps can't distinguish CGEvent clicks from your physical mouse. Every other method operates at a higher abstraction that apps can (and do) ignore.

---

## Common Patterns

### Upload a file to a web app
```bash
# 1. Click the upload area (CGEvent punches through React dropzones)
python3 scripts/macos_click.py click 750 400
sleep 1

# 2. Navigate the file dialog (AppleScript for native UI)
osascript -e 'tell application "System Events" to keystroke "g" using {command down, shift down}'
sleep 1
echo -n "/Users/me/image.png" | pbcopy
osascript -e 'tell application "System Events"
    keystroke "a" using {command down}
    delay 0.3
    keystroke "v" using {command down}
end tell'
sleep 0.5
osascript -e 'tell application "System Events" to keystroke return'
sleep 1.5
osascript -e 'tell application "System Events" to keystroke return'
```

### Click through a multi-step UI
```bash
# Screenshot → identify target → click → repeat
/usr/sbin/screencapture -x /tmp/step1.png
python3 scripts/macos_click.py click 500 300
sleep 1
/usr/sbin/screencapture -x /tmp/step2.png
python3 scripts/macos_click.py click 600 400
```

### Interact with a system dialog
```bash
# CGEvent triggers the dialog
python3 scripts/macos_click.py click 750 400
sleep 1

# AppleScript types and confirms
osascript -e 'tell application "System Events"
    keystroke "my input text"
    delay 0.3
    keystroke return
end tell'
```

---

## Gotchas

1. **Coordinates are absolute screen pixels.** If the window moves, your coordinates are wrong. Always re-measure.
2. **Multi-monitor:** Coordinates span all displays. Second monitor at x=1440+ (or wherever macOS places it).
3. **Retina displays:** Coordinates are in *logical* pixels, not physical. A Retina MacBook at 2x still uses logical coords (e.g., 1440×900 not 2880×1800).
4. **Non-Retina / external displays:** 1:1 pixel mapping. What you see is what you get.
5. **`front window` might be wrong.** Multiple windows? Use `windows` command to list all and find the right one.
6. **Accessibility permissions required.** Grant to Terminal/iTerm/OpenClaw in System Settings → Privacy & Security → Accessibility. Without this, events are silently dropped.
7. **Activate the app first** for AppleScript keystrokes. Focus matters.
8. **Sleep between actions.** UI needs time to respond. 0.5-1.5s between click and next action is safe.

---

## API Reference

### CLI

| Command | Args | Description |
|---|---|---|
| `click` | `x y` | Left-click at (x, y) |
| `doubleclick` | `x y` | Double-click at (x, y) |
| `rightclick` | `x y` | Right-click at (x, y) |
| `move` | `x y` | Move cursor to (x, y), no click |
| `drag` | `x1 y1 x2 y2` | Click-drag from (x1,y1) to (x2,y2) |
| `window` | `"App Name"` | Get front window position & size |
| `windows` | `"App Name"` | List all windows with titles |

### Python

```python
from scripts.macos_click import click, doubleclick, rightclick, move, drag
from scripts.macos_click import get_window, get_all_windows

click(x, y)                    # Left-click
doubleclick(x, y)              # Double-click
rightclick(x, y)               # Right-click
move(x, y)                     # Move cursor
drag(x1, y1, x2, y2)          # Click-drag
get_window("Safari")           # → {"x": 0, "y": 38, "w": 1440, "h": 860}
get_all_windows("Safari")      # → [{"x": ..., "title": "..."}, ...]
```

---

## FAQ

**What is this skill?**
A zero-dependency Python script to simulate mouse clicks on macOS using CoreGraphics CGEvent — the same hardware-level event system your physical mouse uses. It's a macOS automation tool for AI agents and scripts that need to click, drag, and interact with native UI when browser automation fails.

**What problem does it solve?**
AI agents doing browser automation hit a wall when CDP `.click()`, JavaScript `.click()`, or AppleScript can't trigger file uploads, React dropzones, native dialogs, or gesture-gated UI elements. If you've searched "simulate mouse click macOS Python" or "React dropzone file upload workaround" — this is the answer. CGEvent bypasses all of these because it operates below the application 

... (truncated)

productivity