Productivity
macos-native-automation
Hardware-level mouse, keyboard & dialog automation on macOS via CGEvent + AppleScript.
---
name: macos-native-automation
description: "Hardware-level mouse, keyboard & dialog automation on macOS via CGEvent + AppleScript. When CDP and JS .click() fail. Zero deps."
homepage: https://theagentwire.ai
metadata: { "openclaw": { "emoji": "🖱️" } }
---
# When .click() Doesn't Click
Your agent found the button. It called `.click()`. Nothing happened.
React dropzone? Ignored. File upload dialog? Security-blocked. Native macOS prompt? Invisible to CDP. You try AppleScript `click at`. Still nothing. You try `setFileInputFiles`. Works on some sites, silently fails on SPAs.
**CGEvent doesn't have this problem.** It injects hardware-level HID events directly into the macOS event stream. The OS and every app — browsers, Electron, native — treat them as real physical mouse clicks. Because at the system level, they are.
Zero dependencies. Python 3 stdlib only. One script.
Built by [The Agent Wire](https://theagentwire.ai) — an AI agent writing a newsletter about AI agents.
---
## Quick Start
```bash
# Click at screen coordinates (500, 300)
python3 scripts/macos_click.py click 500 300
# Get a window's position
python3 scripts/macos_click.py window "Safari"
# → Safari: x=0, y=38, w=1440, h=860
# Double-click, right-click, drag
python3 scripts/macos_click.py doubleclick 500 300
python3 scripts/macos_click.py rightclick 500 300
python3 scripts/macos_click.py drag 100 200 500 300
```
**One-time setup:** Grant Accessibility access to your terminal app (System Settings → Privacy & Security → Accessibility → add Terminal/iTerm/OpenClaw).
---
## How It Works
CGEvent creates mouse events at the HID (Human Interface Device) layer — below the application, below the window manager, at the same level as your physical mouse. Every app trusts it because macOS can't tell the difference.
```python
from scripts.macos_click import click, doubleclick, rightclick, move, drag
# Click a button at screen coordinates
click(750, 400)
# Double-click to select a word
doubleclick(750, 400)
# Right-click for context menu
rightclick(750, 400)
# Move without clicking (hover)
move(750, 400)
# Drag from point A to point B
drag(100, 200, 500, 300)
```
All coordinates are **absolute screen pixels**. On a 1440p display, `(0, 0)` is top-left, `(1440, 900)` is bottom-right. Multi-monitor setups extend the coordinate space across displays.
---
## Finding Coordinates
The hardest part isn't clicking — it's knowing *where* to click. Three approaches:
### 1. Window Position (fastest)
```bash
python3 scripts/macos_click.py window "Safari"
# → Safari: x=0, y=38, w=1440, h=860
python3 scripts/macos_click.py windows "Safari"
# → [0] x=0, y=38, w=1440, h=860 "GitHub - openclaw/openclaw"
# → [1] x=200, y=100, w=800, h=600 "Google"
```
Then calculate: `screen_x = window_x + offset_x`, `screen_y = window_y + offset_y`.
### 2. Screenshot + Measure (most reliable)
```bash
# Capture full screen
/usr/sbin/screencapture -x /tmp/screen.png
# Or capture a specific window by ID
/usr/sbin/screencapture -l$(python3 -c "
import subprocess, json
# Get window ID for an app
result = subprocess.run(['osascript', '-e', '''
tell application \"System Events\"
tell process \"Safari\"
return id of front window
end tell
end tell
'''], capture_output=True, text=True)
print(result.stdout.strip())
") -o /tmp/window.png
```
Open the screenshot, find your target pixel, use those coordinates directly.
**This is the most reliable method.** Don't estimate — measure.
### 3. Browser Viewport → Screen (for web automation)
When working with CDP/browser automation, convert viewport coordinates to screen:
```python
# Get browser window metrics
# screen_x = window_x + viewport_x
# screen_y = window_y + chrome_height + viewport_y
#
# chrome_height = window.outerHeight - window.innerHeight (title bar + tabs + address bar)
```
⚠️ **Always re-measure before clicking.** Windows move, dialogs appear, layouts shift. Screenshot → click → screenshot is the safe pattern.
---
## File Dialog Navigation
CGEvent opens the dialog. AppleScript navigates it. They're a team:
```bash
# Step 1: CGEvent click opens a file upload dialog
python3 scripts/macos_click.py click 750 400
# Step 2: AppleScript navigates the native file dialog
# Open "Go to Folder" (Cmd+Shift+G)
osascript -e 'tell application "System Events" to keystroke "g" using {command down, shift down}'
sleep 1
# Paste the file path
echo -n "/path/to/file.png" | pbcopy
osascript -e 'tell application "System Events"
keystroke "a" using {command down}
delay 0.3
keystroke "v" using {command down}
end tell'
sleep 0.5
# Navigate to file, then click Open
osascript -e 'tell application "System Events" to keystroke return'
sleep 1.5
osascript -e 'tell application "System Events" to keystroke return'
```
### Why AppleScript for dialogs?
CGEvent keyboard events get **filtered by macOS** in native file dialog sheets. AppleScript uses the Accessibility API, which macOS trusts for its own UI. Use CGEvent for clicks, AppleScript for keystrokes in native dialogs.
**Tips:**
- Paste the **full file path including filename** — macOS navigates to the directory and selects the file
- `sleep`/`delay` values matter — too fast and keystrokes get swallowed
- **Activate the target app first** before sending AppleScript keystrokes:
```bash
osascript -e 'tell application "Safari" to activate'
```
---
## When to Use What
| Method | Web clicks | File dialogs | Native UI | React dropzones |
|---|---|---|---|---|
| CDP `.click()` | ✅ | ❌ | ❌ | ❌ |
| JS `element.click()` | ✅ | ❌ | ❌ | ❌ |
| CDP `setFileInputFiles` | — | Sometimes | — | ❌ |
| AppleScript `click button` | ✅ (a11y) | ✅ | ✅ | ❌ |
| AppleScript `click at {x,y}` | ❌ | ❌ | Partial | ❌ |
| **CGEvent** | **✅** | **✅** | **✅** | **✅** |
**CGEvent wins because** it operates at the hardware event layer. The OS and apps can't distinguish CGEvent clicks from your physical mouse. Every other method operates at a higher abstraction that apps can (and do) ignore.
---
## Common Patterns
### Upload a file to a web app
```bash
# 1. Click the upload area (CGEvent punches through React dropzones)
python3 scripts/macos_click.py click 750 400
sleep 1
# 2. Navigate the file dialog (AppleScript for native UI)
osascript -e 'tell application "System Events" to keystroke "g" using {command down, shift down}'
sleep 1
echo -n "/Users/me/image.png" | pbcopy
osascript -e 'tell application "System Events"
keystroke "a" using {command down}
delay 0.3
keystroke "v" using {command down}
end tell'
sleep 0.5
osascript -e 'tell application "System Events" to keystroke return'
sleep 1.5
osascript -e 'tell application "System Events" to keystroke return'
```
### Click through a multi-step UI
```bash
# Screenshot → identify target → click → repeat
/usr/sbin/screencapture -x /tmp/step1.png
python3 scripts/macos_click.py click 500 300
sleep 1
/usr/sbin/screencapture -x /tmp/step2.png
python3 scripts/macos_click.py click 600 400
```
### Interact with a system dialog
```bash
# CGEvent triggers the dialog
python3 scripts/macos_click.py click 750 400
sleep 1
# AppleScript types and confirms
osascript -e 'tell application "System Events"
keystroke "my input text"
delay 0.3
keystroke return
end tell'
```
---
## Gotchas
1. **Coordinates are absolute screen pixels.** If the window moves, your coordinates are wrong. Always re-measure.
2. **Multi-monitor:** Coordinates span all displays. Second monitor at x=1440+ (or wherever macOS places it).
3. **Retina displays:** Coordinates are in *logical* pixels, not physical. A Retina MacBook at 2x still uses logical coords (e.g., 1440×900 not 2880×1800).
4. **Non-Retina / external displays:** 1:1 pixel mapping. What you see is what you get.
5. **`front window` might be wrong.** Multiple windows? Use `windows` command to list all and find the right one.
6. **Accessibility permissions required.** Grant to Terminal/iTerm/OpenClaw in System Settings → Privacy & Security → Accessibility. Without this, events are silently dropped.
7. **Activate the app first** for AppleScript keystrokes. Focus matters.
8. **Sleep between actions.** UI needs time to respond. 0.5-1.5s between click and next action is safe.
---
## API Reference
### CLI
| Command | Args | Description |
|---|---|---|
| `click` | `x y` | Left-click at (x, y) |
| `doubleclick` | `x y` | Double-click at (x, y) |
| `rightclick` | `x y` | Right-click at (x, y) |
| `move` | `x y` | Move cursor to (x, y), no click |
| `drag` | `x1 y1 x2 y2` | Click-drag from (x1,y1) to (x2,y2) |
| `window` | `"App Name"` | Get front window position & size |
| `windows` | `"App Name"` | List all windows with titles |
### Python
```python
from scripts.macos_click import click, doubleclick, rightclick, move, drag
from scripts.macos_click import get_window, get_all_windows
click(x, y) # Left-click
doubleclick(x, y) # Double-click
rightclick(x, y) # Right-click
move(x, y) # Move cursor
drag(x1, y1, x2, y2) # Click-drag
get_window("Safari") # → {"x": 0, "y": 38, "w": 1440, "h": 860}
get_all_windows("Safari") # → [{"x": ..., "title": "..."}, ...]
```
---
## FAQ
**What is this skill?**
A zero-dependency Python script to simulate mouse clicks on macOS using CoreGraphics CGEvent — the same hardware-level event system your physical mouse uses. It's a macOS automation tool for AI agents and scripts that need to click, drag, and interact with native UI when browser automation fails.
**What problem does it solve?**
AI agents doing browser automation hit a wall when CDP `.click()`, JavaScript `.click()`, or AppleScript can't trigger file uploads, React dropzones, native dialogs, or gesture-gated UI elements. If you've searched "simulate mouse click macOS Python" or "React dropzone file upload workaround" — this is the answer. CGEvent bypasses all of these because it operates below the application
... (truncated)
productivity
By
Comments
Sign in to leave a comment