← Back to Skills
Browser

desktop-control

matagul By matagul 👁 3 views ▲ 0 votes

Advanced desktop automation with mouse, keyboard, and screen

GitHub
---
description: Advanced desktop automation with mouse, keyboard, and screen control
---

# Desktop Control Skill

**The most advanced desktop automation skill for OpenClaw.** Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.

## 🎯 Features

### Mouse Control
- ✅ **Absolute positioning** - Move to exact coordinates
- ✅ **Relative movement** - Move from current position
- ✅ **Smooth movement** - Natural, human-like mouse paths
- ✅ **Click types** - Left, right, middle, double, triple clicks
- ✅ **Drag & drop** - Drag from point A to point B
- ✅ **Scroll** - Vertical and horizontal scrolling
- ✅ **Position tracking** - Get current mouse coordinates

### Keyboard Control
- ✅ **Text typing** - Fast, accurate text input
- ✅ **Hotkeys** - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
- ✅ **Special keys** - Enter, Tab, Escape, Arrow keys, F-keys
- ✅ **Key combinations** - Multi-key press combinations
- ✅ **Hold & release** - Manual key state control
- ✅ **Typing speed** - Configurable WPM (instant to human-like)

### Screen Operations
- ✅ **Screenshot** - Capture entire screen or regions
- ✅ **Image recognition** - Find elements on screen (via OpenCV)
- ✅ **Color detection** - Get pixel colors at coordinates
- ✅ **Multi-monitor** - Support for multiple displays

### Window Management
- ✅ **Window list** - Get all open windows
- ✅ **Activate window** - Bring window to front
- ✅ **Window info** - Get position, size, title
- ✅ **Minimize/Maximize** - Control window states

### Safety Features
- ✅ **Failsafe** - Move mouse to corner to abort
- ✅ **Pause control** - Emergency stop mechanism
- ✅ **Approval mode** - Require confirmation for actions
- ✅ **Bounds checking** - Prevent out-of-screen operations
- ✅ **Logging** - Track all automation actions

---

## 🚀 Quick Start

### Installation

First, install required dependencies:

```bash
pip install pyautogui pillow opencv-python pygetwindow
```

### Basic Usage

```python
from skills.desktop_control import DesktopController

# Initialize controller
dc = DesktopController(failsafe=True)

# Mouse operations
dc.move_mouse(500, 300)  # Move to coordinates
dc.click()  # Left click at current position
dc.click(100, 200, button="right")  # Right click at position

# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c")  # Copy
dc.press("enter")

# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()
```

---

## 📋 Complete API Reference

### Mouse Functions

#### `move_mouse(x, y, duration=0, smooth=True)`
Move mouse to absolute screen coordinates.

**Parameters:**
- `x` (int): X coordinate (pixels from left)
- `y` (int): Y coordinate (pixels from top)
- `duration` (float): Movement time in seconds (0 = instant, 0.5 = smooth)
- `smooth` (bool): Use bezier curve for natural movement

**Example:**
```python
# Instant movement
dc.move_mouse(1000, 500)

# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
```

#### `move_relative(x_offset, y_offset, duration=0)`
Move mouse relative to current position.

**Parameters:**
- `x_offset` (int): Pixels to move horizontally (positive = right)
- `y_offset` (int): Pixels to move vertically (positive = down)
- `duration` (float): Movement time in seconds

**Example:**
```python
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)
```

#### `click(x=None, y=None, button='left', clicks=1, interval=0.1)`
Perform mouse click.

**Parameters:**
- `x, y` (int, optional): Coordinates to click (None = current position)
- `button` (str): 'left', 'right', 'middle'
- `clicks` (int): Number of clicks (1 = single, 2 = double)
- `interval` (float): Delay between multiple clicks

**Example:**
```python
# Simple left click
dc.click()

# Double-click at specific position
dc.click(500, 300, clicks=2)

# Right-click
dc.click(button='right')
```

#### `drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')`
Drag and drop operation.

**Parameters:**
- `start_x, start_y` (int): Starting coordinates
- `end_x, end_y` (int): Ending coordinates
- `duration` (float): Drag duration
- `button` (str): Mouse button to use

**Example:**
```python
# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)
```

#### `scroll(clicks, direction='vertical', x=None, y=None)`
Scroll mouse wheel.

**Parameters:**
- `clicks` (int): Scroll amount (positive = up/left, negative = down/right)
- `direction` (str): 'vertical' or 'horizontal'
- `x, y` (int, optional): Position to scroll at

**Example:**
```python
# Scroll down 5 clicks
dc.scroll(-5)

# Scroll up 10 clicks
dc.scroll(10)

# Horizontal scroll
dc.scroll(5, direction='horizontal')
```

#### `get_mouse_position()`
Get current mouse coordinates.

**Returns:** `(x, y)` tuple

**Example:**
```python
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
```

---

### Keyboard Functions

#### `type_text(text, interval=0, wpm=None)`
Type text with configurable speed.

**Parameters:**
- `text` (str): Text to type
- `interval` (float): Delay between keystrokes (0 = instant)
- `wpm` (int, optional): Words per minute (overrides interval)

**Example:**
```python
# Instant typing
dc.type_text("Hello World")

# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)

# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
```

#### `press(key, presses=1, interval=0.1)`
Press and release a key.

**Parameters:**
- `key` (str): Key name (see Key Names section)
- `presses` (int): Number of times to press
- `interval` (float): Delay between presses

**Example:**
```python
# Press Enter
dc.press('enter')

# Press Space 3 times
dc.press('space', presses=3)

# Press Down arrow
dc.press('down')
```

#### `hotkey(*keys, interval=0.05)`
Execute keyboard shortcut.

**Parameters:**
- `*keys` (str): Keys to press together
- `interval` (float): Delay between key presses

**Example:**
```python
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')

# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')

# Open Run dialog (Win+R)
dc.hotkey('win', 'r')

# Save (Ctrl+S)
dc.hotkey('ctrl', 's')

# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')
```

#### `key_down(key)` / `key_up(key)`
Manually control key state.

**Example:**
```python
# Hold Shift
dc.key_down('shift')
dc.type_text("hello")  # Types "HELLO"
dc.key_up('shift')

# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
```

---

### Screen Functions

#### `screenshot(region=None, filename=None)`
Capture screen or region.

**Parameters:**
- `region` (tuple, optional): (left, top, width, height) for partial capture
- `filename` (str, optional): Path to save image

**Returns:** PIL Image object

**Example:**
```python
# Full screen
img = dc.screenshot()

# Save to file
dc.screenshot(filename="screenshot.png")

# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))
```

#### `get_pixel_color(x, y)`
Get color of pixel at coordinates.

**Returns:** RGB tuple `(r, g, b)`

**Example:**
```python
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
```

#### `find_on_screen(image_path, confidence=0.8)`
Find image on screen (requires OpenCV).

**Parameters:**
- `image_path` (str): Path to template image
- `confidence` (float): Match threshold (0-1)

**Returns:** `(x, y, width, height)` or None

**Example:**
```python
# Find button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w//2, y + h//2)
```

#### `get_screen_size()`
Get screen resolution.

**Returns:** `(width, height)` tuple

**Example:**
```python
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")
```

---

### Window Functions

#### `get_all_windows()`
List all open windows.

**Returns:** List of window titles

**Example:**
```python
windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")
```

#### `activate_window(title_substring)`
Bring window to front by title.

**Parameters:**
- `title_substring` (str): Part of window title to match

**Example:**
```python
# Activate Chrome
dc.activate_window("Chrome")

# Activate VS Code
dc.activate_window("Visual Studio Code")
```

#### `get_active_window()`
Get currently focused window.

**Returns:** Window title (str)

**Example:**
```python
active = dc.get_active_window()
print(f"Active window: {active}")
```

---

### Clipboard Functions

#### `copy_to_clipboard(text)`
Copy text to clipboard.

**Example:**
```python
dc.copy_to_clipboard("Hello from OpenClaw!")
```

#### `get_from_clipboard()`
Get text from clipboard.

**Returns:** str

**Example:**
```python
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")
```

---

## ⌨️ Key Names Reference

### Alphabet Keys
`'a'` through `'z'`

### Number Keys
`'0'` through `'9'`

### Function Keys
`'f1'` through `'f24'`

### Special Keys
- `'enter'` / `'return'`
- `'esc'` / `'escape'`
- `'space'` / `'spacebar'`
- `'tab'`
- `'backspace'`
- `'delete'` / `'del'`
- `'insert'`
- `'home'`
- `'end'`
- `'pageup'` / `'pgup'`
- `'pagedown'` / `'pgdn'`

### Arrow Keys
- `'up'` / `'down'` / `'left'` / `'right'`

### Modifier Keys
- `'ctrl'` / `'control'`
- `'shift'`
- `'alt'`
- `'win'` / `'winleft'` / `'winright'`
- `'cmd'` / `'command'` (Mac)

### Lock Keys
- `'capslock'`
- `'numlock'`
- `'scrolllock'`

### Punctuation
- `'.'` / `','` / `'?'` / `'!'` / `';'` / `':'`
- `'['` /

... (truncated)
browser

Comments

Sign in to leave a comment

Loading comments...