Files
cc-haha/docs/en/features/computer-use.md
程序员阿江(Relakkes) 007b8abab8 docs: add VitePress documentation site with i18n and GitHub Pages deployment
- Set up VitePress with bilingual support (Chinese root + English /en/)
- Create unified sidebar navigation with all doc sections
- Add custom Anthropic brand theme (#D97757)
- Create Quick Start pages (zh/en) from README content
- Migrate 7 .en.md files to docs/en/ directory structure
- Translate memory/agent/skills docs to English (9 files)
- Generate English-only architecture diagrams (11 images via PicTactic)
- Copy English-ready images for agent/skills sections (21 images)
- Configure custom domain (claudecodehaha.relakkesyang.org)
- Localize Chinese UI labels (sidebar, outline, footer)
- Add documentation site badge to READMEs
- Fix dead links and update cross-references for VitePress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:21:45 +08:00

9.7 KiB

Computer Use Guide

Modified Version: This feature is a heavily modified version of the Computer Use (internal codename "Chicago") found in the leaked Claude Code source. The official implementation relies on Anthropic's private native modules (@ant/computer-use-swift, @ant/computer-use-input) that are not publicly available. We replaced the entire underlying operation layer with a Python bridge (pyautogui + mss + pyobjc), enabling anyone to run Computer Use on macOS.


Table of Contents


Overview

Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.

24 MCP tools are available:

Category Tools
Screenshot screenshot, zoom
Mouse left_click, right_click, middle_click, double_click, triple_click, left_click_drag, mouse_move, left_mouse_down, left_mouse_up, cursor_position, scroll
Keyboard type, key, hold_key
Apps open_application, switch_display
Permissions request_access, list_granted_applications
Clipboard read_clipboard, write_clipboard
Other wait, computer_batch

Supported Platforms

Platform Architecture Status Notes
macOS Apple Silicon (M1/M2/M3/M4) Fully supported Recommended
macOS Intel x86_64 Fully supported
Windows Any ⚠️ Theoretically possible Core libs (pyautogui + mss) are cross-platform, but pyobjc parts (app management) need to be replaced with win32com. Not yet adapted
Linux Any ⚠️ Theoretically possible Same as above — pyobjc needs to be replaced with wmctrl + xdotool. Not yet adapted

Requirements

  • Bun >= 1.1.0
  • Python >= 3.8 (venv and dependencies are auto-installed on first use)
  • macOS permissions: Accessibility + Screen Recording

How It Works

Computer Use operates through a screenshot → analyze → act feedback loop:

┌────────────────────────────────────────────────────┐
│  AI Model (Claude / any Anthropic-protocol model)   │
│                                                     │
│  1. Receives user request: "open Music app"         │
│  2. Calls screenshot tool → receives screen image   │
│  3. Model analyzes pixels, identifies UI elements   │
│     → "search box is at (756, 342)"                 │
│  4. Calls left_click { coordinate: [756, 342] }     │
│  5. Calls type { text: "search query" }             │
│  6. Calls screenshot again → verify → next step...  │
└───────────────┬────────────────────────────────────┘
                │ MCP Tool Call
                ▼
┌────────────────────────────────────────────────────┐
│  TypeScript Tool Layer (vendor/computer-use-mcp)    │
│  - Security checks (app allowlist, TCC permissions) │
│  - Coordinate transformation                        │
│  - Tool dispatch → executor                         │
└───────────────┬────────────────────────────────────┘
                │ callPythonHelper()
                ▼
┌────────────────────────────────────────────────────┐
│  Python Bridge (runtime/mac_helper.py)              │
│  pyautogui.click(756, 342)   ← mouse control        │
│  mss.grab(monitor)           ← screenshot            │
│  NSWorkspace.open(bundleId)  ← app management        │
└────────────────────────────────────────────────────┘

Key: Coordinate analysis is performed entirely by the model's vision capabilities — it "sees" the screenshot like a human sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.


Quick Start

1. Install dependencies

bun install

2. Ensure Python 3 is available

python3 --version  # >= 3.8 required

Python dependencies are automatically installed into .runtime/venv/ on first Computer Use invocation.

3. Grant macOS permissions

Accessibility:

open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"

Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.

Screen Recording:

open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"

Add your terminal app as well. You may need to restart your terminal after granting permission.

4. Start

./bin/claude-haha

5. Use

Just ask in natural language:

> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor

Security

Mechanism Description
App allowlist Each session requires explicit authorization for which apps Claude can interact with
Concurrency lock Only one Claude session can use Computer Use at a time (file lock)
Clipboard guard Original clipboard content is saved and restored when typing via clipboard
Sensitive action gates System keyboard shortcuts require additional authorization

Note: Since we replaced the native modules with Python bridge, the global Escape hotkey abort and auto-hide features from the original implementation are not available. Use Ctrl+C to abort instead.


Environment Variables

Variable Default Description
CLAUDE_COMPUTER_USE_ENABLED 1 Set to 0 to disable Computer Use
CLAUDE_COMPUTER_USE_COORDINATE_MODE pixels Coordinate mode: pixels or normalized_0_100
CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE 1 Enable clipboard-based text input
CLAUDE_COMPUTER_USE_MOUSE_ANIMATION 1 Enable mouse animation
CLAUDE_COMPUTER_USE_DEBUG 0 Debug mode

Technical Architecture

Gate Bypass

The official Claude Code gates Computer Use behind three layers:

Layer Original Mechanism Our Approach
Compile-time feature('CHICAGO_MCP') (Bun macro) Replaced with true
Subscription hasRequiredSubscription() (Max/Pro only) getChicagoEnabled() returns true directly
Remote config GrowthBook tengu_malort_pedway Same — no remote dependency
Default-disabled isDefaultDisabledBuiltin('computer-use') Returns false

Python Bridge

On first invocation, the bridge automatically:

  1. Creates a Python virtual environment (.runtime/venv/)
  2. Installs pip
  3. Installs dependencies (mss, Pillow, pyautogui, pyobjc-*)
  4. Validates via SHA256 hash (only reinstalls when requirements.txt changes)

Approaches We Tried

Approach 1: Extract native .node modules from Claude Code binary

Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but async Swift methods (screenshot) hung due to N-API async incompatibility between Bun versions.

Approach 2: Create empty stub packages

Stub packages allowed compilation but provided no actual functionality.

Approach 3: Python Bridge (current)

Replaced all native module calls with Python subprocess calls via callPythonHelper(). Zero binary dependencies, auto-bootstrapping, full functionality on any macOS.


Known Limitations

Limitation Description
macOS only Windows/Linux need pyobjc replacements
No global Escape abort Original used CGEventTap; use Ctrl+C instead
No auto-hide windows Original's prepareDisplay relied on Swift
Slightly higher latency ~100ms Python process startup overhead per call

References and Credits

Project License Contribution
wimi321/macos-computer-use-skill MIT Python bridge architecture, mac_helper.py runtime, executor adaptation
domdomegg/computer-use-mcp MIT Independent Computer Use MCP server (nut.js based), used as reference
paoloanzn/free-code - Feature flag system analysis
oboard/claude-code-rev - Early leaked source restoration, stub package reference

Underlying Libraries

Library Purpose
pyautogui Mouse and keyboard control
mss Screenshot capture
Pillow Image processing and compression
pyobjc macOS Cocoa/Quartz framework bindings