Agent Desktop

v1.0.0 | Browser Automation | stable

Native desktop automation CLI for AI agents. Control any application through OS accessibility trees with structured JSON output and deterministic element refs.

accessibility, accessibility-api, ai-agents, automation, cli
Stars: 766
Downloads: 0
Weekly: 0
0/5

What is Agent Desktop?

Agent Desktop is a native desktop automation CLI for AI agents, listed here as a Model Context Protocol (MCP) server. It lets AI assistants such as Claude, Cursor, and VS Code control any application through OS accessibility trees, with structured JSON output and deterministic element refs.

This server falls under the Browser Automation category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • Native desktop automation CLI for AI agents. Control any application through OS accessibility trees with structured JSON output and deterministic element refs.

Use Cases

Native OS accessibility tree automation
Deterministic desktop application control

Maintainer: lahfir
License: Apache-2.0
Language: Rust
Version: v1.0.0
Updated: May 21, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx agent-desktop

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use Agent Desktop

Agent Desktop is a native macOS CLI tool that exposes OS accessibility trees to AI agents as structured JSON, enabling deterministic desktop automation without GUI scripting or pixel-level vision models. It provides 54 commands covering observation (snapshots, screenshots, element search), interaction (click, type, keyboard shortcuts, drag), and system operations (clipboard, notifications, app management). Elements are addressed through a deterministic ref system (@e1, @e2) scoped to stable snapshot IDs. Developers building AI agents that need to control real desktop applications, such as Slack, Finder, or any other macOS app, use it as a reliable, token-efficient alternative to screenshot-based automation.

Prerequisites

  • macOS 13.0 (Ventura) or later
  • Node.js 18+ (for npx installation) or Rust 1.78+ (for building from source)
  • macOS Accessibility permissions granted to your terminal application (System Settings > Privacy & Security > Accessibility)
  • An MCP client such as Claude Desktop

1. Install agent-desktop via npm

Install the agent-desktop CLI globally using npm. This makes the 'agent-desktop' command available system-wide.

npm install -g agent-desktop

2. Grant accessibility permissions

Agent Desktop requires macOS Accessibility API access. Run the permissions command to check current permission state and get guidance on what to enable.

agent-desktop permissions --request

3. Take your first accessibility snapshot

Capture a shallow accessibility tree overview of an application. The --skeleton flag limits depth to 3 levels for a fast, token-efficient overview.

agent-desktop snapshot --app Finder --skeleton --compact

4. Interact with an element using refs

Use the ref (@e12) and snapshot ID returned from a snapshot to click or interact with a specific element. Refs are deterministic within a snapshot session.

# First take a snapshot to get refs
agent-desktop snapshot --app Finder -i --compact

# Then click using the ref and snapshot ID
agent-desktop click @e12 --snapshot s8f3k2p9
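
When these commands are driven from an agent harness rather than typed into a shell, the snapshot-then-click sequence can be wrapped in a few lines. The following is a minimal Python sketch, not part of Agent Desktop. The helper names (snapshot_argv, click_argv, run_json) are hypothetical. It assumes only what this step shows: the snapshot and click subcommands, the --app, -i, --compact, and --snapshot flags, and JSON on standard output. It deliberately assumes no field names inside that JSON.

```python
import json
import subprocess
from typing import Any, List


def snapshot_argv(app: str, interactive_only: bool = True, compact: bool = True) -> List[str]:
    """Build the argv for `agent-desktop snapshot` using only flags shown in this guide."""
    argv = ["agent-desktop", "snapshot", "--app", app]
    if interactive_only:
        argv.append("-i")
    if compact:
        argv.append("--compact")
    return argv


def click_argv(ref: str, snapshot_id: str) -> List[str]:
    """Build the argv for clicking a ref that belongs to a given snapshot."""
    if not ref.startswith("@e"):
        raise ValueError(f"expected a ref like @e12, got {ref!r}")
    return ["agent-desktop", "click", ref, "--snapshot", snapshot_id]


def run_json(argv: List[str]) -> Any:
    """Run a command and parse its stdout as JSON (assumed output format).

    check=True raises CalledProcessError on a non-zero exit status; whether
    the CLI signals failures that way is an assumption of this sketch.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

A caller would run run_json(snapshot_argv("Finder")), read the ref and snapshot ID from the result, and pass them to run_json(click_argv(...)). Passing argv as a list avoids shell quoting problems with application names that contain spaces.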

5. Configure the MCP server

Add Agent Desktop to your MCP client configuration file so AI assistants can invoke desktop automation commands as tools.

{
  "mcpServers": {
    "agent-desktop": {
      "command": "npx",
      "args": ["agent-desktop"]
    }
  }
}
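
If the client configuration file already lists other servers, pasting the block above over it would drop them. The sketch below shows one way to merge the entry instead. It is an illustration only: the function name add_agent_desktop is hypothetical, and the file location is left to the caller because it differs between clients and is not specified here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_agent_desktop(config_path: Path) -> Dict[str, Any]:
    """Add the agent-desktop entry to an MCP client config, keeping other servers.

    The entry mirrors the configuration shown in this step. A missing or
    empty file is treated as an empty configuration.
    """
    config: Dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the mcpServers table if absent, then set (or overwrite) our entry.
    servers = config.setdefault("mcpServers", {})
    servers["agent-desktop"] = {"command": "npx", "args": ["agent-desktop"]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The function rewrites the file with two-space indentation, so any custom formatting in the original file is not preserved. Restart the MCP client afterwards so it re-reads the file.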

Agent Desktop Examples

Prompts to try

Example prompts to use with Agent Desktop through an MCP client.

- "Take a snapshot of the currently focused application and describe its UI"
- "Open Safari, navigate to https://example.com, and take a screenshot"
- "Type 'Hello World' into the currently focused text field"
- "Press Cmd+S to save the current document in the active application"
- "List all currently running applications on my Mac"

Troubleshooting Agent Desktop

Commands fail with PERM_DENIED error code

Go to System Settings > Privacy & Security > Accessibility and add your terminal application (Terminal, iTerm2, or the app running your MCP client). Run 'agent-desktop permissions --request' again to verify the new state.

STALE_REF error when clicking an element

Refs are scoped to a snapshot ID and become invalid after the UI changes. Take a new snapshot to get fresh refs before interacting with elements.
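
In an automated loop, this advice amounts to a re-snapshot-and-retry pattern. The sketch below shows that pattern in Python with the CLI calls injected as callbacks, so it makes no claim about how Agent Desktop reports the error on the wire. All names here (StaleRefError, act_with_fresh_refs, take_snapshot, act) are hypothetical, and the caller is assumed to translate a STALE_REF result into the exception.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class StaleRefError(Exception):
    """Raised by the caller's `act` callback when the CLI reports STALE_REF."""


def act_with_fresh_refs(
    take_snapshot: Callable[[], Tuple[str, str]],
    act: Callable[[str, str], T],
    max_attempts: int = 3,
) -> T:
    """Retry an interaction, taking a new snapshot before every attempt.

    take_snapshot() returns (ref, snapshot_id) for the target element, read
    from a fresh snapshot. act(ref, snapshot_id) performs the interaction
    and raises StaleRefError if the ref is no longer valid.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[StaleRefError] = None
    for _ in range(max_attempts):
        # Never reuse a ref from an earlier snapshot: look it up again.
        ref, snapshot_id = take_snapshot()
        try:
            return act(ref, snapshot_id)
        except StaleRefError as exc:
            last_error = exc

    assert last_error is not None  # the loop ran at least once
    raise last_error
```

The target element is located again on every attempt because its ref may differ between snapshots. Capping the attempts keeps a constantly changing UI from turning the retry into an endless loop.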

Snapshots are very large and slow when --skeleton is not used

Use --skeleton for an initial overview (78-96% token reduction), then use --root @eN --snapshot <id> to drill into a specific sub-tree. Add --compact to omit empty nodes and --interactive-only (-i) to limit output to actionable elements.
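
The two-pass approach described here can be expressed as a pair of command builders. This is a minimal Python sketch using only the flags named in this guide. The function names are hypothetical, and passing --app together with --root and --snapshot in the second call is an assumption, since the text does not show the full drill-down command.

```python
from typing import List


def overview_argv(app: str) -> List[str]:
    """First pass: shallow skeleton of the app's tree, with empty nodes omitted."""
    return ["agent-desktop", "snapshot", "--app", app, "--skeleton", "--compact"]


def drill_down_argv(
    app: str,
    root_ref: str,
    snapshot_id: str,
    interactive_only: bool = True,
) -> List[str]:
    """Second pass: expand one sub-tree of an earlier snapshot.

    Assumption: --app is still passed alongside --root and --snapshot.
    """
    argv = [
        "agent-desktop", "snapshot", "--app", app,
        "--root", root_ref, "--snapshot", snapshot_id, "--compact",
    ]
    if interactive_only:
        # Limit output to actionable elements (the short form is -i).
        argv.append("--interactive-only")
    return argv
```

An agent would run the overview once, pick the ref of the region it cares about, and then request only that sub-tree, which keeps each response small.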

Frequently Asked Questions about Agent Desktop

What is Agent Desktop?

Agent Desktop is a native desktop automation CLI for AI agents, listed here as a Model Context Protocol (MCP) server. It controls any application through OS accessibility trees and returns structured JSON output with deterministic element refs. Through MCP, it connects AI assistants to the desktop via a standardized interface.

How do I install Agent Desktop?

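Install the CLI globally with npm (npm install -g agent-desktop) or run it on demand with npx agent-desktop, then add the server config to your AI client. To build from source instead, follow the instructions on the Agent Desktop GitHub repository; this requires Rust 1.78+.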

Which AI clients work with Agent Desktop?

Agent Desktop works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Agent Desktop free to use?

Yes, Agent Desktop is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.

Quick Config Preview

{ "mcpServers": { "agent-desktop": { "command": "npx", "args": ["-y", "agent-desktop"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json
