Agent Desktop
Native desktop automation CLI for AI agents. Control any application through OS accessibility trees with structured JSON output and deterministic element refs.
What is Agent Desktop?
Agent Desktop is a Model Context Protocol (MCP) server that gives AI assistants such as Claude, Cursor, and VS Code native desktop automation. It controls any application through OS accessibility trees, with structured JSON output and deterministic element refs.
This server falls under the Browser Automation category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Native desktop automation CLI for AI agents. Control any app
Installation
Manual Installation
npx agent-desktop
How to Set Up and Use Agent Desktop
Agent Desktop is a native macOS CLI tool that exposes OS accessibility trees to AI agents as structured JSON, enabling deterministic desktop automation without GUI scripting or pixel-level vision models. It provides 54 commands covering observation (snapshots, screenshots, element search), interaction (click, type, keyboard shortcuts, drag), and system operations (clipboard, notifications, app management). Elements are addressed through deterministic refs (@e1, @e2) scoped to stable snapshot IDs. Developers building AI agents that need to control real desktop applications, such as Slack, Finder, or any other macOS app, use it as a reliable, token-efficient alternative to screenshot-based automation.
Prerequisites
- macOS 13.0 (Ventura) or later
- Node.js 18+ (for npx installation) or Rust 1.78+ (for building from source)
- macOS Accessibility permissions granted to your terminal application (System Settings > Privacy & Security > Accessibility)
- An MCP client such as Claude Desktop
Install agent-desktop via npm
Install the agent-desktop CLI globally using npm. This makes the 'agent-desktop' command available system-wide.
npm install -g agent-desktop
Grant accessibility permissions
Agent Desktop requires macOS Accessibility API access. Run the permissions command to check current permission state and get guidance on what to enable.
agent-desktop permissions --request
Take your first accessibility snapshot
Capture a shallow accessibility tree overview of an application. The --skeleton flag limits depth to 3 levels for a fast, token-efficient overview.
agent-desktop snapshot --app Finder --skeleton --compact
Interact with an element using refs
Use the ref (@e12) and snapshot ID returned from a snapshot to click or interact with a specific element. Refs are deterministic within a snapshot session.
# First take a snapshot to get refs
agent-desktop snapshot --app Finder -i --compact
# Then click using the ref and snapshot ID
agent-desktop click @e12 --snapshot s8f3k2p9
Configure the MCP server
Add Agent Desktop to your MCP client configuration file (for example claude_desktop_config.json or .cursor/mcp.json) so AI assistants can invoke desktop automation commands as tools.
{
  "mcpServers": {
    "agent-desktop": {
      "command": "npx",
      "args": ["agent-desktop"]
    }
  }
}
Agent Desktop Examples
Client configuration
MCP client configuration for Agent Desktop on macOS.
{
  "mcpServers": {
    "agent-desktop": {
      "command": "npx",
      "args": ["agent-desktop"]
    }
  }
}
Prompts to try
Example prompts to use with Agent Desktop through an MCP client.
- "Take a snapshot of the currently focused application and describe its UI"
- "Open Safari, navigate to https://example.com, and take a screenshot"
- "Type 'Hello World' into the currently focused text field"
- "Press Cmd+S to save the current document in the active application"
- "List all currently running applications on my Mac"
Troubleshooting Agent Desktop
Commands fail with PERM_DENIED error code
Go to System Settings > Privacy & Security > Accessibility and add your terminal application (Terminal, iTerm2, or the app running your MCP client). Run 'agent-desktop permissions --request' again to verify the new state.
STALE_REF error when clicking an element
Refs are scoped to a snapshot ID and become invalid after the UI changes. Take a new snapshot to get fresh refs before interacting with elements.
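The recovery step above can be scripted. What follows is a minimal Python sketch, not part of Agent Desktop itself. The names click_with_refresh, run_cli, and pick_target are hypothetical, and the sketch assumes the STALE_REF code appears verbatim somewhere in the command's output, since the JSON schema is not documented on this page. Only the snapshot and click invocations shown earlier are used.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A runner takes an argv list and returns the command's text output.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return stdout followed by stderr as text."""
    done = subprocess.run(list(argv), capture_output=True, text=True)
    return done.stdout + done.stderr


def click_with_refresh(
    app: str,
    pick_target: Callable[[str], Tuple[str, str]],
    run: Runner = run_cli,
    max_attempts: int = 3,
) -> str:
    """Snapshot, click, and take a fresh snapshot if the click is stale.

    pick_target is a hypothetical caller-supplied function that reads the
    snapshot output and returns (ref, snapshot_id). Parsing is left to the
    caller because the output layout is an unknown here.
    """
    for _ in range(max_attempts):
        snapshot_out = run(
            ["agent-desktop", "snapshot", "--app", app, "-i", "--compact"]
        )
        ref, snapshot_id = pick_target(snapshot_out)
        click_out = run(
            ["agent-desktop", "click", ref, "--snapshot", snapshot_id]
        )
        # Assumption: a stale ref is reported with the literal code STALE_REF.
        if "STALE_REF" not in click_out:
            return click_out
    raise RuntimeError(f"ref still stale after {max_attempts} attempts")
```

Passing the runner as a parameter keeps the retry logic testable without the CLI installed; in real use the default run_cli shells out to agent-desktop.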
Snapshots are very large and slow when --skeleton is not used
Use --skeleton for an initial overview (78-96% token reduction), then use --root @eN --snapshot <id> to drill into a specific sub-tree. Add --compact to omit empty nodes and --interactive-only (-i) to limit output to actionable elements.
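The overview-then-drill-down pattern above can be expressed as a small command builder. This is a hedged Python sketch: snapshot_argv is a hypothetical helper, it uses only the flags named on this page, and it assumes --app may be combined with --root and --snapshot, which this page does not confirm.

```python
from typing import List, Optional


def snapshot_argv(
    app: str,
    skeleton: bool = False,
    compact: bool = True,
    interactive_only: bool = False,
    root: Optional[str] = None,
    snapshot_id: Optional[str] = None,
) -> List[str]:
    """Build an agent-desktop snapshot command line as an argv list."""
    argv = ["agent-desktop", "snapshot", "--app", app]
    if skeleton:
        argv.append("--skeleton")
    if interactive_only:
        argv.append("--interactive-only")
    if compact:
        argv.append("--compact")
    if root is not None:
        # Refs are scoped to a snapshot, so a root ref needs its snapshot ID.
        if snapshot_id is None:
            raise ValueError("root requires the snapshot_id it came from")
        argv += ["--root", root, "--snapshot", snapshot_id]
    return argv


# Step 1: shallow overview. Step 2: drill into one sub-tree from that snapshot.
overview = snapshot_argv("Finder", skeleton=True)
detail = snapshot_argv(
    "Finder", interactive_only=True, root="@e5", snapshot_id="s8f3k2p9"
)
```

The ref @e5 is illustrative only; in practice the root ref and snapshot ID come from the skeleton snapshot's output.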
Frequently Asked Questions about Agent Desktop
What is Agent Desktop?
Agent Desktop is a Model Context Protocol (MCP) server that provides native desktop automation for AI agents. It controls any application through OS accessibility trees, with structured JSON output and deterministic element refs, and connects AI assistants to external tools and data sources through a standardized interface.
How do I install Agent Desktop?
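Install the CLI globally with 'npm install -g agent-desktop' (or run it on demand with 'npx agent-desktop'), grant macOS Accessibility permissions to your terminal application, and add the server config to your AI client. To build from source instead, follow the instructions on the Agent Desktop GitHub repository.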
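Adding the server config by hand risks overwriting other entries in the same file. As a minimal sketch, the Python helper below merges the agent-desktop entry from this page into an existing client config and leaves other servers untouched. The function name add_agent_desktop is hypothetical, and the location of the config file depends on your client, so the path is left to the caller.

```python
import json
from pathlib import Path

# Server entry exactly as given in the configuration on this page.
SERVER_ENTRY = {"command": "npx", "args": ["agent-desktop"]}


def add_agent_desktop(config_path: Path) -> dict:
    """Merge the agent-desktop entry into an MCP client config file.

    Creates the file if it does not exist and preserves any other
    servers already listed under "mcpServers".
    """
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["agent-desktop"] = dict(SERVER_ENTRY)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart the client after editing its config so it picks up the new server entry.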
Which AI clients work with Agent Desktop?
Agent Desktop works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Agent Desktop free to use?
Yes, Agent Desktop is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
Agent Desktop Alternatives — Similar Browser Automation Servers
Looking for alternatives to Agent Desktop? Here are other popular browser automation servers you can use with Claude, Cursor, and VS Code.
Chrome DevTools MCP
★ 40.6k
AI-powered Chrome automation server with natural language element detection. Control Chrome browser through MCP protocol for testing, debugging, and performance analysis. Features 91% accuracy in element location, works with free AI models, and suppo
UI TARS Desktop
★ 34.9k
Browser automation capabilities using Puppeteer, supporting both local and remote browser connections.
Playwright
★ 32.8k
A production-ready browser automation server that enables AI assistants to interact with web pages using tools for navigation, element interaction, and data extraction. It features a built-in Inspector UI and robust crash recovery for reliable automa
Page Agent
★ 18.0k
JavaScript in-page GUI agent. Control web interfaces with natural language.
Chrome
★ 11.7k
An extension-based MCP server that enables AI assistants to control your browser, leveraging existing sessions and login states for automation and content analysis. It provides over 20 tools for semantic tab search, interactive element manipulation,
LAMDA
★ 7.8k
The most powerful Android RPA agent framework, next generation mobile automation.