Late CLI
Orchestrate an entire AI dev team on 5GB VRAM. Ephemeral subagents, exact-match diffs. Single static binary, any model. Zero config, zero context bloat.
What is Late CLI?
Late CLI is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to orchestrate an entire AI dev team on 5GB VRAM. It uses ephemeral subagents and exact-match diffs, ships as a single static binary, works with any model, and requires zero config with zero context bloat.
This server falls under the Coding Agents category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Installation
Manual Installation
npx late-cli
How to Set Up and Use Late CLI
Late is a Go-based AI coding CLI that orchestrates an entire AI development team using ephemeral subagents on as little as 5GB VRAM, making it practical for local LLM workflows without cloud APIs. It uses exact-match search/replace diffs with autonomous self-healing, supports hybrid model routing (a reasoning model for orchestration, a fast local model for execution), and integrates natively with MCP servers via standard I/O. Late works with Claude, DeepSeek, Qwen, Gemma, and any OpenAI-compatible API, shipping as a single static binary with zero configuration required for local llama.cpp models.
Prerequisites
- macOS, Linux, or Windows (WSL) operating system
- For local models: llama.cpp server running on port 8080 (no API key needed)
- For cloud models: OPENAI_API_KEY or compatible API key and OPENAI_BASE_URL pointing to your provider
- An MCP-compatible server if you want to extend Late with external tools (optional)
- Homebrew installed for the recommended macOS/Linux installation method
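The local versus cloud distinction in the prerequisites can be sketched as a small shell check. This is only an illustration: `late_backend_mode` is a hypothetical helper name, not part of Late, and it assumes that cloud mode needs both `OPENAI_API_KEY` and `OPENAI_BASE_URL` to be set, as the list above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Late): report which backend the current
# environment points to. Prints "cloud" when both cloud variables are set,
# otherwise "local" (the llama.cpp server on port 8080).
late_backend_mode() {
  if [ -n "${OPENAI_API_KEY:-}" ] && [ -n "${OPENAI_BASE_URL:-}" ]; then
    echo "cloud"
  else
    echo "local"
  fi
}
```

Running `late_backend_mode` before starting a session is a quick way to confirm which set of prerequisites applies in the current shell.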
Install Late
Install using Homebrew on macOS or Linux. Alternative methods include the universal install script for other systems.
brew tap mlhher/late && brew install late
Alternative: universal install script
If Homebrew is not available, use the universal installer which supports Linux, macOS, and Windows WSL.
curl -sfL https://raw.githubusercontent.com/mlhher/late-cli/main/install.sh | bash
Configure environment variables for cloud models
For cloud-hosted models, set the API credentials. For local llama.cpp running on port 8080, no configuration is needed — Late connects automatically.
export OPENAI_BASE_URL=https://api.anthropic.com/v1
export OPENAI_API_KEY=your_api_key_here
export OPENAI_MODEL=claude-3-5-sonnet-20241022
Start a coding session
Run Late in your project directory. It will read your codebase context and be ready to spawn subagents for individual tasks.
late
Integrate an MCP server
Add an external MCP server to Late by pointing it to the server's stdio command. This maps MCP tools directly into Late's agent capabilities.
late --mcp-server "npx -y some-mcp-server"
Use hybrid model routing
Configure Late to use a powerful reasoning model for orchestration while using a fast local model for code execution subagents, reducing cost and latency.
late --orchestrator-model claude-3-5-sonnet-20241022 \
  --executor-model qwen2.5-coder:7b
Late CLI Examples
Client configuration
MCP client configuration to connect Claude Desktop to Late CLI as an MCP server.
{
  "mcpServers": {
    "late-cli": {
      "command": "npx",
      "args": ["late-cli"],
      "env": {
        "OPENAI_API_KEY": "your_api_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}
Prompts to try
Example prompts for AI-assisted coding tasks using Late CLI.
- "Refactor the authentication module to use JWT instead of session cookies"
- "Add unit tests for all public methods in src/utils.go"
- "Find and fix all TypeScript type errors in the frontend directory"
- "Implement the TODO items in api/handlers.go and write corresponding tests"
Troubleshooting Late CLI
Late cannot connect to a local model
Ensure your llama.cpp server is running on port 8080 with 'llama-server -m your-model.gguf --port 8080'. Late defaults to localhost:8080 for local models. If you changed the port, set OPENAI_BASE_URL=http://localhost:YOUR_PORT/v1.
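The port check described above can be scripted roughly as follows. The function names are hypothetical helpers, not Late commands. The sketch assumes `curl` is installed and that the local server answers on a `/health` endpoint, which llama.cpp's `llama-server` provides; other OpenAI-compatible servers may not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the base URL for a local server on a given port.
# Defaults to 8080, the port Late expects for local models.
late_local_base_url() {
  local port="${1:-8080}"
  echo "http://localhost:${port}/v1"
}

# Hypothetical helper: probe the server's /health endpoint.
# Returns non-zero if nothing answers within 3 seconds.
late_check_local() {
  local port="${1:-8080}"
  curl -sf --max-time 3 "http://localhost:${port}/health" >/dev/null
}

# Usage (not run automatically):
#   late_check_local 8080 || echo "no server answering on port 8080"
#   export OPENAI_BASE_URL="$(late_local_base_url 9090)"
```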
Exact-match diffs fail and self-healing loops indefinitely
Self-healing triggers when the search string does not match current file content, often due to trailing whitespace or line-ending differences. Run 'late --dry-run' to preview proposed changes before applying. If the issue persists, ensure your editor is not reformatting files in ways that diverge from the model's view.
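To see whether whitespace or line endings are the likely cause, a file can be inspected with standard tools before retrying. The helpers below are a hypothetical sketch using only `grep` and `printf`; they are not part of Late, and `count_exact_matches` handles single-line search strings only.

```shell
#!/usr/bin/env bash
# Succeeds if the file contains carriage returns (for example CRLF endings).
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Succeeds if any line ends in a space or tab.
has_trailing_blank() {
  grep -qE '[[:blank:]]+$' "$1"
}

# Prints how many lines contain the search string as a literal, byte-for-byte
# match. A count of 0 means an exact-match replace on that string would fail.
count_exact_matches() {
  grep -cF -- "$1" "$2"
}

# Usage (not run automatically; src/utils.go is the path from the prompts above):
#   has_crlf src/utils.go && echo "CRLF line endings present"
#   count_exact_matches 'func main() {' src/utils.go
```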
Homebrew tap not found or install fails
Try the universal install script as an alternative: 'curl -sfL https://raw.githubusercontent.com/mlhher/late-cli/main/install.sh | bash'. On Arch Linux, use 'yay -S late-cli-bin'. You can also download binaries directly from the GitHub releases page.
Frequently Asked Questions about Late CLI
What is Late CLI?
Late CLI is a Model Context Protocol (MCP) server that orchestrates an entire AI dev team on 5GB VRAM. It uses ephemeral subagents and exact-match diffs, ships as a single static binary, and works with any model, with zero config and zero context bloat. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Late CLI?
Install Late with Homebrew ('brew tap mlhher/late && brew install late') or with the universal install script, as described in the installation steps above. Then add the server config to your AI client.
Which AI clients work with Late CLI?
Late CLI works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Late CLI free to use?
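Late CLI's source is publicly available on GitHub. Its license is listed as NOASSERTION, which means no specific license was detected for the repository, so check the license file in the repository before using it in personal or commercial projects.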
Late CLI Alternatives — Similar Coding Agents Servers
Looking for alternatives to Late CLI? Here are other popular coding agents servers you can use with Claude, Cursor, and VS Code.
Dify (★ 142.2k)
Production-ready platform for agentic workflow development.
Ruflo (★ 54.0k)
🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integrat
Goose (★ 45.7k)
An open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
Antigravity Awesome Skills (★ 38.3k)
Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. Includes installer CLI, bundles, workflows, and official/community skill collections.
AgentScope (★ 25.5k)
Build and run agents you can see, understand and trust.
Serena (★ 24.5k)
A coding agent toolkit that provides IDE-like semantic code retrieval and editing tools, enabling LLMs to efficiently navigate and modify codebases using symbol-level operations instead of basic file reading and string replacements.