Late CLI

v1.0.0Coding Agentsstable

Orchestrate an entire AI dev team on 5GB VRAM. Ephemeral subagents, exact-match diffs. Single static binary, any model. Zero config, zero context bloat.

agentai-agentai-coding-assistantautonomous-agentsclaude
Share:
312
Stars
0
Downloads
0
Weekly
0/5

What is Late CLI?

Late CLI is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to orchestrate an entire ai dev team on 5gb vram. ephemeral subagents, exact-match diffs. single static binary, any model. zero config, zero context bloat.

Orchestrate an entire AI dev team on 5GB VRAM. Ephemeral subagents, exact-match diffs. Single static binary, any model. Zero config, zero context bloat.

This server falls under the Coding Agents category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • Orchestrate an entire AI dev team on 5GB VRAM. Ephemeral sub

Use Cases

Orchestrate AI dev team on low VRAM
Ephemeral subagent spawning
Exact-match code diffs
mlhher

Maintainer

LicenseNOASSERTION
Languagego
Versionv1.0.0
UpdatedMay 21, 2026
Statushealthy
Maintenanceactive

Works with

ClaudeOpenAIwindowsmacoslinux

Installation

Manual Installation

npx late-cli

Configuration

Configuration Details

Config File

claude_desktop_config.json

Performance

Response Metrics

Response Time< 200ms
ThroughputMedium

Resource Usage

Memory UsageLow
CPU UsageLow

How to Set Up and Use Late CLI

Late is a Go-based AI coding CLI that orchestrates an entire AI development team using ephemeral subagents on as little as 5GB VRAM, making it practical for local LLM workflows without cloud APIs. It uses exact-match search/replace diffs with autonomous self-healing, supports hybrid model routing (a reasoning model for orchestration, a fast local model for execution), and integrates natively with MCP servers via standard I/O. Late works with Claude, DeepSeek, Qwen, Gemma, and any OpenAI-compatible API, shipping as a single static binary with zero configuration required for local llama.cpp models.

Prerequisites

  • macOS, Linux, or Windows (WSL) operating system
  • For local models: llama.cpp server running on port 8080 (no API key needed)
  • For cloud models: OPENAI_API_KEY or compatible API key and OPENAI_BASE_URL pointing to your provider
  • An MCP-compatible server if you want to extend Late with external tools (optional)
  • Homebrew installed for the recommended macOS/Linux installation method
1

Install Late

Install using Homebrew on macOS or Linux. Alternative methods include the universal install script for other systems.

brew tap mlhher/late && brew install late
2

Alternative: universal install script

If Homebrew is not available, use the universal installer which supports Linux, macOS, and Windows WSL.

curl -sfL https://raw.githubusercontent.com/mlhher/late-cli/main/install.sh | bash
3

Configure environment variables for cloud models

For cloud-hosted models, set the API credentials. For local llama.cpp running on port 8080, no configuration is needed — Late connects automatically.

export OPENAI_BASE_URL=https://api.anthropic.com/v1
export OPENAI_API_KEY=your_api_key_here
export OPENAI_MODEL=claude-3-5-sonnet-20241022
4

Start a coding session

Run Late in your project directory. It will read your codebase context and be ready to spawn subagents for individual tasks.

late
5

Integrate an MCP server

Add an external MCP server to Late by pointing it to the server's stdio command. This maps MCP tools directly into Late's agent capabilities.

late --mcp-server "npx -y some-mcp-server"
6

Use hybrid model routing

Configure Late to use a powerful reasoning model for orchestration while using a fast local model for code execution subagents, reducing cost and latency.

late --orchestrator-model claude-3-5-sonnet-20241022 \
     --executor-model qwen2.5-coder:7b

Late CLI Examples

Client configuration

MCP client configuration to connect Claude Desktop to Late CLI as an MCP server.

{
  "mcpServers": {
    "late-cli": {
      "command": "npx",
      "args": ["late-cli"],
      "env": {
        "OPENAI_API_KEY": "your_api_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}

Prompts to try

Example prompts for AI-assisted coding tasks using Late CLI.

- "Refactor the authentication module to use JWT instead of session cookies"
- "Add unit tests for all public methods in src/utils.go"
- "Find and fix all TypeScript type errors in the frontend directory"
- "Implement the TODO items in api/handlers.go and write corresponding tests"

Troubleshooting Late CLI

Late cannot connect to a local model

Ensure your llama.cpp server is running on port 8080 with 'llama-server -m your-model.gguf --port 8080'. Late defaults to localhost:8080 for local models. If you changed the port, set OPENAI_BASE_URL=http://localhost:YOUR_PORT/v1.

Exact-match diffs fail and self-healing loops indefinitely

Self-healing triggers when the search string does not match current file content, often due to trailing whitespace or line-ending differences. Run 'late --dry-run' to preview proposed changes before applying. If the issue persists, ensure your editor is not reformatting files in ways that diverge from the model's view.

Homebrew tap not found or install fails

Try the universal install script as an alternative: 'curl -sfL https://raw.githubusercontent.com/mlhher/late-cli/main/install.sh | bash'. On Arch Linux, use 'yay -S late-cli-bin'. You can also download binaries directly from the GitHub releases page.

Frequently Asked Questions about Late CLI

What is Late CLI?

Late CLI is a Model Context Protocol (MCP) server that orchestrate an entire ai dev team on 5gb vram. ephemeral subagents, exact-match diffs. single static binary, any model. zero config, zero context bloat. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install Late CLI?

Follow the installation instructions on the Late CLI GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with Late CLI?

Late CLI works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Late CLI free to use?

Yes, Late CLI is open source and available under the NOASSERTION license. You can use it freely in both personal and commercial projects.

Browse More Coding Agents MCP Servers

Explore all coding agents servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.

Quick Config Preview

{ "mcpServers": { "late-cli": { "command": "npx", "args": ["-y", "late-cli"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json

Read the full setup guide →

Ready to use Late CLI?

Browse our complete directory of 33,000+ MCP servers, read setup guides for your editor, and start building with the Model Context Protocol.

33,000+ ServersFree & Open SourceStep-by-Step Guides