Headroom

v1.0.0 | Knowledge & Memory | stable

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

Tags: agent, ai, anthropic, claude-code, compression
Stars: 1,935

What is Headroom?

Headroom is a Model Context Protocol (MCP) server that lets AI assistants and clients such as Claude, Cursor, and VS Code compress tool outputs, logs, files, and RAG chunks before they reach the LLM, using 60-95% fewer tokens while giving the same answers. It is available as a library, a proxy, and an MCP server.

This server falls under the Knowledge & Memory category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • Compress tool outputs, logs, files, and RAG chunks before they reach the LLM

Use Cases

Compress tool outputs and logs for LLMs
60-95% token reduction with same quality answers

Maintainer: chopratejas
License: Apache-2.0
Language: Python
Version: v1.0.0
Updated: May 22, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx headroom

Configuration

Configuration Details

Config File

claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use Headroom

Headroom is a context compression library, proxy, and MCP server that reduces the token count of tool outputs, logs, files, and RAG chunks by 60-95% before they reach an LLM, while preserving answer quality. It works in three deployment modes: as a Python/TypeScript library you call inline, as a drop-in HTTP proxy that intercepts requests to any LLM API, or as an MCP server exposing headroom_compress, headroom_retrieve, and headroom_stats tools. Engineering teams and AI agent developers use Headroom to reduce costs, fit more context into limited windows, and speed up inference on large codebases or log-heavy workflows.
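To make the 60-95% range concrete, the arithmetic can be sketched as below. The function name and the 100,000-token input are illustrative assumptions and are not part of Headroom; actual savings depend on the content being compressed.

```python
def tokens_after_compression(original_tokens: int, reduction_percent: int) -> int:
    """Return the tokens left after removing `reduction_percent` percent of the input."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return original_tokens * (100 - reduction_percent) // 100


# Hypothetical 100,000-token tool output at both ends of the stated range.
low_end = tokens_after_compression(100_000, 60)
high_end = tokens_after_compression(100_000, 95)
print(low_end, high_end)  # 40000 5000
```

At the low end of the range, a 100,000-token payload would shrink to 40,000 tokens; at the high end, to 5,000.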

Prerequisites

  • Python 3.10+ for the Python library or MCP server mode
  • Node.js 18+ for the TypeScript/npm package
  • Optional: Apple Silicon Mac for GPU-accelerated memory embedder (pytorch_mps)
  • Optional: HuggingFace model access for local embedding models (ONNX Runtime)
  • An MCP client such as Claude Desktop or Claude Code to use the MCP server mode

1. Install Headroom

Install the Python package with all optional extras for full MCP and compression functionality. Requires Python 3.10 or higher. For TypeScript projects, install the npm package instead.

# Python (full install)
pip install "headroom-ai[all]"

# Python with pipx
pipx install --python python3.13 "headroom-ai[all]"

# TypeScript/Node
npm install headroom-ai
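After a Python install, a quick sanity check is to confirm that the module can be imported and that the command-line tool is on your PATH. The sketch below uses only the standard library. It assumes the import name headroom (as in the library example in this guide) and the CLI name headroom (as in the commands in this guide); the helper name install_status is hypothetical.

```python
import importlib.util
import shutil


def install_status(module_name: str = "headroom", cli_name: str = "headroom") -> dict:
    """Report whether a top-level module is importable and a CLI is on PATH."""
    return {
        # find_spec returns None when the top-level module cannot be found.
        "module": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no executable with that name is on PATH.
        "cli": shutil.which(cli_name) is not None,
    }


if __name__ == "__main__":
    print(install_status())
```

If "module" is False, the package was probably installed into a different Python environment than the one running the check.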

2. Quick start: wrap your AI CLI tool

The fastest way to see Headroom working is to wrap an existing AI CLI tool. Headroom intercepts the context, compresses it, and forwards it transparently.

headroom wrap claude
headroom perf  # view compression savings after a session

3. Install the MCP server

Run the built-in install command to register Headroom as an MCP server with your configured MCP clients. This exposes the headroom_compress, headroom_retrieve, and headroom_stats tools.

headroom mcp install

4. Configure your MCP client manually (alternative)

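If you prefer manual configuration, add the Headroom MCP server entry to your MCP client's config file (claude_desktop_config.json for Claude Desktop). The HEADROOM_EMBEDDER_RUNTIME value shown, pytorch_mps, is the optional GPU-accelerated embedder setting for Apple Silicon Macs.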

{
  "mcpServers": {
    "headroom": {
      "command": "headroom",
      "args": ["mcp"],
      "env": {
        "HEADROOM_EMBEDDER_RUNTIME": "pytorch_mps"
      }
    }
  }
}
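If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows, under these assumptions: the helper names are hypothetical, the optional env block is left out, and the caller supplies the config file path, since its location varies by client and platform.

```python
import copy
import json
from pathlib import Path

# Entry taken from the configuration above, without the optional "env" block.
HEADROOM_ENTRY = {"command": "headroom", "args": ["mcp"]}


def add_headroom_server(config: dict) -> dict:
    """Return a copy of `config` with a "headroom" entry under "mcpServers"."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["headroom"] = copy.deepcopy(HEADROOM_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON config at `path` (if present), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_headroom_server(config), indent=2) + "\n")
```

Working on a deep copy keeps the original dictionary untouched, so the result can be inspected before anything is written to disk.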

5. Use Headroom as a proxy for zero-code integration

Start Headroom in proxy mode to intercept and compress requests from any application that targets an LLM API. Point your app at the proxy port instead of the real API endpoint.

headroom proxy --port 8787
# Then point your app at http://localhost:8787 instead of api.anthropic.com
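Most SDKs accept a base URL setting, so pointing an app at the proxy usually means swapping the scheme and host while leaving the request path alone. The sketch below shows that rewrite with the standard library. It assumes the proxy mirrors the upstream API paths, which this guide does not state explicitly, and the helper name route_through_proxy is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def route_through_proxy(api_url: str, proxy_base: str = "http://localhost:8787") -> str:
    """Replace the scheme and host of `api_url` with the proxy's, keeping path and query."""
    original = urlsplit(api_url)
    proxy = urlsplit(proxy_base)
    return urlunsplit(
        (proxy.scheme, proxy.netloc, original.path, original.query, original.fragment)
    )


print(route_through_proxy("https://api.anthropic.com/v1/messages"))
# http://localhost:8787/v1/messages
```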

6. Use Headroom in Python code

For library-mode integration, import compress() and call it on your message list before sending to an LLM. It returns a compressed version of the messages array.

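import anthropic
from headroom import compress

client = anthropic.Anthropic()  # assumed Anthropic SDK client; reads ANTHROPIC_API_KEY

async def ask(messages):
    # Compress messages before sending to Claude
    compressed = await compress(messages, model='claude-3-5-sonnet')
    # max_tokens is a required argument of the Anthropic Messages API
    return client.messages.create(model='claude-3-5-sonnet', max_tokens=1024, messages=compressed)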
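To check how much a compression step actually saved, the original and compressed message lists can be compared. The sketch below is not part of Headroom: it uses a crude heuristic of about four characters of serialized JSON per token, which is an assumption and only a rough proxy for a real tokenizer, and both helper names are hypothetical.

```python
import json


def rough_token_count(messages: list) -> int:
    """Very rough estimate: about 4 characters of serialized JSON per token."""
    return len(json.dumps(messages, ensure_ascii=False)) // 4


def reduction_percent(original: list, compressed: list) -> float:
    """Percentage by which the rough estimate shrank (0.0 if the original is empty)."""
    before = rough_token_count(original)
    if before == 0:
        return 0.0
    after = rough_token_count(compressed)
    return round(100.0 * (before - after) / before, 1)
```

For exact figures, use the headroom_stats tool or the headroom perf command described in this guide rather than this estimate.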

Headroom Examples

Client configuration

Claude Desktop configuration to use Headroom as an MCP server, enabling the compression tools in your AI sessions.

{
  "mcpServers": {
    "headroom": {
      "command": "headroom",
      "args": ["mcp"],
      "env": {
        "HEADROOM_EMBEDDER_RUNTIME": "pytorch_mps"
      }
    }
  }
}

Prompts to try

Example requests to make once the Headroom MCP server is connected to your AI client.

- "Compress this 50,000 line log file before analyzing it for errors"
- "Use headroom_compress to reduce this JSON tool output and then summarize it"
- "Show me the headroom_stats for this session — how many tokens have been saved?"
- "Retrieve the most relevant sections from this large document about the error I'm seeing"

Troubleshooting Headroom

Installation fails with Python version error

Headroom requires Python 3.10 or higher. Check your version with 'python3 --version' and upgrade if needed, or use pyenv to manage multiple versions.
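The same check can be scripted, which is useful in setup scripts or CI. A minimal sketch, with a hypothetical helper name:

```python
import sys

MINIMUM = (3, 10)  # minimum Python version stated in the prerequisites


def meets_minimum(version: tuple = tuple(sys.version_info[:2]), minimum: tuple = MINIMUM) -> bool:
    """True if `version` (major, minor) is at least `minimum`."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {current}:", "OK" if meets_minimum() else "too old for Headroom")
```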

ONNX Runtime not found when using embedding-based compression

Set ORT_STRATEGY=system and ORT_LIB_LOCATION to the path of your onnxruntime shared library. Alternatively, install with 'pip install headroom-ai[ml]' which pulls in the correct ONNX Runtime wheel for your platform.

Proxy mode does not intercept requests from my application

Ensure your application's API base URL is set to http://localhost:8787 (or the custom port you specified). The proxy expects the standard OpenAI-compatible API request format.
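Before debugging the application itself, it helps to confirm that something is actually listening on the proxy port. The sketch below tests only TCP reachability with the standard library; it does not verify that the listener is Headroom. The helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8787, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("proxy reachable" if port_is_open() else "nothing listening on port 8787")
```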

Frequently Asked Questions about Headroom

What is Headroom?

Headroom is a context compression tool, available as a library, a proxy, and a Model Context Protocol (MCP) server, that compresses tool outputs, logs, files, and RAG chunks before they reach the LLM. It aims for 60-95% fewer tokens with the same answers. In MCP server mode it exposes the headroom_compress, headroom_retrieve, and headroom_stats tools to AI assistants.

How do I install Headroom?

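Install the Python package with pip install "headroom-ai[all]" (Python 3.10 or higher) or the npm package with npm install headroom-ai. Then run headroom mcp install to register the MCP server with your MCP clients, or add the server config to your AI client manually.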

Which AI clients work with Headroom?

Headroom works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Headroom free to use?

Yes, Headroom is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.


Quick Config Preview

{ "mcpServers": { "headroom": { "command": "npx", "args": ["-y", "headroom"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json
