Headroom
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
What is Headroom?
Headroom is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to compress tool outputs, logs, files, and rag chunks before they reach the llm. 60-95% fewer tokens, same answers. library, proxy, mcp server.
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
This server falls under the Knowledge & Memory category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Compress tool outputs, logs, files, and RAG chunks before th
Use Cases
Maintainer
Works with
Installation
Manual Installation
npx headroomConfiguration
Configuration Details
claude_desktop_config.json
Performance
Response Metrics
Resource Usage
How to Set Up and Use Headroom
Headroom is a context compression library, proxy, and MCP server that reduces the token count of tool outputs, logs, files, and RAG chunks by 60-95% before they reach an LLM, while preserving answer quality. It works in three deployment modes: as a Python/TypeScript library you call inline, as a drop-in HTTP proxy that intercepts requests to any LLM API, or as an MCP server exposing headroom_compress, headroom_retrieve, and headroom_stats tools. Engineering teams and AI agent developers use Headroom to reduce costs, fit more context into limited windows, and speed up inference on large codebases or log-heavy workflows.
Prerequisites
- Python 3.10+ for the Python library or MCP server mode
- Node.js 18+ for the TypeScript/npm package
- Optional: Apple Silicon Mac for GPU-accelerated memory embedder (pytorch_mps)
- Optional: HuggingFace model access for local embedding models (ONNX Runtime)
- An MCP client such as Claude Desktop or Claude Code to use the MCP server mode
Install Headroom
Install the Python package with all optional extras for full MCP and compression functionality. Requires Python 3.10 or higher. For TypeScript projects, install the npm package instead.
# Python (full install)
pip install "headroom-ai[all]"
# Python with pipx
pipx install --python python3.13 "headroom-ai[all]"
# TypeScript/Node
npm install headroom-aiQuick start: wrap your AI CLI tool
The fastest way to see Headroom working is to wrap an existing AI CLI tool. Headroom intercepts the context, compresses it, and forwards it transparently.
headroom wrap claude
headroom perf # view compression savings after a sessionInstall the MCP server
Run the built-in install command to register Headroom as an MCP server with your configured MCP clients. This exposes the headroom_compress, headroom_retrieve, and headroom_stats tools.
headroom mcp installConfigure your MCP client manually (alternative)
If you prefer manual configuration, add the Headroom MCP server directly to your Claude Desktop or other MCP client config file.
{
"mcpServers": {
"headroom": {
"command": "headroom",
"args": ["mcp"],
"env": {
"HEADROOM_EMBEDDER_RUNTIME": "pytorch_mps"
}
}
}
}Use Headroom as a proxy for zero-code integration
Start Headroom in proxy mode to intercept and compress requests from any application that targets an LLM API. Point your app at the proxy port instead of the real API endpoint.
headroom proxy --port 8787
# Then point your app at http://localhost:8787 instead of api.anthropic.comUse Headroom in Python code
For library-mode integration, import compress() and call it on your message list before sending to an LLM. It returns a compressed version of the messages array.
from headroom import compress
# Compress messages before sending to Claude
compressed = await compress(messages, model='claude-3-5-sonnet')
response = client.messages.create(model='claude-3-5-sonnet', messages=compressed)Headroom Examples
Client configuration
Claude Desktop configuration to use Headroom as an MCP server, enabling the compression tools in your AI sessions.
{
"mcpServers": {
"headroom": {
"command": "headroom",
"args": ["mcp"],
"env": {
"HEADROOM_EMBEDDER_RUNTIME": "pytorch_mps"
}
}
}
}Prompts to try
Example requests to make once the Headroom MCP server is connected to your AI client.
- "Compress this 50,000 line log file before analyzing it for errors"
- "Use headroom_compress to reduce this JSON tool output and then summarize it"
- "Show me the headroom_stats for this session — how many tokens have been saved?"
- "Retrieve the most relevant sections from this large document about the error I'm seeing"Troubleshooting Headroom
Installation fails with Python version error
Headroom requires Python 3.10 or higher. Check your version with 'python3 --version' and upgrade if needed, or use pyenv to manage multiple versions.
ONNX Runtime not found when using embedding-based compression
Set ORT_STRATEGY=system and ORT_LIB_LOCATION to the path of your onnxruntime shared library. Alternatively, install with 'pip install headroom-ai[ml]' which pulls in the correct ONNX Runtime wheel for your platform.
Proxy mode does not intercept requests from my application
Ensure your application's API base URL is set to http://localhost:8787 (or the custom port you specified). The proxy expects standard OpenAI-compatible API request format.
Frequently Asked Questions about Headroom
What is Headroom?
Headroom is a Model Context Protocol (MCP) server that compress tool outputs, logs, files, and rag chunks before they reach the llm. 60-95% fewer tokens, same answers. library, proxy, mcp server. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Headroom?
Follow the installation instructions on the Headroom GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
Which AI clients work with Headroom?
Headroom works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Headroom free to use?
Yes, Headroom is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
Headroom Alternatives — Similar Knowledge & Memory Servers
Looking for alternatives to Headroom? Here are other popular knowledge & memory servers you can use with Claude, Cursor, and VS Code.
MemPalace
★ 52.6kA local AI memory system that stores all conversations verbatim and organizes them into navigable structures. It provides 19 MCP tools for AI assistants to search and retrieve past decisions, debugging sessions, and architecture debates automatically
Kratos
★ 25.7k🏛️ Memory System for AI Coding Tools - Never explain your codebase again. MCP server with perfect project isolation, 95.8% context accuracy, and the Four Pillars Framework.
Context Mode
★ 15.4kAn MCP server that preserves LLM context by intercepting large data outputs and returning only concise summaries or relevant sections. It enables efficient sandboxed code execution, file processing, and documentation indexing across multiple programm
Memu
★ 13.7kMemory for 24/7 proactive agents like OpenClaw.
MemOS
★ 9.3kMemOS (Memory Operating System) is a memory management operating system designed for AI applications. Its goal is: to enable your AI system to have long-term memory like a human, not only remembering what users have said but also actively invoking, u
Everos
★ 5.4kBuild, evaluate, and integrate long-term memory for self-evolving agents.
Browse More Knowledge & Memory MCP Servers
Explore all knowledge & memory servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.
Set Up Headroom in Your Editor
Choose your AI client for step-by-step setup instructions.
Quick Config Preview
Add this to your claude_desktop_config.json or .cursor/mcp.json
Ready to use Headroom?
Browse our complete directory of 33,000+ MCP servers, read setup guides for your editor, and start building with the Model Context Protocol.