Houtini
MCP server that saves Claude Code tokens by delegating bounded tasks to local or cloud LLMs. Works with LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras.
What is Houtini?
Houtini is a Model Context Protocol (MCP) server that lets AI assistants such as Claude, Cursor, and VS Code delegate bounded tasks to local or cloud LLMs, which saves Claude Code tokens. It works with LM Studio, Ollama, vLLM, DeepSeek, Groq, and Cerebras.
This server falls under the Coding Agents category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Saves Claude Code tokens by delegating bounded tasks to local or cloud LLMs
Installation
Manual Installation
npx -y @houtini/lm
How to Set Up and Use Houtini
Houtini LM is an MCP server that reduces Claude Code token consumption by delegating bounded, repetitive tasks to local or cloud LLMs running on LM Studio, Ollama, vLLM, llama.cpp, OpenRouter, DeepSeek, Groq, or Cerebras. It exposes tools for chat, code analysis, multi-file code review, text embeddings, and provider discovery — all communicating with any OpenAI-compatible API endpoint. Developers use it to offload boilerplate generation, code review, format conversion, and other grunt work to cheaper or faster local models while keeping Claude focused on architecture and complex reasoning.
Prerequisites
- Node.js 18+ installed
- A running OpenAI-compatible LLM backend: LM Studio, Ollama, vLLM, llama.cpp (local), or an API key for Groq, DeepSeek, Cerebras, or OpenRouter (cloud)
- npx available (comes with Node.js)
- Claude Code or another MCP-compatible client
Set up your LLM backend
Start your local LLM server (e.g. LM Studio on port 1234, Ollama on port 11434) or obtain an API key from a cloud provider like Groq, DeepSeek, or OpenRouter.
# Example: start LM Studio server on default port
# (done via LM Studio GUI: Local Server > Start Server)
# Example: start Ollama
ollama serve
Add Houtini LM to Claude Code
Use the claude mcp add command to register the server. This is the recommended installation method for Claude Code.
claude mcp add houtini-lm -- npx -y @houtini/lm
Configure environment variables for your provider
Set HOUTINI_LM_ENDPOINT_URL to point at your LLM backend. For cloud providers that require authentication, set HOUTINI_LM_API_KEY. Optionally set HOUTINI_LM_MODEL to pin a specific model.
# For LM Studio (default, often no key needed)
export HOUTINI_LM_ENDPOINT_URL="http://localhost:1234"
# For Groq
export HOUTINI_LM_ENDPOINT_URL="https://api.groq.com/openai/v1"
export HOUTINI_LM_API_KEY="gsk_your_groq_key"
export HOUTINI_LM_MODEL="llama-3.1-8b-instant"
# For Ollama
export HOUTINI_LM_ENDPOINT_URL="http://localhost:11434/v1"
Add to MCP client configuration (alternative method)
If not using claude mcp add, register Houtini LM in your claude_desktop_config.json manually.
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234",
        "HOUTINI_LM_API_KEY": "",
        "HOUTINI_LM_MODEL": ""
      }
    }
  }
}
Verify the connection with the discover tool
Ask your AI assistant to call the discover tool, which performs a health check and reports the connected model's capabilities and measured performance.
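A similar check can be approximated outside the MCP client. The sketch below is not Houtini's implementation. It assumes the backend exposes the standard OpenAI-compatible GET /v1/models route, and that the endpoint URL may or may not already end in /v1 (both forms appear in this guide). The function names are hypothetical.

```python
import json
import urllib.request


def models_url(endpoint_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible base URL."""
    base = endpoint_url.rstrip("/")
    # Assumption: a base URL without a /v1 suffix needs one appended.
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/models"


def parse_model_ids(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} payload."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(endpoint_url: str, api_key: str = "", timeout: float = 5.0) -> list[str]:
    """Query the backend and return the ids of the models it reports."""
    request = urllib.request.Request(models_url(endpoint_url))
    if api_key:
        # Cloud providers expect the key as a bearer token.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(response.read().decode("utf-8"))
```

With LM Studio's local server running, fetch_model_ids("http://localhost:1234") should return the ids of the loaded models. A raised URLError means the backend is not reachable at that address.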
Houtini Examples
Client configuration
Example claude_desktop_config.json for Houtini LM pointing at a local LM Studio instance.
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234",
        "HOUTINI_LM_MODEL": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"
      }
    }
  }
}
Prompts to try
Example prompts for delegating tasks to a local or cloud LLM via Houtini.
- "Use the local LLM to generate boilerplate CRUD functions for a User model in TypeScript"
- "Delegate a code review of this file to the local model and summarize the findings"
- "Use Houtini to convert this Python function to Go"
- "Run the discover tool to check what model is connected and its performance stats"
- "List all models currently available on the local LLM server"
- "Generate embeddings for this text using the local embedding model"
Troubleshooting Houtini
Connection refused to HOUTINI_LM_ENDPOINT_URL
Ensure your LLM backend is running and listening on the configured port. For LM Studio, start the Local Server from the GUI. For Ollama, run 'ollama serve'. Verify with: curl http://localhost:1234/v1/models
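When curl is not at hand, the same question ("is anything listening on that port?") can be answered with a few lines of Python. This is a minimal sketch using only the standard library. The helper names are hypothetical, and it tests plain TCP reachability, not whether the service behind the port speaks the OpenAI-compatible API.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(endpoint_url: str) -> tuple[str, int]:
    """Return the host and TCP port an endpoint URL points at."""
    parts = urlsplit(endpoint_url)
    if parts.hostname is None:
        raise ValueError(f"no host in endpoint URL: {endpoint_url!r}")
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_listening(endpoint_url: str, timeout: float = 2.0) -> bool:
    """Report whether a TCP connection to the endpoint's port succeeds."""
    host, port = host_and_port(endpoint_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

If is_listening("http://localhost:1234") returns False, the backend is not running or is bound to a different port than the one in HOUTINI_LM_ENDPOINT_URL.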
Model not found or empty model list
Run the list_models tool to see what is available. For LM Studio, load a model in the GUI before starting the server. For Ollama, pull a model first: ollama pull llama3.1. If HOUTINI_LM_MODEL is set to a model that is not loaded, clear the variable to let Houtini auto-detect.
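The diagnosis above can be expressed as a small decision function. This is an illustrative sketch only: how Houtini auto-detects a model is not documented here, so picking the first reported model is an assumption, and resolve_model is a hypothetical name.

```python
def resolve_model(available: list[str], pinned: str = "") -> str:
    """Decide which model to use, or explain why none can be used."""
    if not available:
        raise LookupError("backend reported no models; load or pull one first")
    if not pinned:
        # Assumption: auto-detection picks the first model the backend reports.
        return available[0]
    if pinned not in available:
        raise LookupError(
            f"HOUTINI_LM_MODEL={pinned!r} is not loaded; "
            f"available: {', '.join(available)}"
        )
    return pinned
```

For example, resolve_model(["llama3.1"], "mistral") raises an error naming the models that are loaded, which is the situation where clearing HOUTINI_LM_MODEL helps.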
API key errors with cloud providers (Groq, DeepSeek, OpenRouter)
Set HOUTINI_LM_API_KEY to your provider's API key and HOUTINI_LM_ENDPOINT_URL to the provider's OpenAI-compatible base URL. For Groq: https://api.groq.com/openai/v1. For OpenRouter: https://openrouter.ai/api/v1. For DeepSeek: https://api.deepseek.com/v1.
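To avoid typos in the base URL, the provider settings can be generated rather than typed. The sketch below builds the environment block and the MCP client entry from the URLs listed above. The helper names are hypothetical, and the output mirrors the claude_desktop_config.json structure shown in the setup guide rather than any official Houtini tooling.

```python
import json

# Base URLs as listed in the troubleshooting note above.
PROVIDER_BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "deepseek": "https://api.deepseek.com/v1",
}


def provider_env(provider: str, api_key: str, model: str = "") -> dict[str, str]:
    """Build the HOUTINI_LM_* variables for a cloud provider."""
    if provider not in PROVIDER_BASE_URLS:
        raise KeyError(f"unknown provider: {provider}")
    if not api_key:
        raise ValueError("cloud providers require HOUTINI_LM_API_KEY")
    env = {
        "HOUTINI_LM_ENDPOINT_URL": PROVIDER_BASE_URLS[provider],
        "HOUTINI_LM_API_KEY": api_key,
    }
    if model:
        # Pinning a model is optional.
        env["HOUTINI_LM_MODEL"] = model
    return env


def mcp_server_entry(env: dict[str, str]) -> dict:
    """Wrap the environment in the MCP client configuration structure."""
    return {
        "mcpServers": {
            "houtini-lm": {
                "command": "npx",
                "args": ["-y", "@houtini/lm"],
                "env": env,
            }
        }
    }


def render_config(provider: str, api_key: str, model: str = "") -> str:
    """Return the configuration as indented JSON text."""
    return json.dumps(mcp_server_entry(provider_env(provider, api_key, model)), indent=2)
```

Calling render_config("groq", "gsk_your_groq_key", "llama-3.1-8b-instant") produces JSON that can be pasted into the client configuration file.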
Frequently Asked Questions about Houtini
What is Houtini?
Houtini is a Model Context Protocol (MCP) server that saves Claude Code tokens by delegating bounded tasks to local or cloud LLMs. It works with LM Studio, Ollama, vLLM, DeepSeek, Groq, and Cerebras, and it connects AI assistants to external tools and data sources through a standardized interface.
How do I install Houtini?
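The setup guide above installs Houtini through npx rather than a manual clone. In Claude Code, register it with: claude mcp add houtini-lm -- npx -y @houtini/lm. For other clients, add the server config to your client's MCP configuration file. See the Houtini GitHub repository for full instructions.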
Which AI clients work with Houtini?
Houtini works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Houtini free to use?
Yes, Houtini is open source and available under the MIT License. You can use it freely in both personal and commercial projects.
Houtini Alternatives — Similar Coding Agents Servers
Looking for alternatives to Houtini? Here are other popular coding agents servers you can use with Claude, Cursor, and VS Code.
Dify (★ 142.2k)
Production-ready platform for agentic workflow development.
Ruflo (★ 54.0k)
🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integrat
Goose (★ 45.7k)
An open source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM.
Antigravity Awesome Skills (★ 38.3k)
Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. Includes installer CLI, bundles, workflows, and official/community skill collections.
AgentScope (★ 25.5k)
Build and run agents you can see, understand and trust.
Serena (★ 24.5k)
A coding agent toolkit that provides IDE-like semantic code retrieval and editing tools, enabling LLMs to efficiently navigate and modify codebases using symbol-level operations instead of basic file reading and string replacements.