Houtini

v1.0.0Coding Agentsstable

MCP server that saves Claude Code tokens by delegating bounded tasks to local or cloud LLMs. Works with LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras.

ai-agentsclaudeclaude-mcpcode-generationdeveloper-tool
Share:
90
Stars
0
Downloads
0
Weekly
0/5

What is Houtini?

Houtini is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to mcp server that saves claude code tokens by delegating bounded tasks to local or cloud llms. works with lm studio, ollama, vllm, deepseek, groq, cerebras.

MCP server that saves Claude Code tokens by delegating bounded tasks to local or cloud LLMs. Works with LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras.

This server falls under the Coding Agents category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • MCP server that saves Claude Code tokens by delegating bound

Use Cases

Delegate bounded coding tasks to local or cloud LLMs to save tokens.
Integrate with LM Studio, Ollama, vLLM, or Groq for inference.
Optimize token usage by offloading work to specialized models.
houtini-ai

Maintainer

LicenseMIT License
Languagejavascript
Versionv1.0.0
UpdatedMay 21, 2026
Statushealthy
Maintenanceactive

Works with

ClaudeOpenAIwindowsmacoslinux

Installation

Manual Installation

npx houtini-lm

Configuration

Configuration Details

Config File

claude_desktop_config.json

Performance

Response Metrics

Response Time< 200ms
ThroughputMedium

Resource Usage

Memory UsageLow
CPU UsageLow

How to Set Up and Use Houtini

Houtini LM is an MCP server that reduces Claude Code token consumption by delegating bounded, repetitive tasks to local or cloud LLMs running on LM Studio, Ollama, vLLM, llama.cpp, OpenRouter, DeepSeek, Groq, or Cerebras. It exposes tools for chat, code analysis, multi-file code review, text embeddings, and provider discovery — all communicating with any OpenAI-compatible API endpoint. Developers use it to offload boilerplate generation, code review, format conversion, and other grunt work to cheaper or faster local models while keeping Claude focused on architecture and complex reasoning.

Prerequisites

  • Node.js 18+ installed
  • A running OpenAI-compatible LLM backend: LM Studio, Ollama, vLLM, llama.cpp (local), or an API key for Groq, DeepSeek, Cerebras, or OpenRouter (cloud)
  • npx available (comes with Node.js)
  • Claude Code or another MCP-compatible client
1

Set up your LLM backend

Start your local LLM server (e.g. LM Studio on port 1234, Ollama on port 11434) or obtain an API key from a cloud provider like Groq, DeepSeek, or OpenRouter.

# Example: start LM Studio server on default port
# (done via LM Studio GUI: Local Server > Start Server)

# Example: start Ollama
ollama serve
2

Add Houtini LM to Claude Code

Use the claude mcp add command to register the server. This is the recommended installation method for Claude Code.

claude mcp add houtini-lm -- npx -y @houtini/lm
3

Configure environment variables for your provider

Set HOUTINI_LM_ENDPOINT_URL to point at your LLM backend. For cloud providers that require authentication, set HOUTINI_LM_API_KEY. Optionally set HOUTINI_LM_MODEL to pin a specific model.

# For LM Studio (default, often no key needed)
export HOUTINI_LM_ENDPOINT_URL="http://localhost:1234"

# For Groq
export HOUTINI_LM_ENDPOINT_URL="https://api.groq.com/openai/v1"
export HOUTINI_LM_API_KEY="gsk_your_groq_key"
export HOUTINI_LM_MODEL="llama-3.1-8b-instant"

# For Ollama
export HOUTINI_LM_ENDPOINT_URL="http://localhost:11434/v1"
4

Add to MCP client configuration (alternative method)

If not using claude mcp add, register Houtini LM in your claude_desktop_config.json manually.

{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234",
        "HOUTINI_LM_API_KEY": "",
        "HOUTINI_LM_MODEL": ""
      }
    }
  }
}
5

Verify the connection with the discover tool

Ask your AI assistant to call the discover tool, which performs a health check and reports the connected model's capabilities and measured performance.

Houtini Examples

Client configuration

Example claude_desktop_config.json for Houtini LM pointing at a local LM Studio instance.

{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234",
        "HOUTINI_LM_MODEL": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"
      }
    }
  }
}

Prompts to try

Example prompts for delegating tasks to a local or cloud LLM via Houtini.

- "Use the local LLM to generate boilerplate CRUD functions for a User model in TypeScript"
- "Delegate a code review of this file to the local model and summarize the findings"
- "Use Houtini to convert this Python function to Go"
- "Run the discover tool to check what model is connected and its performance stats"
- "List all models currently available on the local LLM server"
- "Generate embeddings for this text using the local embedding model"

Troubleshooting Houtini

Connection refused to HOUTINI_LM_ENDPOINT_URL

Ensure your LLM backend is running and listening on the configured port. For LM Studio, start the Local Server from the GUI. For Ollama, run 'ollama serve'. Verify with: curl http://localhost:1234/v1/models

Model not found or empty model list

Run the list_models tool to see what is available. For LM Studio, load a model in the GUI before starting the server. For Ollama, pull a model first: ollama pull llama3.1. If HOUTINI_LM_MODEL is set to a model that is not loaded, clear the variable to let Houtini auto-detect.

API key errors with cloud providers (Groq, DeepSeek, OpenRouter)

Set HOUTINI_LM_API_KEY to your provider's API key and HOUTINI_LM_ENDPOINT_URL to the provider's OpenAI-compatible base URL. For Groq: https://api.groq.com/openai/v1. For OpenRouter: https://openrouter.ai/api/v1. For DeepSeek: https://api.deepseek.com/v1.

Frequently Asked Questions about Houtini

What is Houtini?

Houtini is a Model Context Protocol (MCP) server that mcp server that saves claude code tokens by delegating bounded tasks to local or cloud llms. works with lm studio, ollama, vllm, deepseek, groq, cerebras. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install Houtini?

Follow the installation instructions on the Houtini GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with Houtini?

Houtini works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Houtini free to use?

Yes, Houtini is open source and available under the MIT License license. You can use it freely in both personal and commercial projects.

Browse More Coding Agents MCP Servers

Explore all coding agents servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.

Quick Config Preview

{ "mcpServers": { "houtini-lm": { "command": "npx", "args": ["-y", "houtini-lm"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json

Read the full setup guide →

Ready to use Houtini?

Browse our complete directory of 33,000+ MCP servers, read setup guides for your editor, and start building with the Model Context Protocol.

33,000+ ServersFree & Open SourceStep-by-Step Guides