How do I install Haiku RAG MCP Server?

Install via pip: pip install haiku-rag. Then configure your AI client to use this server.

What category is Haiku RAG MCP Server?

Haiku RAG is categorized under Knowledge & Memory. Browse more servers in these categories on MCPgee.

Haiku RAG

Name: Haiku Rag MCP Server
Author: ggozad

v0.48.1 • Knowledge & Memory • stable

Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling

Tags: ai, docling, lancedb, mcp, mcp-server

Stars: 527

What is Haiku RAG?

Haiku RAG is a Model Context Protocol (MCP) server that gives AI assistants like Claude, Cursor, and VS Code access to opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling.

This server falls under the Knowledge & Memory category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling

Use Cases

Agentic RAG with LanceDB

Document processing with Docling

LLM-powered retrieval

Maintainer: ggozad
License: MIT
Language: Python
Version: v0.48.1
Updated: May 21, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

PIP

pip install haiku-rag


Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response time: < 200 ms
Throughput: medium

Resource Usage

Memory usage: low
CPU usage: low

How to Set Up and Use Haiku RAG

Haiku RAG is an opinionated, local-first retrieval-augmented generation (RAG) system built on LanceDB, Pydantic AI, and Docling. It exposes document management, search, and question answering (QA) as an MCP server for AI assistants such as Claude Desktop. It ingests PDFs, web pages, and other documents with structure-aware chunking, supports hybrid vector and full-text search with Reciprocal Rank Fusion, and provides citation-backed question answering, multimodal search over embedded figures, and a conversational chat TUI, all without requiring an external database server. Use it to give your AI assistant a persistent, searchable knowledge base built from your own documents.
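
The Reciprocal Rank Fusion step named in the overview can be illustrated with a short sketch. This is the textbook formulation of the technique, not necessarily how haiku-rag or LanceDB implement it. The function name, the k=60 constant, and the document IDs are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a full-text search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
fulltext_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why fusion tends to help when vector and keyword search disagree.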

Prerequisites

Python 3.12 or newer
An embedding provider configured: Ollama (local, free), OpenAI, VoyageAI, LM Studio, or vLLM
pip or uv package manager
An MCP-compatible client such as Claude Desktop
Optional: a QA model API key (OpenAI, Anthropic, or any Pydantic AI-supported provider) for the ask and analyze commands

Install the haiku-rag package

Install the full package, which includes Docling for document parsing, all embedding provider adapters, and rerankers. Python 3.12+ is required. Using uv is recommended for fast dependency resolution.

pip install haiku-rag
# or with uv:
uv pip install haiku-rag

Configure an embedding provider

Haiku RAG requires an embedding provider before indexing any documents. The quickest local option is Ollama. Install Ollama, pull a model, and set the environment variables haiku-rag expects.

# Example: use Ollama with nomic-embed-text (Homebrew install shown; macOS/Linux)
brew install ollama
ollama pull nomic-embed-text

export HAIKU_RAG_EMBEDDING_PROVIDER=ollama
export HAIKU_RAG_EMBEDDING_MODEL=nomic-embed-text
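
Before indexing, it can help to confirm that both variables are visible to the process that will launch haiku-rag. The following is a minimal sketch that only checks the two variable names shown above. The helper name is hypothetical and is not part of haiku-rag.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("HAIKU_RAG_EMBEDDING_PROVIDER", "HAIKU_RAG_EMBEDDING_MODEL")


def missing_embedding_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_embedding_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("Embedding provider settings look complete.")
```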

Index your first document

Use the haiku-rag CLI to add a PDF, local file, or URL to your LanceDB knowledge base. Docling parses the document structure and haiku-rag chunks and embeds it automatically.

haiku-rag add-src paper.pdf
haiku-rag add-src https://arxiv.org/pdf/1706.03762

Test search and QA from the command line

Verify the indexed content is searchable before connecting an MCP client. The ask command returns answers with page-number citations from the indexed documents.

haiku-rag search "attention mechanism"
haiku-rag ask "What datasets were used for evaluation?"
haiku-rag analyze "How many documents mention transformers?"

Start the MCP server and add it to Claude Desktop

Run haiku-rag in MCP stdio mode and register it in your Claude Desktop configuration. The server exposes tools for document management, hybrid search, citation-backed QA, and analysis.

# Test the MCP server manually:
haiku-rag mcp --stdio

# Add to ~/Library/Application Support/Claude/claude_desktop_config.json
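
When started manually in stdio mode, an MCP server waits for JSON-RPC messages on standard input, one message per line. The sketch below builds the initialize request that an MCP client sends first. The protocol version string and the client name are assumptions, and the helper is illustrative rather than part of haiku-rag.

```python
import json


def build_initialize_request(request_id=1, protocol_version="2024-11-05"):
    """Build an MCP `initialize` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # assumed version string
            "capabilities": {},
            "clientInfo": {"name": "manual-test", "version": "0.0.1"},
        },
    }
    # json.dumps emits no newlines by default, so the message stays on one line.
    return json.dumps(message)


line = build_initialize_request()
print(line)
```

Pasting the printed line into the terminal where the server is running should, if the server is healthy, produce a JSON-RPC response describing the server and its capabilities.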

Haiku RAG Examples

Client configuration

Claude Desktop configuration for the haiku-rag MCP server running in stdio mode. The server reads embedding provider settings from environment variables set at launch.

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["mcp", "--stdio"],
      "env": {
        "HAIKU_RAG_EMBEDDING_PROVIDER": "ollama",
        "HAIKU_RAG_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}
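
If claude_desktop_config.json already lists other servers, the haiku-rag entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, assuming the entry shown above. The function names are hypothetical.

```python
import copy
import json
from pathlib import Path

# Entry copied from the client configuration above.
HAIKU_RAG_ENTRY = {
    "command": "haiku-rag",
    "args": ["mcp", "--stdio"],
    "env": {
        "HAIKU_RAG_EMBEDDING_PROVIDER": "ollama",
        "HAIKU_RAG_EMBEDDING_MODEL": "nomic-embed-text",
    },
}


def add_haiku_rag(config):
    """Return a copy of `config` with the haiku-rag server added.

    Other servers and unrelated top-level keys are preserved.
    """
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["haiku-rag"] = copy.deepcopy(HAIKU_RAG_ENTRY)
    return merged


def update_config_file(path):
    """Read the config file (if it exists), merge the entry, and write it back."""
    path = Path(path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_haiku_rag(config), indent=2) + "\n")
```

On macOS this would be called as update_config_file("~/Library/Application Support/Claude/claude_desktop_config.json"). Back up the file first.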

Prompts to try

These prompts exercise the document management, search, QA, and analysis tools haiku-rag exposes through the MCP server.

- "Index this PDF for me: /Users/me/reports/Q1-2025.pdf"
- "Search my knowledge base for information about transformer architectures."
- "What does the paper say about evaluation datasets? Include page citations."
- "How many documents in my knowledge base mention the word 'compliance'?"
- "Analyze all indexed documents and summarize the key findings across them."

Troubleshooting Haiku RAG

Install fails with 'externally-managed-environment' on macOS

Use uv pip install haiku-rag or create a virtual environment first: python3 -m venv ~/.haiku-rag-env && source ~/.haiku-rag-env/bin/activate && pip install haiku-rag. Then point the Claude Desktop config command at the venv's haiku-rag binary.
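
To point the client at the virtual environment, the config needs the full path of the haiku-rag script inside it. The sketch below computes that path under the standard venv layout (bin/ on macOS and Linux, Scripts\ on Windows). The helper name is hypothetical.

```python
from pathlib import Path


def venv_command(venv_dir, name="haiku-rag", windows=False):
    """Return the expected path of a console script inside a virtual environment."""
    venv = Path(venv_dir).expanduser()
    if windows:
        return venv / "Scripts" / f"{name}.exe"
    return venv / "bin" / name


# Path to use as the "command" value in the Claude Desktop config.
print(venv_command("~/.haiku-rag-env"))
```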

Embedding errors or empty search results after indexing

Verify that your embedding provider is running and reachable. For Ollama, run ollama list to confirm the model has been pulled, and make sure ollama serve is running. Check that HAIKU_RAG_EMBEDDING_PROVIDER and HAIKU_RAG_EMBEDDING_MODEL name an available provider and model.
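
The model check can be scripted. The sketch below scans text in the shape that ollama list is assumed to print: a header row, then one model per line with the name in the first column and an optional tag such as :latest. The sample listing is made up for illustration, and the helper name is hypothetical.

```python
def model_is_listed(listing, model):
    """Check whether `model` appears in the first column of an `ollama list` table."""
    for row in listing.splitlines()[1:]:  # skip the header row
        columns = row.split()
        if not columns:
            continue
        name = columns[0]
        # Match either the exact name or the name without its tag.
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


# Hypothetical output, for illustration only.
sample = (
    "NAME                      ID            SIZE    MODIFIED\n"
    "nomic-embed-text:latest   0a109f422b47  274 MB  2 days ago\n"
)
print(model_is_listed(sample, "nomic-embed-text"))
```

In practice the listing could come from subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout.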

The ask command returns answers without citations or with incorrect page numbers

Citations depend on Docling's structure-aware parsing, and very large or scanned PDFs may lose page metadata. Re-index with haiku-rag add-src --force to reprocess the document, and make sure the full haiku-rag package is installed (not the slim package) for full Docling support.

Frequently Asked Questions about Haiku RAG

What is Haiku RAG?

Haiku RAG is a Model Context Protocol (MCP) server that provides opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install Haiku RAG?

Install via pip with: pip install haiku-rag. Then configure your AI client to connect to this MCP server.

Which AI clients work with Haiku RAG?

Haiku RAG works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Haiku RAG free to use?

Yes, Haiku RAG is open source and available under the MIT license. You can use it freely in both personal and commercial projects.


Haiku RAG Alternatives — Similar Knowledge & Memory Servers

Looking for alternatives to Haiku RAG? Here are other popular knowledge & memory servers you can use with Claude, Cursor, and VS Code.

MemPalace

★ 52.6k

A local AI memory system that stores all conversations verbatim and organizes them into navigable structures. It provides 19 MCP tools for AI assistants to search and retrieve past decisions, debugging sessions, and architecture debates automatically

Kratos

★ 25.7k

🏛️ Memory System for AI Coding Tools - Never explain your codebase again. MCP server with perfect project isolation, 95.8% context accuracy, and the Four Pillars Framework.

Context Mode

★ 15.4k

An MCP server that preserves LLM context by intercepting large data outputs and returning only concise summaries or relevant sections. It enables efficient sandboxed code execution, file processing, and documentation indexing across multiple programm

Memu

★ 13.7k

Memory for 24/7 proactive agents like OpenClaw.

MemOS

★ 9.3k

MemOS (Memory Operating System) is a memory management operating system designed for AI applications. Its goal is: to enable your AI system to have long-term memory like a human, not only remembering what users have said but also actively invoking, u

Everos

★ 5.4k

Build, evaluate, and integrate long-term memory for self-evolving agents.


Set Up Haiku RAG in Your Editor

Choose your AI client for step-by-step setup instructions.

Claude Desktop: macOS & Windows app
Claude Code: CLI & terminal
Cursor: AI-first code editor
VS Code: GitHub Copilot MCP
Windsurf: Codeium AI editor
Cline: VS Code extension

Quick Config Preview

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["mcp", "--stdio"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json
