Modular RAG

Name: Modular RAG MCP Server
Author: Bye-666

v1.0.0 • Knowledge & Memory • stable

A pluggable and observable Retrieval-Augmented Generation framework that exposes hybrid search and document management tools via the Model Context Protocol. It features a complete ingestion pipeline with multi-modal support, automated evaluation usin

Tags: modular-rag-mcp-server, mcp, ai-integration

Stars: 922

What is Modular RAG?

Modular RAG is a Model Context Protocol (MCP) server that gives AI assistants like Claude, Cursor, and VS Code access to a pluggable and observable retrieval-augmented generation framework. It exposes hybrid search and document management tools via the Model Context Protocol and features a complete ingestion pipeline with multi-modal support and automated evaluation.

This server falls under the Knowledge & Memory category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Maintainer: Bye-666
License: MIT License
Language: Python
Version: v1.0.0
Updated: May 21, 2026
Status: healthy
Maintenance: active
Works with: Claude, OpenAI, Windows, macOS, Linux

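Installation

Manual Installation

Modular RAG is a Python project installed from source: clone the repository, create a virtual environment, and install the requirements, as described under "How to Set Up and Use Modular RAG".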

Configuration

Config file: claude_desktop_config.json

Performance

Response time: < 200ms
Throughput: Medium
Memory usage: Low
CPU usage: Low

How to Set Up and Use Modular RAG

Modular RAG MCP Server is a pluggable Retrieval-Augmented Generation framework that exposes document search and knowledge management as MCP tools, enabling AI models to query a local vector knowledge base using hybrid BM25 and dense-embedding retrieval. It supports multiple LLM backends (OpenAI, Azure, Ollama, DeepSeek) and vector stores (Chroma, Qdrant), includes an observability dashboard, and uses a Ragas-based evaluation framework to measure retrieval quality. Teams building RAG pipelines connect it to Claude or other MCP clients so the AI can retrieve authoritative context from private document collections before generating answers.

Prerequisites

Python 3.10+ and pip installed
An API key for your chosen LLM provider (OpenAI, Azure OpenAI, or DeepSeek) or Ollama running locally
An API key or local instance for your chosen embedding service
An MCP-compatible client such as Claude Desktop or Cursor
Git to clone the repository
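
The local items in this list can be checked before cloning. The following is a minimal sketch, not part of the project: the function name check_prerequisites is hypothetical, and it only covers the Python version and Git, since API keys and MCP clients cannot be verified this way.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version taken from the prerequisites list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means the local checks passed.

    `version_info` and `which` are injectable so the check can be tested
    without depending on the machine it runs on.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["Local prerequisites look fine."]:
        print(line)
```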

Clone the repository and install dependencies

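Clone the RAG MCP Server repo, create a Python virtual environment, and install all required packages. The commands below use the macOS/Linux activation script; on Windows, activate the environment with .venv\Scripts\activate instead.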

git clone https://github.com/Bye-666/RAG-MCP-SERVER.git
cd RAG-MCP-SERVER
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure LLM, embeddings, and vector store

Edit config/settings.yaml to set your LLM provider, API key, model name, embedding provider and model, and your vector database choice (chroma or qdrant) with a persist directory.

# config/settings.yaml (key fields)
llm:
  provider: openai          # openai | azure | ollama | deepseek
  api_key: sk-...
  model: gpt-4o
embedding:
  provider: openai
  api_key: sk-...
  model: text-embedding-3-small
vector_db:
  provider: chroma
  persist_directory: ./data/chroma
retrieval:
  top_k: 5
  dense_weight: 0.7
  sparse_weight: 0.3
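
To illustrate what dense_weight, sparse_weight, and top_k could control, here is a minimal sketch of weighted score fusion. It is an assumption about how such weights are commonly applied, not the server's confirmed implementation: the function names, the min-max normalization step, and the sample scores are all hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into [0, 1]; constant inputs map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(dense, sparse, dense_weight=0.7, sparse_weight=0.3, top_k=5):
    """Weighted-sum fusion of dense and sparse (e.g. BM25) scores.

    Documents missing from one retriever contribute 0.0 for that retriever.
    Returns the top_k (doc_id, fused_score) pairs, best first.
    """
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0)
        + sparse_weight * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    dense = {"a": 0.9, "b": 0.5, "c": 0.1}      # hypothetical cosine similarities
    sparse = {"b": 12.0, "c": 3.0, "d": 6.0}    # hypothetical BM25 scores
    print(fuse_scores(dense, sparse, top_k=3))
```

Normalizing first matters because BM25 scores and embedding similarities live on different scales; without it the weights would not mean what they appear to mean.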

Ingest your documents

Place your source documents (PDFs, text files, images) in the data/input directory, then run the ingestion pipeline to chunk, embed, and store them in the vector database.

python scripts/ingest.py --input ./data/input
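
The chunking step of an ingestion pipeline can be sketched as follows. This is a generic character-window chunker with overlap, shown only to make the idea concrete; the function name and the default sizes are assumptions, and the project's own chunking strategy may differ.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into windows of chunk_size characters that overlap by `overlap`.

    The overlap keeps sentences that straddle a boundary visible in both
    neighbouring chunks, which tends to help retrieval.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```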

Start the MCP server

Launch the MCP server in stdio mode so your MCP client can connect to it. The server exposes three tools: query_knowledge_hub, list_collections, and get_document_summary.

python -m src.mcp_server
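
In stdio mode an MCP client talks to the server with newline-delimited JSON-RPC 2.0 messages. The sketch below builds the kind of tools/call request a client would send for one of the tools named above. The argument name "query" is an assumption, since the tool's input schema is not given here, and a real client also performs the initialize handshake before calling tools.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing request ids


def make_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the message stays on one line.
    return json.dumps(request) + "\n"


if __name__ == "__main__":
    # "query" is a hypothetical argument name used for illustration only.
    print(make_tool_call("query_knowledge_hub", {"query": "data privacy compliance"}), end="")
```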

Add the server to your MCP client config

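Configure your MCP client (e.g., Claude Desktop) to launch the server process. Replace /path/to/RAG-MCP-SERVER with the absolute path to your cloned directory so that the command points at the virtual environment's Python interpreter (on Windows, .venv\Scripts\python.exe).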

{
  "mcpServers": {
    "modular-rag": {
      "command": "/path/to/RAG-MCP-SERVER/.venv/bin/python",
      "args": ["-m", "src.mcp_server"],
      "cwd": "/path/to/RAG-MCP-SERVER"
    }
  }
}
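
Editing the client config by hand risks overwriting servers that are already registered. The following sketch builds the same entry as the block above from a repository path and merges it into an existing config dictionary. The helper names are hypothetical, and reading or writing the actual config file is left out because its location depends on the client and operating system.

```python
import json
from pathlib import Path


def build_server_entry(repo_dir, windows=False):
    """Return the mcpServers entry for a clone at repo_dir, mirroring the fields shown above."""
    repo = Path(repo_dir)
    interpreter = repo / ".venv" / ("Scripts" if windows else "bin") / (
        "python.exe" if windows else "python"
    )
    return {
        "command": str(interpreter),
        "args": ["-m", "src.mcp_server"],
        "cwd": str(repo),
    }


def add_server(config, name, entry):
    """Return a copy of config with entry stored under mcpServers[name]; other servers are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "example"}}}  # hypothetical existing config
    entry = build_server_entry("/path/to/RAG-MCP-SERVER")
    print(json.dumps(add_server(existing, "modular-rag", entry), indent=2))
```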

(Optional) Launch the observability dashboard

Start the Streamlit dashboard to monitor retrieval quality, query history, and Ragas evaluation scores.

streamlit run src/observability/dashboard/app.py

Modular RAG Examples

Prompts to try

Once connected, use these prompts to query your knowledge base:

- "What collections are available in the knowledge base?"
- "Search the knowledge hub for information about quarterly revenue trends"
- "Give me a summary of the document titled 'Product Roadmap Q3'"
- "Find the top 5 most relevant passages about data privacy compliance"
- "What does the knowledge base say about onboarding new employees?"

Troubleshooting Modular RAG

ModuleNotFoundError when starting the MCP server

Ensure you activated the virtual environment before running the server: source .venv/bin/activate. In Claude Desktop config, use the full absolute path to the venv Python binary.
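
A quick way to narrow this error down is to ask the interpreter itself whether it runs inside a virtual environment and whether it can locate the module. This is a diagnostic sketch with hypothetical helper names; run it with the same Python binary and working directory that the client config uses.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_available(name):
    """True if `name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
    print("src.mcp_server importable:", module_available("src.mcp_server"))
```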

No results returned from query_knowledge_hub

Check that your ingestion pipeline ran successfully and that the persist_directory in settings.yaml matches the path used during ingestion. Re-run python scripts/ingest.py if the vector store is empty.
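
A relative persist_directory such as ./data/chroma resolves against the working directory of whichever process opens it, so ingestion and the server can end up looking at different folders. The sketch below resolves the path against a given base directory and reports whether anything is stored there. The helper name is hypothetical, and it inspects only the file system, not the vector store's contents.

```python
from pathlib import Path


def describe_store(persist_directory, base_dir="."):
    """Resolve persist_directory against base_dir and classify it.

    Returns (resolved_path, status) where status is "missing", "empty",
    or "populated" depending on whether any files exist under the path.
    """
    path = (Path(base_dir) / persist_directory).resolve()
    if not path.is_dir():
        return path, "missing"
    has_files = any(p.is_file() for p in path.rglob("*"))
    return path, "populated" if has_files else "empty"


if __name__ == "__main__":
    # Compare the result from the directory where ingestion ran
    # with the result from the server's working directory.
    print(describe_store("./data/chroma"))
```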

Embedding API errors during ingestion

Verify the embedding provider api_key in config/settings.yaml is valid and the chosen model name is correct for your provider. For Ollama, ensure the Ollama server is running locally on the default port.
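
For the Ollama case, a plain TCP check is enough to tell whether anything is listening. The sketch below assumes Ollama's default port of 11434 on the local machine; the helper name is hypothetical, and an open port only shows that some process is listening, not that the required model is available.

```python
import socket

OLLAMA_DEFAULT_PORT = 11434  # Ollama's default listening port


def is_port_open(host="127.0.0.1", port=OLLAMA_DEFAULT_PORT, timeout=1.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Ollama port reachable:", is_port_open())
```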

Frequently Asked Questions about Modular RAG

What is Modular RAG?

Modular RAG is a Model Context Protocol (MCP) server that provides a pluggable and observable retrieval-augmented generation framework, exposing hybrid search and document management tools via the Model Context Protocol. It features a complete ingestion pipeline with multi-modal support and automated evaluation, and it connects AI assistants to external tools and data sources through a standardized interface.

How do I install Modular RAG?

Follow the installation instructions on the Modular RAG GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with Modular RAG?

Modular RAG works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Modular RAG free to use?

Yes, Modular RAG is open source and available under the MIT License. You can use it freely in both personal and commercial projects.

Modular RAG Alternatives — Similar Knowledge & Memory Servers

Looking for alternatives to Modular RAG? Here are other popular knowledge & memory servers you can use with Claude, Cursor, and VS Code.

MemPalace

★ 52.6k

A local AI memory system that stores all conversations verbatim and organizes them into navigable structures. It provides 19 MCP tools for AI assistants to search and retrieve past decisions, debugging sessions, and architecture debates automatically

Kratos

★ 25.7k

🏛️ Memory System for AI Coding Tools - Never explain your codebase again. MCP server with perfect project isolation, 95.8% context accuracy, and the Four Pillars Framework.

Context Mode

★ 15.4k

An MCP server that preserves LLM context by intercepting large data outputs and returning only concise summaries or relevant sections. It enables efficient sandboxed code execution, file processing, and documentation indexing across multiple programm

Memu

★ 13.7k

Memory for 24/7 proactive agents like OpenClaw.

MemOS

★ 9.3k

MemOS (Memory Operating System) is a memory management operating system designed for AI applications. Its goal is: to enable your AI system to have long-term memory like a human, not only remembering what users have said but also actively invoking, u

Everos

★ 5.4k

Build, evaluate, and integrate long-term memory for self-evolving agents.
