Modular RAG
A pluggable and observable Retrieval-Augmented Generation framework that exposes hybrid search and document management tools via the Model Context Protocol. It features a complete ingestion pipeline with multi-modal support and automated evaluation.
What is Modular RAG?
Modular RAG is a Model Context Protocol (MCP) server that gives AI assistants such as Claude, Cursor, and VS Code access to a pluggable and observable Retrieval-Augmented Generation framework. It exposes hybrid search and document management tools via MCP and features a complete ingestion pipeline with multi-modal support.
This server falls under the Knowledge & Memory category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Installation
Manual Installation
npx modular-rag-mcp-server
Configuration
How to Set Up and Use Modular RAG
Modular RAG MCP Server is a pluggable Retrieval-Augmented Generation framework that exposes document search and knowledge management as MCP tools, enabling AI models to query a local vector knowledge base using hybrid BM25 and dense-embedding retrieval. It supports multiple LLM backends (OpenAI, Azure, Ollama, DeepSeek) and vector stores (Chroma, Qdrant), includes an observability dashboard, and uses a Ragas-based evaluation framework to measure retrieval quality. Teams building RAG pipelines connect it to Claude or other MCP clients so the AI can retrieve authoritative context from private document collections before generating answers.
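To make the idea of hybrid retrieval concrete, here is a minimal sketch of one common way to combine sparse (BM25) and dense scores: min-max normalize each score set, then take a weighted sum. This is an illustration only; the function names are hypothetical and the project may fuse scores differently (for example with reciprocal rank fusion). The default weights of 0.7 and 0.3 and top_k of 5 mirror the sample settings.yaml in this guide.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Fuse two score sets by weighted sum (hypothetical helper, not the project's API)."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    # A document missing from one retriever contributes 0 from that side.
    fused = {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0)
        + sparse_weight * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Sort by descending score; break ties by document id for stable output.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}      # e.g. cosine similarities
    sparse_scores = {"b": 12.0, "c": 3.0, "d": 7.5}    # e.g. raw BM25 scores
    for doc_id, score in hybrid_rank(dense_scores, sparse_scores):
        print(doc_id, round(score, 3))
```

Normalizing first matters because raw BM25 scores and embedding similarities live on different scales; without it, the weights would not mean what they appear to mean.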
Prerequisites
- Python 3.10+ and pip installed
- An API key for your chosen LLM provider (OpenAI, Azure OpenAI, or DeepSeek) or Ollama running locally
- An API key or local instance for your chosen embedding service
- An MCP-compatible client such as Claude Desktop or Cursor
- Git to clone the repository
Clone the repository and install dependencies
Clone the RAG MCP Server repo, create a Python virtual environment, and install all required packages.
git clone https://github.com/Bye-666/RAG-MCP-SERVER.git
cd RAG-MCP-SERVER
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Configure LLM, embeddings, and vector store
Edit config/settings.yaml to set your LLM provider, API key, model name, embedding provider and model, and your vector database choice (chroma or qdrant) with a persist directory.
# config/settings.yaml (key fields)
llm:
  provider: openai  # openai | azure | ollama | deepseek
  api_key: sk-...
  model: gpt-4o
embedding:
  provider: openai
  api_key: sk-...
  model: text-embedding-3-small
vector_db:
  provider: chroma
  persist_directory: ./data/chroma
retrieval:
  top_k: 5
  dense_weight: 0.7
  sparse_weight: 0.3
Ingest your documents
Place your source documents (PDFs, text files, images) in the data/input directory, then run the ingestion pipeline to chunk, embed, and store them in the vector database.
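As a rough illustration of the chunking step, the sketch below splits text into fixed-size character windows that overlap, so that a sentence cut at one boundary still appears whole in a neighboring chunk. The function name, the character-based sizing, and the default sizes are assumptions for illustration; the actual pipeline may chunk by tokens, sentences, or document structure.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Each chunk would then be embedded and written to the vector store. The repository's own pipeline is run with the command that follows.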
python scripts/ingest.py --input ./data/input
Start the MCP server
Launch the MCP server in stdio mode so your MCP client can connect to it. The server exposes three tools: query_knowledge_hub, list_collections, and get_document_summary.
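For readers curious about what an MCP client sends when it invokes one of these tools, the sketch below builds a JSON-RPC 2.0 tools/call request. The tool name query_knowledge_hub comes from this guide, but the argument names query and top_k are assumptions; the real input schema is whatever the server reports when a client lists its tools. A real client also performs the initialize handshake first, and MCP clients such as Claude Desktop handle all of this automatically.

```python
import json
from itertools import count

# Monotonically increasing request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # "query" and "top_k" are hypothetical argument names used for illustration.
    print(build_tool_call(
        "query_knowledge_hub",
        {"query": "data privacy compliance", "top_k": 5},
    ))
```

The server process itself is started with the command that follows.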
python -m src.mcp_server
Add the server to your MCP client config
Configure your MCP client (e.g., Claude Desktop) to launch the server process. Update the path to match your virtual environment and cloned directory.
{
  "mcpServers": {
    "modular-rag": {
      "command": "/path/to/RAG-MCP-SERVER/.venv/bin/python",
      "args": ["-m", "src.mcp_server"],
      "cwd": "/path/to/RAG-MCP-SERVER"
    }
  }
}
(Optional) Launch the observability dashboard
Start the Streamlit dashboard to monitor retrieval quality, query history, and Ragas evaluation scores.
streamlit run src/observability/dashboard/app.py
Modular RAG Examples
Client configuration (Claude Desktop)
Full JSON config block to add the Modular RAG MCP Server to Claude Desktop:
{
  "mcpServers": {
    "modular-rag": {
      "command": "/path/to/RAG-MCP-SERVER/.venv/bin/python",
      "args": ["-m", "src.mcp_server"],
      "cwd": "/path/to/RAG-MCP-SERVER"
    }
  }
}
Prompts to try
Once connected, use these prompts to query your knowledge base:
- "What collections are available in the knowledge base?"
- "Search the knowledge hub for information about quarterly revenue trends"
- "Give me a summary of the document titled 'Product Roadmap Q3'"
- "Find the top 5 most relevant passages about data privacy compliance"
- "What does the knowledge base say about onboarding new employees?"
Troubleshooting Modular RAG
ModuleNotFoundError when starting the MCP server
Ensure you activated the virtual environment before running the server: source .venv/bin/activate. In Claude Desktop config, use the full absolute path to the venv Python binary.
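One way to avoid path mistakes is to generate the client config from the cloned directory rather than typing the paths by hand. The sketch below is a hypothetical helper, not part of the repository; it assumes the virtual environment is named .venv as in the setup steps and that it is run with the repository root as its argument.

```python
import json
import os
from pathlib import Path


def build_client_config(repo_dir: str) -> dict:
    """Return an mcpServers config block with absolute paths (hypothetical helper)."""
    repo = Path(repo_dir).expanduser().resolve()
    # Virtual environments place the interpreter in Scripts\ on Windows, bin/ elsewhere.
    if os.name == "nt":
        venv_python = repo / ".venv" / "Scripts" / "python.exe"
    else:
        venv_python = repo / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "modular-rag": {
                "command": str(venv_python),
                "args": ["-m", "src.mcp_server"],
                "cwd": str(repo),
            }
        }
    }


if __name__ == "__main__":
    # Print the block so it can be pasted into claude_desktop_config.json.
    print(json.dumps(build_client_config("."), indent=2))
```

The helper only builds the paths; it does not check that the interpreter exists, so it is still worth confirming that the printed command path points at a real file.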
No results returned from query_knowledge_hub
Check that your ingestion pipeline ran successfully and that the persist_directory in config/settings.yaml matches the path used during ingestion. If the vector store is empty, re-run python scripts/ingest.py --input ./data/input.
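A quick first check is whether the persist directory exists and contains anything at all. The sketch below is a hypothetical diagnostic, not part of the repository; it resolves the path relative to the current working directory, which is why a mismatch between the directory used for ingestion and the cwd used by the MCP client can make a populated store look empty.

```python
from pathlib import Path


def vector_store_status(persist_directory: str) -> str:
    """Report whether a persist directory is missing, empty, or populated."""
    path = Path(persist_directory).expanduser().resolve()
    if not path.exists():
        return f"missing: {path}"
    if not path.is_dir():
        return f"not a directory: {path}"
    if not any(path.iterdir()):
        return f"empty: {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    # ./data/chroma is the sample persist_directory from this guide.
    print(vector_store_status("./data/chroma"))
```

An "ok" result only shows that files are present; it does not prove that the expected collection or embeddings are in them.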
Embedding API errors during ingestion
Verify the embedding provider api_key in config/settings.yaml is valid and the chosen model name is correct for your provider. For Ollama, ensure the Ollama server is running locally on the default port.
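To confirm that a local Ollama server is reachable, a simple TCP connection test is often enough. The sketch below assumes Ollama is listening on its default port, 11434, on localhost; the function name is hypothetical, and an open port only shows that something is listening there, not that the required model is installed.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:11434")
    else:
        print("Nothing is listening on 127.0.0.1:11434; is Ollama running?")
```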
Frequently Asked Questions about Modular RAG
What is Modular RAG?
Modular RAG is a Model Context Protocol (MCP) server that provides a pluggable and observable Retrieval-Augmented Generation framework. It exposes hybrid search and document management tools via MCP and features a complete ingestion pipeline with multi-modal support and automated evaluation. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Modular RAG?
Follow the installation instructions on the Modular RAG GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
Which AI clients work with Modular RAG?
Modular RAG works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Modular RAG free to use?
Yes, Modular RAG is open source and available under the MIT License. You can use it freely in both personal and commercial projects.
Modular RAG Alternatives — Similar Knowledge & Memory Servers
Looking for alternatives to Modular RAG? Here are other popular knowledge & memory servers you can use with Claude, Cursor, and VS Code.
MemPalace
★ 52.6k
A local AI memory system that stores all conversations verbatim and organizes them into navigable structures. It provides 19 MCP tools for AI assistants to search and retrieve past decisions, debugging sessions, and architecture debates automatically
Kratos
★ 25.7k
🏛️ Memory System for AI Coding Tools - Never explain your codebase again. MCP server with perfect project isolation, 95.8% context accuracy, and the Four Pillars Framework.
Context Mode
★ 15.4k
An MCP server that preserves LLM context by intercepting large data outputs and returning only concise summaries or relevant sections. It enables efficient sandboxed code execution, file processing, and documentation indexing across multiple programm...
Memu
★ 13.7k
Memory for 24/7 proactive agents like OpenClaw.
MemOS
★ 9.3k
MemOS (Memory Operating System) is a memory management operating system designed for AI applications. Its goal is: to enable your AI system to have long-term memory like a human, not only remembering what users have said but also actively invoking, u...
Everos
★ 5.4k
Build, evaluate, and integrate long-term memory for self-evolving agents.