Modular RAG

v1.0.0 · Knowledge & Memory · stable

A pluggable and observable Retrieval-Augmented Generation framework that exposes hybrid search and document management tools via the Model Context Protocol. It features a complete ingestion pipeline with multi-modal support and automated evaluation.

modular-rag-mcp-server · mcp · ai-integration
Stars: 922
Downloads: 0
Weekly: 0
Rating: 0/5

What is Modular RAG?

Modular RAG is a Model Context Protocol (MCP) server that gives AI assistants like Claude, Cursor, and VS Code access to a pluggable and observable retrieval-augmented generation framework. It exposes hybrid search and document management tools via MCP and features a complete ingestion pipeline with multi-modal support and automated evaluation.

This server falls under the Knowledge & Memory category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

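  • Hybrid retrieval that combines BM25 and dense embeddings
  • Hybrid search and document management tools exposed via MCP
  • Ingestion pipeline with multi-modal support
  • Multiple LLM backends: OpenAI, Azure, Ollama, DeepSeek
  • Chroma and Qdrant vector stores
  • Observability dashboard and Ragas-based evaluation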

Use Cases

Teams building RAG pipelines connect it to Claude or other MCP clients so the AI can retrieve authoritative context from private document collections before generating answers.

Maintainer

Bye-666

License: MIT License
Language: Python
Version: v1.0.0
Updated: May 21, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

git clone https://github.com/Bye-666/RAG-MCP-SERVER.git
cd RAG-MCP-SERVER
pip install -r requirements.txt

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use Modular RAG

Modular RAG MCP Server is a pluggable Retrieval-Augmented Generation framework that exposes document search and knowledge management as MCP tools, enabling AI models to query a local vector knowledge base using hybrid BM25 and dense-embedding retrieval. It supports multiple LLM backends (OpenAI, Azure, Ollama, DeepSeek) and vector stores (Chroma, Qdrant), includes an observability dashboard, and uses a Ragas-based evaluation framework to measure retrieval quality. Teams building RAG pipelines connect it to Claude or other MCP clients so the AI can retrieve authoritative context from private document collections before generating answers.

Prerequisites

  • Python 3.10+ and pip installed
  • An API key for your chosen LLM provider (OpenAI, Azure OpenAI, or DeepSeek) or Ollama running locally
  • An API key or local instance for your chosen embedding service
  • An MCP-compatible client such as Claude Desktop or Cursor
  • Git to clone the repository

Step 1: Clone the repository and install dependencies

Clone the RAG MCP Server repo, create a Python virtual environment, and install all required packages.

git clone https://github.com/Bye-666/RAG-MCP-SERVER.git
cd RAG-MCP-SERVER
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Step 2: Configure LLM, embeddings, and vector store

Edit config/settings.yaml to set your LLM provider, API key, model name, embedding provider and model, and your vector database choice (chroma or qdrant) with a persist directory.

# config/settings.yaml (key fields)
llm:
  provider: openai          # openai | azure | ollama | deepseek
  api_key: sk-...
  model: gpt-4o
embedding:
  provider: openai
  api_key: sk-...
  model: text-embedding-3-small
vector_db:
  provider: chroma
  persist_directory: ./data/chroma
retrieval:
  top_k: 5
  dense_weight: 0.7
  sparse_weight: 0.3
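
The dense_weight and sparse_weight fields suggest that dense and BM25 scores are blended into one ranking. The sketch below shows one common way to do that: min-max normalize each score set, then take a weighted sum. The source does not state which fusion method the server actually uses, so the function names and the normalization step here are assumptions for illustration only.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1] so BM25 and embedding scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: dict[str, float],
    sparse: dict[str, float],
    dense_weight: float = 0.7,   # mirrors retrieval.dense_weight
    sparse_weight: float = 0.3,  # mirrors retrieval.sparse_weight
    top_k: int = 5,              # mirrors retrieval.top_k
) -> list[tuple[str, float]]:
    """Hypothetical weighted-sum fusion; not the project's confirmed method."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0)
        + sparse_weight * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Sort by score (descending), then by id so ties are deterministic.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}     # e.g. cosine similarity
    sparse_scores = {"b": 12.0, "c": 3.0, "d": 6.0}   # e.g. raw BM25
    for doc_id, score in fuse_scores(dense_scores, sparse_scores):
        print(doc_id, round(score, 3))
```

A document missing from one retriever contributes 0 for that side, so with a 0.7 dense weight a strong dense-only hit can still outrank a document found by both.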

Step 3: Ingest your documents

Place your source documents (PDFs, text files, images) in the data/input directory, then run the ingestion pipeline to chunk, embed, and store them in the vector database.

python scripts/ingest.py --input ./data/input
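
To make the "chunk" stage of ingestion concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a generic illustration, not the logic in scripts/ingest.py: the function name, the character-based sizes, and the default values are all assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper for illustration; real pipelines often split on
    sentences or tokens instead of raw characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.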

Step 4: Start the MCP server

Launch the MCP server in stdio mode so your MCP client can connect to it. The server exposes three tools: query_knowledge_hub, list_collections, and get_document_summary.

python -m src.mcp_server
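
For readers curious what a client sends to these tools, the sketch below builds the JSON-RPC 2.0 "tools/call" request that MCP defines. The envelope follows the MCP specification; the argument names passed to query_knowledge_hub ("query", "top_k") are assumptions, since the source does not document the tool's input schema. A real client would also complete the MCP initialize handshake before calling any tool.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line, so the
    # serialized message must not contain embedded newlines.
    return json.dumps(request)


if __name__ == "__main__":
    # "query" and "top_k" are hypothetical argument names.
    print(build_tool_call("query_knowledge_hub",
                          {"query": "data privacy compliance", "top_k": 5}))
    print(build_tool_call("list_collections", {}))
```

In practice an MCP client such as Claude Desktop constructs these messages for you; the sketch is only meant to show what travels over stdio.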

Step 5: Add the server to your MCP client config

Configure your MCP client (e.g., Claude Desktop) to launch the server process. Update the path to match your virtual environment and cloned directory.

{
  "mcpServers": {
    "modular-rag": {
      "command": "/path/to/RAG-MCP-SERVER/.venv/bin/python",
      "args": ["-m", "src.mcp_server"],
      "cwd": "/path/to/RAG-MCP-SERVER"
    }
  }
}
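
If you prefer to generate this entry rather than edit paths by hand, a small script can build it from the location of your clone. This is a convenience sketch, not part of the project: the function name is hypothetical, and the Windows branch assumes the standard venv layout (Scripts\python.exe).

```python
import json
import sys
from pathlib import Path


def build_client_config(repo_dir: str, server_name: str = "modular-rag") -> dict:
    """Build the mcpServers entry shown above for a given clone directory."""
    repo = Path(repo_dir)
    # Standard venv layouts: Scripts\python.exe on Windows, bin/python elsewhere.
    if sys.platform == "win32":
        python = repo / ".venv" / "Scripts" / "python.exe"
    else:
        python = repo / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            server_name: {
                "command": str(python),
                "args": ["-m", "src.mcp_server"],
                "cwd": str(repo),
            }
        }
    }


if __name__ == "__main__":
    # Pass the absolute path to your clone; the placeholder is kept as-is.
    print(json.dumps(build_client_config("/path/to/RAG-MCP-SERVER"), indent=2))
```

Merge the printed block into the existing "mcpServers" object in your client config instead of overwriting the file, so other configured servers are preserved.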

Step 6 (optional): Launch the observability dashboard

Start the Streamlit dashboard to monitor retrieval quality, query history, and Ragas evaluation scores.

streamlit run src/observability/dashboard/app.py

Modular RAG Examples

Prompts to try

Once connected, use these prompts to query your knowledge base:

- "What collections are available in the knowledge base?"
- "Search the knowledge hub for information about quarterly revenue trends"
- "Give me a summary of the document titled 'Product Roadmap Q3'"
- "Find the top 5 most relevant passages about data privacy compliance"
- "What does the knowledge base say about onboarding new employees?"

Troubleshooting Modular RAG

ModuleNotFoundError when starting the MCP server

Ensure you activated the virtual environment before running the server: source .venv/bin/activate. In Claude Desktop config, use the full absolute path to the venv Python binary.
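
A quick way to confirm which interpreter is in use is to run a short check with the same Python binary your client config points to. The sketch below uses only the standard library; the helper names are hypothetical, and the module names you pass in should come from the project's requirements.txt (the one in the example is a placeholder, not a known dependency).

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
    # Replace with top-level package names from requirements.txt.
    print("missing:", missing_modules(["json", "some_required_package"]))
```

If "inside venv" prints False, or the interpreter path is not under .venv, the server is being launched with the wrong Python.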

No results returned from query_knowledge_hub

Check that your ingestion pipeline ran successfully and that the persist_directory in settings.yaml matches the path used during ingestion. Re-run python scripts/ingest.py if the vector store is empty.
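
The check described above can be scripted. This sketch only inspects the directory on disk; it assumes nothing about Chroma's or Qdrant's file layout, so a "populated" result means files exist there, not that they contain valid vectors. The function name is hypothetical.

```python
from pathlib import Path


def persist_dir_status(persist_directory: str) -> str:
    """Classify the configured persist directory as seen from the current cwd."""
    path = Path(persist_directory)
    if not path.exists():
        return "missing"
    if not path.is_dir():
        return "not a directory"
    if not any(path.iterdir()):
        return "empty"
    return "populated"


if __name__ == "__main__":
    # Same value as vector_db.persist_directory in config/settings.yaml.
    print(persist_dir_status("./data/chroma"))
```

Because ./data/chroma is a relative path, run the check from the same working directory the server uses (the "cwd" in your client config); a mismatch there is a plausible cause of an apparently empty store.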

Embedding API errors during ingestion

Verify the embedding provider api_key in config/settings.yaml is valid and the chosen model name is correct for your provider. For Ollama, ensure the Ollama server is running locally on the default port.
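
For the Ollama case, a plain TCP check can tell you whether anything is listening before you dig into API keys or model names. Ollama's default port is 11434; the helper name and timeout below are assumptions, and an open port only shows that a process is listening, not that the required model is pulled.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 11434,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:11434")
    else:
        print("Nothing reachable on 127.0.0.1:11434; is Ollama running?")
```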

Frequently Asked Questions about Modular RAG

What is Modular RAG?

Modular RAG is a Model Context Protocol (MCP) server that provides a pluggable and observable retrieval-augmented generation framework, exposing hybrid search and document management tools via MCP. It features a complete ingestion pipeline with multi-modal support and automated evaluation, and it connects AI assistants to external tools and data sources through a standardized interface.

How do I install Modular RAG?

Follow the installation instructions on the Modular RAG GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with Modular RAG?

Modular RAG works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Modular RAG free to use?

Yes, Modular RAG is open source and available under the MIT License. You can use it freely in both personal and commercial projects.


