LLM Search
Querying local documents, powered by LLM
What is LLM Search?
LLM Search is a Model Context Protocol (MCP) server that lets AI assistants such as Claude, Cursor, and VS Code query your local documents, with answers generated by an LLM.
This server falls under the Search & Data Extraction category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
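- Natural-language querying of local PDF, Markdown, and DOCX documents, with answers generated by an LLM
- Hybrid dense/sparse search with HyDE, cross-encoder re-ranking, and optional multi-querying
- MCP server over SSE for Cursor, Windsurf, VS Code, and other MCP clients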
How to Set Up and Use LLM Search
pyLLMSearch (llm-search) is a local RAG (Retrieval-Augmented Generation) system that exposes an MCP server, allowing AI assistants in Cursor, Windsurf, VS Code, or Claude to query your private document collections in natural language. It supports PDF, Markdown, and DOCX files and combines hybrid dense/sparse search, HyDE (hypothetical document embeddings), cross-encoder re-ranking, and optional multi-querying to improve the relevance of retrieved passages. Documents are parsed, embedded, and stored locally; only your question and the retrieved passages are sent to the LLM, so pairing it with a local model (for example via Ollama) keeps everything on your machine. Developers and researchers use it to build a private knowledge base that their AI coding assistants can query directly.
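Hybrid search has to merge a ranked list from dense (embedding) similarity with a ranked list from sparse (keyword-style) matching. As a minimal sketch of that step, the code below uses reciprocal rank fusion (RRF), a common fusion technique; it is an assumption for illustration, not necessarily how pyLLMSearch combines its results. The chunk IDs and the constant `k=60` are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion (RRF).

    Each document scores the sum of 1 / (k + rank) over every list it appears
    in (ranks start at 1), so documents ranked highly by several retrievers
    rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical chunk IDs returned by a dense and a sparse retriever.
    dense = ["auth.md#2", "onboarding.pdf#1", "db-adr.md#4"]
    sparse = ["auth.md#2", "timeouts.md#3", "onboarding.pdf#1"]
    for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
        print(f"{doc_id}: {score:.4f}")
```

RRF uses only rank positions, so the two retrievers' raw scores (cosine similarity versus keyword weights) never have to be put on a common scale.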
Prerequisites
- Python 3.10 or higher
- pip or uv package manager
- Access to an LLM: an OpenAI API key, or a local model served via Ollama/LiteLLM
- A local folder of documents to index (PDF, Markdown, or DOCX files)
- An MCP-compatible client such as Cursor, Windsurf, or VS Code with Copilot
Install pyLLMSearch
Install the package from PyPI using pip. It is recommended to use a virtual environment to avoid dependency conflicts.
python -m venv .venv
source .venv/bin/activate
pip install llmsearch
Create a documents configuration YAML
Create a YAML configuration file that tells pyLLMSearch where your documents are, which embedding model to use, and how to configure search. The keys shown below are illustrative; the sample templates in the repository are the best starting point and show the exact schema.
# docs_config.yaml
documents:
  - path: /path/to/your/documents
    extensions: [".pdf", ".md", ".docx"]
embeddings:
  model_name: sentence-transformers/all-MiniLM-L6-v2
  persist_dir: /path/to/embeddings-store
search:
  hyde_enabled: true
  reranking_enabled: true
Create an LLM model configuration YAML
Create a second YAML file that specifies which LLM to use for answering questions. You can use OpenAI, Azure OpenAI, or a local model via LiteLLM and Ollama.
# llm_config.yaml
model:
  type: openai
  model_name: gpt-4o
  api_key: sk-your-openai-api-key
Generate embeddings from your documents
Run the embedding generation step. This processes your documents and stores vector embeddings in the persist directory you configured. Re-run this step when you add new documents.
llmsearch index create --config docs_config.yaml
Start the MCP server
Launch pyLLMSearch as an SSE-based MCP server. Clients like Cursor, Windsurf, and VS Code can connect to it for RAG-powered document queries.
llmsearch app mcp --docs-config docs_config.yaml --llm-config llm_config.yaml
Configure your MCP client
Add the MCP server endpoint to your editor's MCP configuration. The server listens on localhost:8080 by default over SSE.
{
  "mcpServers": {
    "llm-search": {
      "url": "http://localhost:8080/sse"
    }
  }
}
LLM Search Examples
Client configuration
MCP client configuration for connecting to a running pyLLMSearch SSE server. The server must be started separately before the client connects.
{
  "mcpServers": {
    "llm-search": {
      "url": "http://localhost:8080/sse"
    }
  }
}
Prompts to try
Example prompts to use once pyLLMSearch is connected as an MCP server in your editor.
- "Search my documents for information about the authentication flow"
- "What does the onboarding guide say about setting up a new developer account?"
- "Find all mentions of the deprecation policy in my internal documentation"
- "Summarize the key points from my architecture decision records about the database choice"
- "What are the troubleshooting steps documented for connection timeout errors?"
Troubleshooting LLM Search
Embedding generation is slow or runs out of memory
Use a smaller embedding model such as sentence-transformers/all-MiniLM-L6-v2 instead of larger models. You can also reduce the chunk size in the documents configuration YAML. For large document sets, run the indexing step on a machine with more RAM or use the incremental update command to index new files only.
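Re-indexing only new or changed files, as an incremental update does, can be decided by comparing each file's content hash with the hash recorded at the previous run. The sketch below assumes a hypothetical JSON manifest (`index_manifest.json`) that maps relative paths to SHA-256 digests; pyLLMSearch's own update command may track changes differently. The `EXTENSIONS` set mirrors the `extensions` list in docs_config.yaml.

```python
import hashlib
import json
from pathlib import Path

# Mirrors the `extensions` list in docs_config.yaml.
EXTENSIONS = {".pdf", ".md", ".docx"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def find_changed(doc_root: Path, previous: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Return (files that are new or changed, current path -> digest map).

    `previous` is the map saved after the last successful indexing run.
    """
    current: dict[str, str] = {}
    changed: list[Path] = []
    for path in sorted(doc_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            key = path.relative_to(doc_root).as_posix()
            digest = file_digest(path)
            current[key] = digest
            if previous.get(key) != digest:
                changed.append(path)
    return changed, current


if __name__ == "__main__":
    root = Path("/path/to/your/documents")
    manifest = Path("index_manifest.json")  # hypothetical manifest location
    if root.is_dir():
        previous = json.loads(manifest.read_text()) if manifest.exists() else {}
        changed, current = find_changed(root, previous)
        print(f"{len(changed)} file(s) need (re-)embedding")
        # Embed the files in `changed` here, then record the new state.
        manifest.write_text(json.dumps(current, indent=2))
```

The manifest is written only after embedding, so a failed run is retried next time. Files deleted since the last run (keys in `previous` but not in `current`) would also need their chunks removed from the store; the sketch leaves that out.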
MCP server starts but the client cannot connect
Verify the server is listening by running `curl -N http://localhost:8080/sse`; a running server keeps the connection open and streams events instead of returning immediately. Check that no firewall or port conflict is blocking port 8080. Ensure your client configuration uses the SSE endpoint URL (ending in /sse), not the base URL.
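With the SSE transport, an MCP server answers a GET on its SSE URL with a `text/event-stream` response whose first event is normally `endpoint`, carrying the URL the client should POST messages to. The sketch below, using only the Python standard library, connects to the default URL from this guide and returns the first event. The function names are hypothetical, and the parser covers only the basic Server-Sent Events framing rules.

```python
import urllib.request
from collections.abc import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events text lines.

    Handles `field: value` lines, comment lines starting with ':', multiple
    data lines joined by newlines, and a blank line ending each event.
    The event name defaults to "message".
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:  # an event with no data lines is not dispatched
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def first_sse_event(url: str = "http://localhost:8080/sse", timeout: float = 5.0) -> tuple[str, str]:
    """Connect to an SSE endpoint and return its first event."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        if not content_type.startswith("text/event-stream"):
            raise RuntimeError(f"unexpected Content-Type: {content_type!r}")
        # Iterating the response yields raw byte lines as they arrive.
        for event in parse_sse(raw.decode("utf-8") for raw in response):
            return event
    raise RuntimeError("stream closed before any event arrived")
```

If `first_sse_event()` raises a connection error, nothing is listening on that port; if it returns an `endpoint` event, the server is healthy and the problem more likely lies in the client configuration.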
Search results are irrelevant or the LLM gives wrong answers
Enable HyDE (hyde_enabled: true) and re-ranking (reranking_enabled: true) in your docs_config.yaml. After configuration changes, re-index your documents by running `llmsearch index create --config docs_config.yaml` again. Also check that the LLM specified in llm_config.yaml is reachable: a hosted model needs network access to its API, and a local model must be running.
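HyDE changes what is embedded at query time: instead of embedding the question, the LLM first writes a hypothetical answer, and that text is embedded and matched against document chunks, because an answer-shaped passage tends to share vocabulary with the passages that actually answer the question. The sketch below illustrates the idea with hypothetical stand-ins: a bag-of-words vector in place of a real embedding model and a canned `fake_llm` in place of the model from llm_config.yaml. pyLLMSearch's own prompt and implementation may differ.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for a real sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_search(question: str, chunks: list[str], generate: Callable[[str], str], top_k: int = 1) -> list[str]:
    """Embed an LLM-written hypothetical answer (not the raw question) and
    return the top_k chunks most similar to it."""
    hypothetical = generate(f"Write a short passage that answers: {question}")
    query_vec = embed(hypothetical)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    chunks = [
        "Increase the client timeout and retry with exponential backoff when the connection times out.",
        "New developers request an account through the onboarding portal.",
    ]

    def fake_llm(prompt: str) -> str:
        # Hypothetical stand-in for the LLM configured in llm_config.yaml.
        return "If the connection times out, raise the timeout and retry with backoff."

    print(hyde_search("Why do my requests keep failing?", chunks, fake_llm))
```

Here the question shares no words with either chunk, but the hypothetical answer overlaps strongly with the timeout chunk, which is why HyDE can help when users phrase questions differently from the documents.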
Frequently Asked Questions about LLM Search
What is LLM Search?
LLM Search is a Model Context Protocol (MCP) server for querying local documents with an LLM. It connects AI assistants to your document collection through a standardized interface.
How do I install LLM Search?
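Install the package with pip (ideally in a virtual environment), create the documents and LLM configuration YAML files, generate embeddings with `llmsearch index create`, start the MCP server, and add its SSE endpoint to your AI client's MCP configuration. The GitHub repository provides sample configuration templates.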
Which AI clients work with LLM Search?
LLM Search runs as an SSE-based MCP server, so it works with MCP-compatible clients that can connect to an SSE endpoint, such as Cursor, VS Code (GitHub Copilot), Windsurf, Claude Code, and Cline.
Is LLM Search free to use?
Yes, LLM Search is open source and available under the MIT license. You can use it freely in both personal and commercial projects.
LLM Search Alternatives — Similar Search & Data Extraction Servers
Looking for alternatives to LLM Search? Here are other popular search & data extraction servers you can use with Claude, Cursor, and VS Code.
TrendRadar
★ 58.0k · A real-time hotspot monitoring and news aggregation assistant that provides AI-powered analysis of trending topics across multiple platforms via the Model Context Protocol. It enables users to track news and receive automated notifications through va…
Scrapling
★ 52.7k · 🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
PDF Math Translate
★ 33.9k · [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats: AI-powered full-text bilingual translation of PDF documents that fully preserves layout, supporting Google/DeepL/Ollama/OpenAI and other services, with CLI/GUI/MCP/Docker/Zotero interfaces
GPT Researcher
★ 27.2k · An autonomous agent that conducts deep research on any data using any LLM providers
Agent Reach
★ 20.1k · Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu: one CLI, zero API fees.
Xiaohongshu
★ 13.7k · MCP for xiaohongshu.com