PDF Extraction
MCP server to extract contents from a PDF file
What is PDF Extraction?
PDF Extraction is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to extract contents from a PDF file.
This server falls under the Search & Data Extraction category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- MCP server to extract contents from a PDF file
Installation
Manual Installation
npx mcp-pdf-extraction
Configuration
Configuration Details
claude_desktop_config.json
How to Set Up and Use PDF Extraction
The PDF Extraction MCP Server gives Claude and other AI clients the ability to read content from local PDF files, including both text-based PDFs and scanned documents via OCR. It exposes a single focused tool, extract-pdf-contents, that accepts a local file path and an optional page selector, which makes it straightforward to pull specific pages or entire documents into an AI conversation for summarization, analysis, or data extraction. The server is a Python package built with the MCP SDK and depends on PyMuPDF, pytesseract, and PyPDF2 for extraction and OCR support.
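To make the optional page selector concrete, here is a minimal sketch of how a selector string could be turned into page indices. The function name resolve_pages and the selector convention (comma-separated 1-based page numbers, with negative numbers counting back from the last page) are assumptions for illustration; the server's actual parsing may differ.

```python
def resolve_pages(selector, page_count):
    """Turn a hypothetical selector such as "1,3,-1" into zero-based page indices.

    Assumed convention (not confirmed by the server's documentation):
    positive numbers are 1-based page numbers, negative numbers count
    back from the last page, and an empty selector means every page.
    """
    if selector is None or not selector.strip():
        return list(range(page_count))

    indices = []
    for part in selector.split(","):
        number = int(part.strip())
        if number == 0:
            raise ValueError("page numbers start at 1")
        # Map 1-based or negative page numbers onto zero-based indices.
        index = number - 1 if number > 0 else page_count + number
        if not 0 <= index < page_count:
            raise ValueError(f"page {number} is outside a {page_count}-page document")
        indices.append(index)
    return indices


print(resolve_pages("1,3,-1", page_count=10))  # [0, 2, 9]
```

Validating the selector before opening the file keeps an out-of-range request from turning into a confusing extraction error later.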
Prerequisites
- Python 3.11 or higher
- pip package manager
- Tesseract OCR installed on the system for scanned PDF support (brew install tesseract on macOS)
- Claude Desktop or another MCP client that supports stdio transport
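The prerequisites above can be checked from a short script before installing. This is a rough sketch using only the Python standard library; the function name check_prerequisites is hypothetical, and it only tests whether the tools are on PATH, not whether they work.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # Compare the running interpreter against the minimum version.
        "python": sys.version_info >= min_python,
        # pip may be installed as either "pip" or "pip3".
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        # Tesseract is only needed for scanned (image) PDFs.
        "tesseract": shutil.which("tesseract") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```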
Clone the repository
Clone the mcp-pdf-extraction-server repository to your machine.
git clone https://github.com/xraywu/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server
Create a virtual environment and install
Set up an isolated Python environment and install the package in editable mode.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
Find the installed command path
Locate the pdf-extraction binary that was installed. You will need the full path for the MCP client configuration.
which pdf-extraction
# Example output: /opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction
Add to Claude Code CLI (optional)
If using Claude Code CLI, register the server with the full path found above.
claude mcp add pdf-extraction /full/path/to/pdf-extraction
claude mcp list
Configure Claude Desktop
Add the pdf-extraction server to claude_desktop_config.json using the full absolute path to the installed command.
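If the config file already lists other servers, editing it by hand risks overwriting them. The sketch below merges a single entry into an existing file; the helper name add_server is hypothetical, and the location of claude_desktop_config.json is left as a parameter because it varies by operating system.

```python
import json
from pathlib import Path


def add_server(config_path, command, name="pdf-extraction"):
    """Add or update one entry under "mcpServers", keeping other entries intact."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" section if it is missing, then set our entry.
    config.setdefault("mcpServers", {})[name] = {"command": command}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A call such as add_server("/path/to/claude_desktop_config.json", "/full/path/to/pdf-extraction") would write the entry, with both paths replaced by the real ones on your system.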
Restart Claude Desktop and verify
Restart Claude Desktop and open a new session, then confirm that the pdf-extraction server shows as connected. In Claude Code, typing /mcp lists the connected servers.
PDF Extraction Examples
Client configuration
Claude Desktop configuration for the PDF Extraction server. Replace the path with the actual output of 'which pdf-extraction' on your system.
{
  "mcpServers": {
    "pdf-extraction": {
      "command": "/opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction"
    }
  }
}
Prompts to try
Example prompts for extracting and working with PDF content through Claude.
- "Extract the content from the PDF at /Users/me/documents/report.pdf"
- "Read pages 1-3 from /home/user/contracts/agreement.pdf and summarize the key terms"
- "Extract the last page of /tmp/invoice.pdf using page selector -1"
- "Extract all text from /data/scanned-document.pdf (it's a scanned image PDF)"
- "Read /Users/me/research/paper.pdf and list the references section"
Troubleshooting PDF Extraction
Server not connecting after being added to Claude Desktop
Make sure you started a completely new Claude session. The path must point to the binary in the same Python environment where you ran pip install. Test the path directly in a terminal: running the binary should hang waiting for input (that is correct behavior for stdio MCP servers).
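Instead of watching the binary hang, one MCP initialize request can be sent over stdio to see whether the server answers. The following is a minimal sketch, not an official client: the function name probe_stdio_server is hypothetical, the protocol version string is one published MCP revision, and it assumes the server writes its reply before exiting once stdin is closed.

```python
import json
import subprocess


def probe_stdio_server(command, timeout=10):
    """Send one MCP "initialize" request over stdio and return the parsed reply.

    Returns None if the process prints nothing on stdout.
    """
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "probe", "version": "0.1"},
        },
    }
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    try:
        # stdio transport uses one JSON-RPC message per line.
        out, _ = proc.communicate(json.dumps(request) + "\n", timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        out, _ = proc.communicate()
    lines = [line for line in out.splitlines() if line.strip()]
    return json.loads(lines[0]) if lines else None
```

Calling probe_stdio_server(["/full/path/to/pdf-extraction"]) and getting back a dictionary with a "result" key suggests the binary starts and speaks JSON-RPC; None or an exception points to a path or environment problem.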
OCR fails or returns empty text for scanned PDFs
Install Tesseract OCR on your system: brew install tesseract on macOS, or sudo apt-get install tesseract-ocr on Linux. Verify with: tesseract --version. The pytesseract Python package must also be installed (included in requirements.txt).
ModuleNotFoundError when the server starts
The binary must run in the same Python environment where you installed the package. If you used a venv, the binary inside the venv (venv/bin/pdf-extraction) uses that environment automatically. Alternatively, use the Python module form: claude mcp add pdf-extraction -- /path/to/python -m pdf_extraction (the -- separator keeps -m from being read as an option to claude mcp add).
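A quick way to see which interpreter is in use and whether it can find the package is to run a short check with that interpreter. This sketch uses only the standard library; module_available is a hypothetical helper name, and pdf_extraction is the module name taken from the command above.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None


# Run this with the same Python that the MCP client will launch.
print("interpreter:", sys.executable)
print("pdf_extraction importable:", module_available("pdf_extraction"))
```

If the second line prints False, the package was installed into a different environment than the one this interpreter belongs to.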
Frequently Asked Questions about PDF Extraction
What is PDF Extraction?
PDF Extraction is a Model Context Protocol (MCP) server that extracts contents from a PDF file. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install PDF Extraction?
Follow the installation instructions on the PDF Extraction GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
Which AI clients work with PDF Extraction?
PDF Extraction works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is PDF Extraction free to use?
Yes, PDF Extraction is open source and available under the MIT license. You can use it freely in both personal and commercial projects.
PDF Extraction Alternatives — Similar Search & Data Extraction Servers
Looking for alternatives to PDF Extraction? Here are other popular search & data extraction servers you can use with Claude, Cursor, and VS Code.
TrendRadar
★ 58.0k
A real-time hotspot monitoring and news aggregation assistant that provides AI-powered analysis of trending topics across multiple platforms via the Model Context Protocol. It enables users to track news and receive automated notifications through va
Scrapling
★ 52.7k
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
PDF Math Translate
★ 33.9k
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats. AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero
GPT Researcher
★ 27.2k
An autonomous agent that conducts deep research on any data using any LLM providers
Agent Reach
★ 20.1k
Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu: one CLI, zero API fees.
Xiaohongshu
★ 13.7k
MCP for xiaohongshu.com