PDF Extraction

v1.0.0 · Search & Data Extraction · stable

MCP server to extract contents from a PDF file

Tags: mcp-pdf-extraction, mcp, ai-integration
29 stars · 0 downloads (0 weekly)

What is PDF Extraction?

PDF Extraction is a Model Context Protocol (MCP) server that allows AI assistants such as Claude, Cursor, and VS Code to extract contents from local PDF files.


This server falls under the Search & Data Extraction category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

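  • Extracts text from text-based PDFs and, via OCR, from scanned documents
  • Reads a whole document or selected pages through a single tool, extract-pdf-contents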

Use Cases

Extract content from PDF files for processing by Claude.
Automate document parsing in AI workflows.
Maintainer: xraywu

License: MIT
Language: Python
Version: v1.0.0
Updated: May 19, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI · Windows, macOS, Linux

Installation

Manual Installation

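The server is a Python package, so it is installed from source with git and pip rather than with npx; follow the step-by-step setup below.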

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use PDF Extraction

The PDF Extraction MCP Server gives Claude and other AI clients the ability to read content from local PDF files, including both text-based PDFs and scanned documents via OCR. It exposes a single tool, extract-pdf-contents, which accepts a local file path and an optional page selector, so you can pull specific pages or entire documents into an AI conversation for summarization, analysis, or data extraction. The server is a Python package built with the MCP SDK; it depends on PyMuPDF and PyPDF2 for reading PDFs and on pytesseract, a Python wrapper for the Tesseract OCR engine, for OCR.
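The page selector is only described as optional here; the example prompts further down use 1-based page numbers and -1 for the last page. The sketch below shows one way such a selector could be resolved to zero-based page indexes, assuming a comma-separated string where negative numbers count from the end. The function name resolve_pages is hypothetical and is not part of the server's API.

```python
from typing import List, Optional


def resolve_pages(selector: Optional[str], page_count: int) -> List[int]:
    """Map a selector like "1,3,-1" to zero-based page indexes.

    Assumptions (not confirmed by the server's docs): page numbers are
    1-based, negative numbers count from the end (-1 is the last page),
    and an empty or missing selector means every page.
    """
    if not selector or not selector.strip():
        return list(range(page_count))

    indexes: List[int] = []
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        number = int(part)
        if number == 0:
            raise ValueError("page numbers are 1-based; 0 is not a valid page")
        # Positive numbers are 1-based; negative numbers index from the end.
        index = number - 1 if number > 0 else page_count + number
        if not 0 <= index < page_count:
            raise ValueError(f"page {number} is out of range for a {page_count}-page PDF")
        indexes.append(index)
    return indexes


print(resolve_pages("1,3,-1", page_count=10))  # [0, 2, 9]
```

Raising on an out-of-range page, rather than silently skipping it, makes a bad selector visible to the model instead of quietly returning partial content.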

Prerequisites

  • Python 3.11 or higher
  • pip package manager
  • Tesseract OCR installed on the system for scanned PDF support (brew install tesseract on macOS)
  • Claude Desktop or another MCP client that supports stdio transport
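Before installing, you can check the Python and Tesseract prerequisites from a script. A small sketch using only the standard library: it checks the interpreter version and whether a tesseract executable is on PATH. It does not check pip or the MCP client, and the function names are illustrative.

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def python_ok(minimum: Tuple[int, int] = (3, 11)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if unavailable."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    try:
        result = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Depending on the Tesseract version, this is printed to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print("Python 3.11+:", python_ok())
    print("Tesseract:", tesseract_version() or "not found (only needed for scanned PDFs)")
```

Run it with the same python you will use to create the virtual environment, since that interpreter's version is the one that matters.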

Step 1: Clone the repository

Clone the mcp-pdf-extraction-server repository to your machine.

git clone https://github.com/xraywu/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server

Step 2: Create a virtual environment and install

Set up an isolated Python environment and install the package in editable mode.

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Step 3: Find the installed command path

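Locate the pdf-extraction executable that pip installed, with the virtual environment from step 2 still active. You will need its full absolute path for the MCP client configuration. On Windows, use where pdf-extraction instead of which.

which pdf-extraction
# Example output (miniconda base environment): /opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction
# With the venv from step 2, the path ends in venv/bin/pdf-extraction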
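Because which is not available in the Windows command prompt, a cross-platform alternative is to resolve the path from Python itself. A sketch, assuming you run it with the virtual environment active so that sys.prefix points at the venv; the function name find_server_command is hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_server_command(name: str = "pdf-extraction") -> Optional[str]:
    """Return an absolute path to `name`, preferring the active Python environment."""
    # pip puts console scripts in <prefix>/bin on macOS/Linux and <prefix>\Scripts on Windows.
    scripts_dir = Path(sys.prefix) / ("Scripts" if os.name == "nt" else "bin")
    # First look only inside the active environment, then fall back to PATH.
    found = shutil.which(name, path=str(scripts_dir)) or shutil.which(name)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    print(find_server_command() or "pdf-extraction not found; is the virtual environment active?")
```

Checking the environment's own scripts directory first avoids picking up a different pdf-extraction that happens to be earlier on PATH, which is the situation behind the ModuleNotFoundError covered under troubleshooting.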

Step 4: Add to Claude Code CLI (optional)

If using Claude Code CLI, register the server with the full path found above.

claude mcp add pdf-extraction /full/path/to/pdf-extraction
claude mcp list

Step 5: Configure Claude Desktop

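Add the pdf-extraction server to claude_desktop_config.json, using the full absolute path to the installed command (see the client configuration example below). On macOS the file is at ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows it is at %APPDATA%\Claude\claude_desktop_config.json.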
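If claude_desktop_config.json already lists other servers, add the new entry instead of overwriting the file. A minimal sketch of that merge, producing the same {"command": ...} entry shape as the configuration example in this guide; the function names are hypothetical, and the config-file path and command path are placeholders you supply.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_mcp_server(config: Dict[str, Any], name: str, command: str) -> Dict[str, Any]:
    """Return a copy of `config` with an stdio server entry added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep any existing servers
    servers[name] = {"command": command}
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: Path, name: str, command: str) -> None:
    """Read the JSON config (if it exists), merge the entry, and write it back."""
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    merged = add_mcp_server(config, name, command)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


# Placeholder command path: use the output of `which pdf-extraction` on your machine.
example = add_mcp_server({"mcpServers": {"other": {"command": "other-server"}}},
                         "pdf-extraction", "/full/path/to/pdf-extraction")
print(json.dumps(example, indent=2))
```

add_mcp_server returns a new dict and leaves its input untouched, so you can print and inspect the result before calling update_config_file on the real file.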

Step 6: Restart Claude Desktop and verify

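Fully quit and restart Claude Desktop, then open a new chat. Under Settings > Developer, the pdf-extraction server should be listed as running. If you registered the server with Claude Code in step 4, type /mcp in a Claude Code session to confirm it shows as connected.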

PDF Extraction Examples

Client configuration

Claude Desktop configuration for the PDF Extraction server. Replace the path with the actual output of 'which pdf-extraction' on your system.

{
  "mcpServers": {
    "pdf-extraction": {
      "command": "/opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction"
    }
  }
}

Prompts to try

Example prompts for extracting and working with PDF content through Claude.

- "Extract the content from the PDF at /Users/me/documents/report.pdf"
- "Read pages 1-3 from /home/user/contracts/agreement.pdf and summarize the key terms"
- "Extract the last page of /tmp/invoice.pdf using page selector -1"
- "Extract all text from /data/scanned-document.pdf (it's a scanned image PDF)"
- "Read /Users/me/research/paper.pdf and list the references section"

Troubleshooting PDF Extraction

Server not connecting after being added to Claude Desktop

Make sure you started a completely new Claude session. The path must point to the binary in the same Python environment where you ran pip install. Test the path directly in a terminal: running the binary should hang waiting for input (that is correct behavior for stdio MCP servers).
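To go one step beyond watching the binary wait for input, you can send it an MCP initialize request and check that it replies. A rough sketch, assuming the server uses the MCP stdio transport (newline-delimited JSON-RPC on stdin/stdout); the protocol version string and client name are illustrative values, and the function names are hypothetical.

```python
import json
import queue
import subprocess
import threading
from typing import Any, Dict, List


def initialize_request(request_id: int = 1) -> str:
    """Build one newline-terminated JSON-RPC `initialize` message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # one published MCP revision; servers may negotiate another
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.1"},  # illustrative values
        },
    }
    return json.dumps(message) + "\n"  # stdio transport: one JSON message per line


def smoke_test(argv: List[str], timeout: float = 15.0) -> Dict[str, Any]:
    """Start the server, send `initialize`, and return its first reply as a dict."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, text=True)
    replies: "queue.Queue[str]" = queue.Queue()
    try:
        proc.stdin.write(initialize_request())
        proc.stdin.flush()
        # Read on a helper thread so a server that never answers cannot hang us.
        threading.Thread(target=lambda: replies.put(proc.stdout.readline()), daemon=True).start()
        line = replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply within {timeout} seconds") from None
    finally:
        proc.kill()
        proc.wait()
    if not line:
        raise RuntimeError("server exited without replying; run it manually to see its errors")
    return json.loads(line)


# Placeholder path; use the output of `which pdf-extraction`:
# print(smoke_test(["/full/path/to/pdf-extraction"]).get("result", {}).get("serverInfo"))
```

A reply containing a result object means the binary starts and speaks MCP, which narrows a connection problem down to the client configuration.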

OCR fails or returns empty text for scanned PDFs

Install Tesseract OCR on your system: brew install tesseract on macOS, or sudo apt-get install tesseract-ocr on Linux. Verify with: tesseract --version. The pytesseract Python package must also be installed (included in requirements.txt).
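Scanned pages usually have no embedded text layer, which is why they need OCR at all. A common pattern, shown here as a hypothetical sketch rather than this server's actual logic, is to use the text layer first and fall back to OCR only for pages whose extracted text is empty or nearly so. The 20-character threshold is an arbitrary assumption, and ocr_page stands in for code that would render a page and run Tesseract on it.

```python
from typing import Callable, List


def extract_with_fallback(
    page_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,
) -> List[str]:
    """Use each page's text layer, calling `ocr_page(i)` when it looks empty.

    `page_texts` would come from a PDF library's text extraction and
    `ocr_page` would rasterize page i and OCR it; both are stand-ins here
    so the control flow can be shown without those libraries.
    """
    results = []
    for i, text in enumerate(page_texts):
        if len(text.strip()) >= min_chars:
            results.append(text)  # real text layer: no OCR needed
        else:
            results.append(ocr_page(i))  # likely a scanned image: OCR it
    return results


pages = ["A text-based page with plenty of extractable words.", "   "]
print(extract_with_fallback(pages, ocr_page=lambda i: f"<OCR text for page {i + 1}>"))
```

In a flow like this, a missing tesseract binary only shows up on the pages that fall through to OCR, which matches the symptom of text PDFs working while scanned ones come back empty.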

ModuleNotFoundError when the server starts

The binary must run in the same Python environment where you installed the package. If you used a venv, the binary inside it (venv/bin/pdf-extraction, or venv\Scripts\pdf-extraction.exe on Windows) uses that environment automatically. Alternatively, use the Python module form, with -- separating Claude Code's own options from the server command: claude mcp add pdf-extraction -- /path/to/python -m pdf_extraction.
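To confirm which interpreter can actually import the package, ask that interpreter directly. A sketch that runs a given Python executable and checks importlib.util.find_spec for a module name; pass the interpreter your MCP configuration launches, and use pdf_extraction, the module name from the module-form command. The function name can_import is hypothetical.

```python
import subprocess


def can_import(python: str, module: str) -> bool:
    """Return True if the interpreter at `python` can locate `module`."""
    # The probe exits 0 when find_spec finds the module and non-zero otherwise
    # (including when find_spec itself raises for a missing parent package).
    probe = (
        "import importlib.util, sys; "
        "sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)"
    )
    try:
        result = subprocess.run([python, "-c", probe, module], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False  # interpreter path is wrong, not executable, or hung
    return result.returncode == 0


# Placeholder path: the interpreter your MCP config launches, e.g. venv/bin/python.
# print(can_import("/path/to/python", "pdf_extraction"))
```

If this returns False for the interpreter in your config but True for the one inside your venv, the config is pointing at the wrong environment.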

Frequently Asked Questions about PDF Extraction

What is PDF Extraction?

PDF Extraction is a Model Context Protocol (MCP) server that extracts contents from PDF files. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install PDF Extraction?

Follow the installation instructions on the PDF Extraction GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with PDF Extraction?

PDF Extraction should work with any MCP client that can launch a local stdio server, including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline. The setup steps in this guide cover Claude Desktop and Claude Code.

Is PDF Extraction free to use?

Yes, PDF Extraction is open source and available under the MIT license. You can use it freely in both personal and commercial projects.


