Mineru Tianshu
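Tianshu (天枢) - Enterprise-grade all-in-one AI data preprocessing platform | PDF/Office to Markdown | MCP protocol support for AI assistant integration | Vue3+FastAPI full-stack solution | Document parsing | Multimodal information extraction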
What is Mineru Tianshu?
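Mineru Tianshu is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to use Tianshu (天枢), an enterprise-grade, all-in-one AI data preprocessing platform. It covers PDF/Office-to-Markdown conversion, document parsing, and multimodal information extraction, and ships as a Vue3 + FastAPI full-stack solution with MCP support for AI assistant integration.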
This server falls under the Search & Data Extraction and Cloud Services categories on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Tianshu (天枢): enterprise-grade all-in-one AI data preprocessing platform | PDF/Office to Markdown | MCP protocol support for AI assistant integration
Installation
Manual Installation
npx mineru-tianshu
How to Set Up and Use Mineru Tianshu
Mineru Tianshu (天枢) is an enterprise-grade AI data preprocessing platform built on MinerU, PaddleOCR-VL, and FastAPI that converts PDFs, Office documents, images, audio, and video into structured Markdown and JSON suitable for AI ingestion. It supports GPU acceleration, parallel processing of large PDFs (auto-split above 500 pages), multi-language OCR across 109+ languages, and exposes an MCP protocol endpoint so AI assistants like Claude can submit documents for processing and retrieve clean structured output.
Prerequisites
- Docker 20.10+ and Docker Compose 2.0+ (recommended deployment method)
- Node.js 18+ and Python 3.8+ (for local development deployment)
- NVIDIA Container Toolkit and CUDA-compatible GPU (optional, for GPU-accelerated OCR)
- An MCP client such as Claude Desktop that supports SSE transport
Clone the repository
Clone the mineru-tianshu repository to your server or local machine.
git clone https://github.com/magicyuan876/mineru-tianshu.git
cd mineru-tianshu
Deploy using Docker Compose (recommended)
Use the provided Makefile or deployment script for a one-command setup. This starts the frontend (port 80), backend API (port 8000), worker (port 8001), and MCP server (port 8002).
make setup
# Or on Linux/macOS:
./scripts/docker-setup.sh
# Or on Windows:
scripts\docker-setup.bat
Alternatively, deploy locally without Docker
For local development, install backend dependencies and start all services. Use the --enable-mcp flag to activate the MCP endpoint.
cd backend
bash install.sh
python start_all.py --enable-mcp
Start the frontend
In a separate terminal, install frontend dependencies and start the Vue 3 development server.
cd frontend
npm install
npm run dev
Configure Claude Desktop to connect via SSE
Add the MCP server to your Claude Desktop configuration using SSE transport. The MCP server runs on port 8002 by default.
{
  "mcpServers": {
    "mineru-tianshu": {
      "url": "http://localhost:8002/sse",
      "transport": "sse"
    }
  }
}
Submit documents for processing through Claude
With the MCP server connected, ask Claude to convert a document. The platform returns Markdown and JSON output with images uploaded to object storage if RustFS is configured.
Mineru Tianshu Examples
Client configuration
Claude Desktop configuration for Mineru Tianshu using SSE transport. The MCP server must be running at localhost:8002.
{
  "mcpServers": {
    "mineru-tianshu": {
      "url": "http://localhost:8002/sse",
      "transport": "sse"
    }
  }
}
Prompts to try
Example prompts for converting documents and extracting information through the MCP interface.
- "Convert this PDF to Markdown: /path/to/document.pdf"
- "Extract all tables from this Word document and return them as JSON"
- "Process this scanned PDF and identify all figures and their captions"
- "Convert the uploaded PowerPoint file to Markdown preserving the slide structure"
- "Transcribe the audio from this MP4 file and identify different speakers"
Troubleshooting Mineru Tianshu
Docker deployment fails with GPU-related errors
GPU support requires the NVIDIA Container Toolkit. Install it following the NVIDIA documentation, then run 'docker run --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi' to verify. If you do not have a GPU, the platform falls back to CPU processing, which is slower but fully functional.
MCP server at port 8002 is not reachable from Claude Desktop
Confirm the MCP service is running by checking 'docker-compose logs mcp' or 'make logs'. If you are running Claude Desktop on a different machine, replace 'localhost' with the server's IP address. Ensure port 8002 is open in any firewall rules.
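The reachability check described above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `port_open` and `check_services` are hypothetical and not part of Mineru Tianshu, and the port numbers are the defaults listed in the Docker Compose step. It only tests whether a TCP connection can be opened, not whether the service behind the port is healthy.

```python
import socket
from typing import Dict

# Default ports as listed in the Docker Compose deployment step.
DEFAULT_PORTS = {
    "frontend": 80,
    "backend API": 8000,
    "worker": 8001,
    "MCP server": 8002,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> Dict[str, bool]:
    """Map each service name to whether its default port accepts connections."""
    return {name: port_open(host, port) for name, port in DEFAULT_PORTS.items()}
```

Calling `check_services("localhost")` on the server, and then `check_services("<server IP>")` from the machine running Claude Desktop, helps separate "service not running" from "port blocked by a firewall".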
Large PDF processing times out or fails
PDFs over 500 pages are automatically split into parallel sub-tasks. Adjust PDF_SPLIT_THRESHOLD_PAGES and PDF_SPLIT_CHUNK_SIZE in your .env file. For memory issues, adjust WORKER_MEMORY_LIMIT (default 16G) to match your available RAM.
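To reason about how the two split settings interact, the arithmetic can be sketched as below. This is an illustration only: the platform's actual splitting code is not shown here, the function name `plan_pdf_chunks` is hypothetical, and the assumption is that a PDF is split into consecutive ranges of `chunk_size` pages once its page count exceeds the threshold. The default threshold of 500 comes from the text above; no default is assumed for the chunk size.

```python
from typing import List, Tuple


def plan_pdf_chunks(
    total_pages: int,
    chunk_size: int,
    threshold: int = 500,
) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start, end) page ranges for sub-tasks.

    Assumed behaviour: PDFs at or below `threshold` pages are processed
    as one task; larger PDFs are cut into ranges of `chunk_size` pages.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if total_pages <= threshold:
        return [(1, total_pages)]
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]
```

Under these assumptions, a 1,200-page PDF with a chunk size of 500 yields three sub-tasks, (1, 500), (501, 1000), and (1001, 1200). A smaller chunk size produces more, shorter sub-tasks, which may help when a single worker runs out of memory.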
Frequently Asked Questions about Mineru Tianshu
What is Mineru Tianshu?
Mineru Tianshu is a Model Context Protocol (MCP) server for Tianshu (天枢), an enterprise-grade, all-in-one AI data preprocessing platform that converts PDF and Office documents to Markdown and handles document parsing and multimodal information extraction. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Mineru Tianshu?
Follow the installation instructions on the Mineru Tianshu GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
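The last step, adding the server config to a client, can be done by hand or scripted. The sketch below merges the SSE entry from this guide into an existing JSON config file without discarding other servers already defined there. The function name `add_mcp_server` is hypothetical, and the location of the config file is left to the caller because it depends on the client and operating system.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def add_mcp_server(
    config_path: Union[str, Path],
    name: str = "mineru-tianshu",
    url: str = "http://localhost:8002/sse",
) -> Dict[str, Any]:
    """Insert or update an SSE server entry under "mcpServers" and save the file."""
    path = Path(config_path)
    # Treat a missing or empty file as an empty configuration.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config: Dict[str, Any] = json.loads(text) if text.strip() else {}

    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"url": url, "transport": "sse"}

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the existing config file before running something like this, and restart the client afterwards so it picks up the change.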
Which AI clients work with Mineru Tianshu?
Mineru Tianshu works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Mineru Tianshu free to use?
Yes, Mineru Tianshu is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
Mineru Tianshu Alternatives — Similar Search & Data Extraction Servers
Looking for alternatives to Mineru Tianshu? Here are other popular search & data extraction servers you can use with Claude, Cursor, and VS Code.
TrendRadar
★ 58.0k A real-time hotspot monitoring and news aggregation assistant that provides AI-powered analysis of trending topics across multiple platforms via the Model Context Protocol. It enables users to track news and receive automated notifications through va
Scrapling
★ 52.7k 🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
PDF Math Translate
★ 33.9k [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats. AI-based full-text bilingual translation of PDF documents that fully preserves layout; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero
GPT Researcher
★ 27.2k An autonomous agent that conducts deep research on any data using any LLM providers
Agent Reach
★ 20.1k Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu: one CLI, zero API fees.
Xiaohongshu
★ 13.7k MCP for xiaohongshu.com