What category is Web Crawl MCP Server?

Web Crawl is categorized under Search & Data Extraction. Browse more servers in this category on MCPgee.

Web Crawl

Name: mcp-server-webcrawl
Author: pragmar

v1.0.0 • Search & Data Extraction • stable

Bridge the gap between your web crawl and AI language models. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content. Supports WARC, wget, InterroBot, and other crawler formats.

Tags: archivebox, httrack, interrobot, katana, knowledgebase


What is Web Crawl?

Web Crawl is a Model Context Protocol (MCP) server that lets AI assistants such as Claude, Cursor, and VS Code work with your web crawl data. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content.

This server falls under the Search & Data Extraction category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

Bridge the gap between your web crawl and AI language models

Use Cases

Analyze web content captured with WARC, wget, and Katana.

Filter and extract insights from your web crawls.

Index InterroBot and ArchiveBox crawls for comprehensive search.

Maintainer: pragmar

License: NOASSERTION

Language: Python

Version: v1.0.0

Updated: May 12, 2026

Status: healthy

Maintenance: active

Works with: Claude, OpenAI, Windows, macOS, Linux


Installation

pip (PyPI)

pip install mcp-server-webcrawl

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response time: < 200 ms

Throughput: Medium

Resource Usage

Memory usage: Low

CPU usage: Low

How to Set Up and Use Web Crawl

mcp-server-webcrawl bridges offline web crawl archives with AI language models, letting your AI client search, filter, and extract insights from crawls you have already collected. It supports multiple crawler formats including WARC, wget mirrors, ArchiveBox exports, HTTrack sites, Katana results, and InterroBot databases. Using a Boolean field-query syntax, the server lets Claude find specific pages by URL pattern, content keywords, HTTP status, MIME type, and more, all without re-fetching anything from the live web.

Prerequisites

Python 3.10 or higher
pip to install the package
Claude Desktop or another MCP-compatible client
An existing web crawl in a supported format: WARC, wget, ArchiveBox, HTTrack, Katana, SiteOne, or InterroBot

Install mcp-server-webcrawl via pip

Install the package from PyPI. This provides both the MCP server and an interactive command-line mode for testing.

pip install mcp-server-webcrawl
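To confirm the package landed in the Python environment you expect (a common source of "command not found" later), the standard-library importlib.metadata module can report the installed version. This is a small sketch; the distribution name comes from the pip command above, and the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mcp-server-webcrawl") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("mcp-server-webcrawl is not installed in this Python environment")
    else:
        print(f"mcp-server-webcrawl {found} is installed")
```

Run it with the same python that pip used; if it reports the package as missing, pip most likely installed into a different interpreter or virtual environment.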

Test in interactive mode

Before connecting to Claude, verify the server can read your crawl by running it interactively. Point it at the root directory of your crawl archive.

mcp-server-webcrawl --interactive --datasource /path/to/your/crawl

Configure Claude Desktop

Add the server to your Claude Desktop MCP config. The --datasource argument should point to the root of your crawl directory or WARC file.

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--datasource", "/path/to/your/crawl"]
    }
  }
}
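Editing the JSON by hand is easy to get wrong when the file already lists other servers. The following is a minimal Python sketch that merges the entry shown above into an existing config file while leaving other servers untouched. The helper name add_webcrawl_server is hypothetical, and the --datasource flag is copied from this page's example rather than checked against the upstream README, which may expect different or additional arguments.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_webcrawl_server(config_path: Path, datasource: str) -> Dict[str, Any]:
    """Insert or replace a "webcrawl" entry under "mcpServers", keeping other servers."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    # Assumption: flag name mirrors this page's example; verify against the project README.
    servers["webcrawl"] = {
        "command": "mcp-server-webcrawl",
        "args": ["--datasource", datasource],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    import sys

    # Usage: python add_webcrawl.py /path/to/claude_desktop_config.json /path/to/your/crawl
    result = add_webcrawl_server(Path(sys.argv[1]).expanduser(), sys.argv[2])
    print(json.dumps(result["mcpServers"]["webcrawl"], indent=2))
```

Back up the config file before running this. On macOS, Claude Desktop keeps it at ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows, it lives under %APPDATA%\Claude\.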

Restart Claude Desktop

Quit and reopen Claude Desktop to load the webcrawl server. Claude will now be able to search and analyze your local crawl data.

Run a search query

Ask Claude to search your crawl. The server supports Boolean queries with field prefixes for fine-grained filtering.
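The server defines its own query grammar, so the following is only a toy illustration of how Boolean operators and field prefixes narrow a result set. It is not the server's parser. It evaluates queries like the login-form prompt in the list of prompts to try against a few in-memory records. The record fields (url, status, content), the function names, and the matching rules (case-insensitive substring for text fields, exact integer match for status, bare words matched against content) are all assumptions.

```python
import re
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Predicate = Callable[[Record], bool]

# Tokens: parentheses, field:value terms (space after colon allowed), quoted phrases, bare words.
TOKEN_RE = re.compile(r'\(|\)|\w+:\s*[^\s()]+|"[^"]*"|[^\s()]+')
OPERATORS = {"AND", "OR", "NOT", ")"}


def _and(a: Predicate, b: Predicate) -> Predicate:
    return lambda r: a(r) and b(r)


def _or(a: Predicate, b: Predicate) -> Predicate:
    return lambda r: a(r) or b(r)


def _not(a: Predicate) -> Predicate:
    return lambda r: not a(r)


def atom(token: str) -> Predicate:
    """One search term: field:value, a quoted phrase, or a bare word."""
    if ":" in token and not token.startswith('"'):
        field, value = (part.strip() for part in token.split(":", 1))
        field = field.lower()
        if field == "status":
            target = int(value)
            return lambda r: r.get("status") == target
        needle = value.lower()
        return lambda r: needle in str(r.get(field, "")).lower()
    # Bare words and quoted phrases are matched against page content.
    phrase = token.strip('"').lower()
    return lambda r: phrase in str(r.get("content", "")).lower()


class Parser:
    """Recursive descent: OR binds loosest, then AND, then NOT."""

    def __init__(self, query: str) -> None:
        self.tokens: List[str] = TOKEN_RE.findall(query)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self) -> Predicate:
        pred = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return pred

    def expr(self) -> Predicate:
        pred = self.term()
        while self.peek() == "OR":
            self.take()
            pred = _or(pred, self.term())
        return pred

    def term(self) -> Predicate:
        pred = self.factor()
        while self.peek() == "AND":
            self.take()
            pred = _and(pred, self.factor())
        return pred

    def factor(self) -> Predicate:
        token = self.take()
        if token == "NOT":
            return _not(self.factor())
        if token == "(":
            inner = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return inner
        if token in OPERATORS:
            raise ValueError(f"unexpected token {token!r}")
        return atom(token)


def search(records: List[Record], query: str) -> List[Record]:
    pred = Parser(query).parse()
    return [r for r in records if pred(r)]


if __name__ == "__main__":
    pages: List[Record] = [
        {"url": "example.com/login", "status": 200, "content": "<form>Login to your account</form>"},
        {"url": "example.com/old", "status": 404, "content": "Not found"},
        {"url": "example.com/privacy", "status": 200, "content": "Our privacy policy"},
    ]
    print([p["url"] for p in search(pages, "form AND (login OR signin) AND status: 200")])
```

Running it prints ['example.com/login']. Only the first record contains "form" and "login" and has status 200. The real server applies its own field names and matching semantics, which the project documentation describes.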

Web Crawl Examples

Client configuration

Claude Desktop config pointing mcp-server-webcrawl at a local WARC or wget crawl directory.

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--datasource", "/path/to/your/crawl"]
    }
  }
}

Prompts to try

Example prompts that use the server's Boolean search syntax and field-specific queries.

- "Search my crawl for all HTML pages that mention 'privacy policy' and return 200 status"
- "Find all PDF files in the crawl larger than 1MB"
- "Search for pages containing login forms: query 'form AND (login OR signin) AND status: 200'"
- "Run an SEO audit prompt on my crawl and identify pages missing title tags"
- "Find all broken links in the crawl where status is 404 or 500"

Troubleshooting Web Crawl

Server starts but returns no results for any query

Confirm the --datasource path points to the correct crawl root directory. Run 'mcp-server-webcrawl --interactive --datasource /your/path' in a terminal to test queries directly before using it with Claude.
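As a quick pre-flight check before pointing the server at a directory, you can confirm the path exists and see which file types it holds. The following standard-library sketch counts files by extension. It does not parse any crawl format. The extensions you should expect (for example .warc or .warc.gz for WARC archives, .html for a wget mirror) are general conventions, not the server's detection rules, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def extension_of(path: Path) -> str:
    """Lowercased extension, keeping double extensions such as .warc.gz together."""
    suffixes = [s.lower() for s in path.suffixes]
    if len(suffixes) >= 2 and suffixes[-1] == ".gz":
        return "".join(suffixes[-2:])
    return suffixes[-1] if suffixes else "<none>"


def summarize_datasource(root: str) -> Counter:
    """Count files under a crawl root (or a single archive file) by extension."""
    path = Path(root).expanduser()
    if not path.exists():
        raise FileNotFoundError(f"datasource path does not exist: {path}")
    if path.is_file():
        return Counter({extension_of(path): 1})
    return Counter(extension_of(f) for f in path.rglob("*") if f.is_file())


if __name__ == "__main__":
    import sys

    counts = summarize_datasource(sys.argv[1])
    if not counts:
        print("directory is empty; is this the crawl root?")
    for ext, n in counts.most_common(10):
        print(f"{ext:12} {n}")
```

If the counts show nothing resembling crawl output, the path probably points one level too high or too low in the archive.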

Unsupported crawler format error

The server supports WARC, wget, ArchiveBox, HTTrack, Katana, SiteOne, and InterroBot formats. Ensure your crawl was produced by a supported tool and that the directory structure is intact. Check the project's GitHub for format-specific setup guides.

mcp-server-webcrawl command not found after pip install

The pip install location may not be on your PATH. Try 'python -m mcp_server_webcrawl --datasource /path/to/crawl', or install with 'pip install --user' and add your user scripts directory (for example ~/.local/bin on Linux) to your PATH.

Frequently Asked Questions about Web Crawl

What is Web Crawl?

Web Crawl is a Model Context Protocol (MCP) server that bridges the gap between your web crawl and AI language models. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content. It supports WARC, wget, InterroBot, and other crawler formats, and it connects AI assistants to external tools and data sources through a standardized interface.

How do I install Web Crawl?

Install with pip: pip install mcp-server-webcrawl. Then add the server configuration to your AI client's JSON config file (e.g., claude_desktop_config.json or .cursor/mcp.json).

Which AI clients work with Web Crawl?

Web Crawl works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Web Crawl free to use?

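Web Crawl is open source, but this directory lists its license as NOASSERTION, which means no standard license was automatically detected. Check the license file in the project's GitHub repository before using it in personal or commercial projects.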


Web Crawl Alternatives — Similar Search & Data Extraction Servers

Looking for alternatives to Web Crawl? Here are other popular search & data extraction servers you can use with Claude, Cursor, and VS Code.

TrendRadar

★ 58.0k

A real-time hotspot monitoring and news aggregation assistant that provides AI-powered analysis of trending topics across multiple platforms via the Model Context Protocol. It enables users to track news and receive automated notifications through va

Scrapling

★ 52.7k

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

PDF Math Translate

★ 33.9k

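[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting Google, DeepL, Ollama, OpenAI, and other services, and offering CLI, GUI, MCP, Docker, and Zotero interfaces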

GPT Researcher

★ 27.2k

An autonomous agent that conducts deep research on any data using any LLM providers

Agent Reach

★ 20.1k

Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees.

Xiaohongshu

★ 13.7k

MCP for xiaohongshu.com


Set Up Web Crawl in Your Editor

Choose your AI client for step-by-step setup instructions.

Claude Desktop: macOS & Windows app

Claude Code: CLI & terminal

Cursor: AI-first code editor

VS Code: GitHub Copilot MCP

Windsurf: Codeium AI editor

Cline: VS Code extension

Quick Config Preview

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--datasource", "/path/to/your/crawl"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json.

