May 20, 2026
14 min read

Puppeteer MCP Server: Complete Web Scraping Guide

Learn how to use the Puppeteer MCP server for web scraping, screenshots, PDF generation, and browser automation with Claude Desktop and Cursor.

MCPgee Team

MCP Expert

Tags: Puppeteer, Web Scraping, Browser Automation, Claude Desktop, Cursor

What Is the Puppeteer MCP Server?

The Puppeteer MCP server gives AI assistants the ability to control a real web browser. Instead of just fetching HTML like a simple HTTP client, Puppeteer launches an actual Chromium browser instance that can render JavaScript, interact with dynamic content, fill out forms, take screenshots, and generate PDFs. Through MCP (Model Context Protocol), your AI assistant - whether Claude Desktop or Cursor - can drive this browser using natural language commands.

This is fundamentally different from search-based tools like Brave Search or Exa Search. Those tools return search results or pre-processed text. Puppeteer gives the AI full browser control: it can navigate to any URL, wait for content to load, click buttons, scroll pages, extract structured data, and capture visual snapshots. If you have ever used Puppeteer in a Node.js script, imagine replacing all that code with a single sentence to Claude.

Common use cases include scraping product data from e-commerce sites, monitoring web pages for changes, automating form submissions, generating PDF reports from web dashboards, and taking screenshots for documentation or testing. This guide covers all of them with real prompts you can copy and use immediately.

Setting Up Puppeteer MCP Server

Prerequisites

  • Node.js 18+ - verify with node --version
  • Claude Desktop or Cursor - latest version
  • Chromium - Puppeteer downloads its own Chromium binary, so you typically do not need to install Chrome separately

Claude Desktop Setup

Open your claude_desktop_config.json file (see paths below) and add the Puppeteer server:

| OS | Config path |
|---|---|
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| Linux | ~/.config/Claude/claude_desktop_config.json |

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}

Save the file and restart Claude Desktop completely (quit and reopen, not just close the window).
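If your claude_desktop_config.json already lists other servers, the puppeteer entry belongs inside the existing mcpServers object, not in a second top-level block. The TypeScript sketch below shows that merge. The function name addPuppeteerServer is hypothetical. Reading and writing the file are left out so the merge logic stays self-contained.

```typescript
// The parts of claude_desktop_config.json this sketch touches.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface DesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [otherKey: string]: unknown;
}

// Hypothetical helper: takes the current file contents (or "" for a new file)
// and returns updated JSON with a "puppeteer" server added. Existing servers
// and unrelated top-level keys are preserved.
function addPuppeteerServer(existingJson: string): string {
  const config: DesktopConfig =
    existingJson.trim() === "" ? {} : (JSON.parse(existingJson) as DesktopConfig);

  config.mcpServers = {
    ...(config.mcpServers ?? {}),
    puppeteer: {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-puppeteer"],
    },
  };

  // Two-space indentation matches the example config above.
  return JSON.stringify(config, null, 2);
}
```

Back up the original file before you overwrite it with the result. Restart Claude Desktop afterwards, as described above.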

Cursor Setup

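In Cursor, open Settings > MCP Servers and add a new server with the command npx -y @modelcontextprotocol/server-puppeteer. Cursor also reads the mcpServers JSON shown above from ~/.cursor/mcp.json (all projects) or .cursor/mcp.json (a single project). Once the server is configured, Cursor launches it for you.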

Claude Code CLI Setup

claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer

Use Case 1: Scrape a Product Page

One of the most common uses for Puppeteer MCP is extracting structured data from web pages. Here is how to scrape product information from an e-commerce site:

# Prompt to Claude:
"Navigate to https://example-store.com/products/widget-pro and extract
the product name, price, description, and all customer review ratings.
Return the data as a JSON object."

Claude will use Puppeteer to navigate to the page, wait for the content to render (including JavaScript-loaded reviews), and extract the data into a clean JSON structure. This works even on sites that load content dynamically with React, Vue, or Angular - unlike simple HTTP scraping tools that only see the initial HTML.
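Scraped values come back as text, so validate the JSON before you rely on it. The TypeScript sketch below assumes Claude returns an object with name, price, description, and ratings fields; these field names are assumptions, because the prompt does not fix them. It also assumes prices use "," for thousands and "." for decimals.

```typescript
// Assumed shape of the JSON the product-page prompt asks for.
interface ScrapedProduct {
  name: string;
  price: number;
  description: string;
  ratings: number[];
}

// Converts display text such as "$1,299.99" or "USD 49" to a number.
// Assumes "," is a thousands separator and "." the decimal point.
function parsePrice(text: string): number {
  const cleaned = text.replace(/[^0-9.,-]/g, "").replace(/,/g, "");
  const value = Number.parseFloat(cleaned);
  if (Number.isNaN(value)) {
    throw new Error(`Could not parse a price from "${text}"`);
  }
  return value;
}

// Parses Claude's JSON reply, checks for a product name, and normalizes
// price and ratings to numbers.
function parseProduct(jsonText: string): ScrapedProduct {
  const raw = JSON.parse(jsonText) as Record<string, unknown>;
  const name = raw.name;
  if (typeof name !== "string" || name.trim() === "") {
    throw new Error("Missing product name");
  }
  const rawPrice = raw.price;
  const price =
    typeof rawPrice === "number" ? rawPrice : parsePrice(String(rawPrice ?? ""));
  const rawRatings = raw.ratings;
  const ratings = Array.isArray(rawRatings)
    ? rawRatings.map(Number).filter((r) => !Number.isNaN(r))
    : [];
  const description = raw.description;
  return {
    name: name.trim(),
    price,
    description: typeof description === "string" ? description : "",
    ratings,
  };
}
```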

For more complex scraping tasks, you can chain multiple pages:

# Multi-page scraping prompt:
"Go to https://example-store.com/category/electronics, extract all
product links from the first page, then visit each product page and
collect the name, price, and rating. Save everything as a CSV table."
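You may prefer to have Claude return JSON and build the CSV yourself. If so, the quoting rules are the part most often gotten wrong. Under the RFC 4180 convention, fields that contain commas, quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled. The TypeScript sketch below applies those rules. The column names are assumptions that match the prompt above.

```typescript
interface ProductRow {
  name: string;
  price: number;
  rating: number | null;
}

// Quotes a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes (RFC 4180 style).
function csvField(value: string | number | null): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows: ProductRow[]): string {
  const header = ["name", "price", "rating"].join(",");
  const lines = rows.map((row) =>
    [row.name, row.price, row.rating].map(csvField).join(","),
  );
  // RFC 4180 separates records with CRLF.
  return [header, ...lines].join("\r\n");
}
```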

Scraping with Specific HTML Selectors

When you need precision, tell Claude exactly which selectors to target. This is especially useful for sites with complex layouts where the AI might grab the wrong element:

# Target specific CSS selectors:
"Navigate to https://example-store.com/products/widget-pro.
Extract the product title from the h1.product-title element,
the price from span.price-current, the original price from
span.price-original, and all review text from div.review-body p elements.
Return as JSON."

# Extract data from a table:
"Go to https://example.com/specs/laptop-pro and extract the entire
specifications table. The table has class 'spec-table' with th elements
for labels and td elements for values. Return as a JSON object with
each spec name as a key."

# Handle nested elements:
"Navigate to https://news.example.com. Extract each article from the
div.article-card elements. For each card, get the title from h2 > a,
the date from span.published-date, the author from span.author-name,
and the summary from p.article-excerpt. Return an array of objects."
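The spec-table prompt asks Claude to turn th/td pairs into a keyed object. If you extract the raw label/value pairs yourself, the conversion is short. The TypeScript sketch below assumes each row arrives as a [label, value] string pair. It trims and collapses whitespace and skips rows without a label. The function name is hypothetical.

```typescript
// One table row: the th text (label) and the td text (value).
type SpecRow = [label: string, value: string];

// Collapses runs of whitespace (including newlines from the HTML) to one space.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Builds { "Processor": "...", "RAM": "..." } from label/value pairs.
// If a label repeats, the later value wins, like plain object assignment.
function specRowsToObject(rows: SpecRow[]): Record<string, string> {
  const specs: Record<string, string> = {};
  for (const [label, value] of rows) {
    const key = normalizeText(label);
    if (key === "") continue; // skip spacer rows without a label
    specs[key] = normalizeText(value);
  }
  return specs;
}
```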

You can also target elements by their data attributes, which is often more reliable than class names that may change between deployments:

# Data attribute selectors:
"Go to https://shop.example.com/category/shoes. Find all elements with
data-product-id attributes. For each product, extract the data-product-id
value, the text content of [data-field='name'], and the text content of
[data-field='price']. Return the results as a JSON array."

Handling Infinite Scroll and Lazy Loading

Many modern sites load content as you scroll. Tell Claude to handle this explicitly:

# Infinite scroll scraping:
"Navigate to https://feed.example.com/trending. Scroll down 5 times,
waiting 2 seconds between each scroll for new content to load. Then
extract all post titles and their like counts from the loaded content."

# Lazy-loaded images:
"Go to https://gallery.example.com/portfolio. Scroll through the
entire page so all lazy-loaded images render. Then extract all image
URLs from img.portfolio-item elements."
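Under the hood, "scroll, wait, repeat" is a loop. It stops after a fixed number of scrolls, or earlier once the page stops growing. The TypeScript sketch below shows that control flow with the browser operations passed in as functions. scrollOnce, getHeight, and sleep are hypothetical hooks; in a real Puppeteer script they might wrap page.evaluate calls that scroll the window and read document.body.scrollHeight. Passing them in lets the sketch run without a browser. The defaults of 5 scrolls and 2,000 ms mirror the prompt above, and the early-stop rule is an added assumption.

```typescript
interface ScrollHooks {
  scrollOnce: () => Promise<void>; // e.g. scroll to the bottom of the page
  getHeight: () => Promise<number>; // current scrollable height in pixels
  sleep: (ms: number) => Promise<void>; // pause so lazy content can load
}

interface ScrollResult {
  scrolls: number; // how many scrolls were performed
  finalHeight: number;
  stoppedEarly: boolean; // true if the page stopped growing before maxScrolls
}

// Scrolls up to maxScrolls times, waiting waitMs after each scroll, and stops
// early once the height has stayed the same for stableRounds scrolls in a row.
async function scrollUntilStable(
  hooks: ScrollHooks,
  maxScrolls = 5,
  waitMs = 2000,
  stableRounds = 2,
): Promise<ScrollResult> {
  let lastHeight = await hooks.getHeight();
  let unchanged = 0;
  let scrolls = 0;

  while (scrolls < maxScrolls) {
    await hooks.scrollOnce();
    scrolls++;
    await hooks.sleep(waitMs);

    const height = await hooks.getHeight();
    unchanged = height === lastHeight ? unchanged + 1 : 0;
    lastHeight = height;

    if (unchanged >= stableRounds) {
      return { scrolls, finalHeight: height, stoppedEarly: true };
    }
  }
  return { scrolls, finalHeight: lastHeight, stoppedEarly: false };
}
```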

Use Case 2: Take a Screenshot

Puppeteer can capture full-page or element-specific screenshots, which is invaluable for documentation, testing, and monitoring:

# Full page screenshot:
"Take a screenshot of https://example.com and show it to me"

# Specific viewport size:
"Take a screenshot of https://example.com at 1920x1080 resolution"

# Mobile viewport:
"Take a screenshot of https://example.com as it would appear on
an iPhone 14 (390x844)"

Claude returns the screenshot directly in the conversation. This is extremely useful for quickly checking how a website looks without opening a browser, verifying responsive designs, or documenting the current state of a page before making changes.

Screenshot Types Compared

Puppeteer supports three distinct screenshot modes, and each serves a different purpose:

| Screenshot type | What it captures | Best for | Typical file size |
|---|---|---|---|
| Viewport | Only the visible area of the browser window | Above-the-fold checks, hero section verification | Small (200-500 KB) |
| Full page | The entire scrollable page from top to bottom | Full-page documentation, design review | Large (1-10 MB) |
| Element | A single DOM element, cropped to its bounding box | Component testing, capturing a specific chart or widget | Tiny (10-200 KB) |

# Element screenshot - capture just the navigation bar:
"Navigate to https://example.com and take a screenshot of only the
nav.main-header element"

# Full page screenshot for documentation:
"Take a full-page screenshot of https://docs.example.com/api-reference,
capturing the entire scrollable content"

# Viewport screenshot at a specific breakpoint:
"Take a viewport-only screenshot of https://example.com at 768x1024
to check the tablet layout"
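In Puppeteer's own API, the three modes correspond to page.screenshot() (viewport), page.screenshot({ fullPage: true }) (full page), and calling screenshot() on an element handle (element), each after page.setViewport() sets the size. The browser-free TypeScript sketch below turns a mode and a "WIDTHxHEIGHT" string from the prompts above into that kind of plan. planScreenshot and the plan shape are hypothetical, and the options the MCP server exposes may differ by version.

```typescript
type ScreenshotMode = "viewport" | "fullPage" | "element";

interface ScreenshotPlan {
  viewport: { width: number; height: number };
  target: "page" | "element";
  selector?: string; // only used for element screenshots
  options: { fullPage?: boolean }; // subset of Puppeteer's screenshot options
}

// Parses sizes written as "1920x1080" or "768 x 1024".
function parseViewport(size: string): { width: number; height: number } {
  const match = /^(\d+)\s*x\s*(\d+)$/i.exec(size.trim());
  if (!match) throw new Error(`Expected WIDTHxHEIGHT, got "${size}"`);
  return { width: Number(match[1]), height: Number(match[2]) };
}

function planScreenshot(mode: ScreenshotMode, size: string, selector?: string): ScreenshotPlan {
  const viewport = parseViewport(size);
  switch (mode) {
    case "viewport":
      return { viewport, target: "page", options: {} };
    case "fullPage":
      return { viewport, target: "page", options: { fullPage: true } };
    case "element":
      if (!selector) throw new Error("Element screenshots need a selector");
      return { viewport, target: "element", selector, options: {} };
  }
}
```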

Use Case 3: Generate a PDF

Puppeteer excels at converting web pages to PDF format with precise control over formatting:

# Basic PDF generation:
"Navigate to https://example.com/report/q1-2026 and generate a PDF
of the page"

# With formatting options:
"Generate a PDF of https://example.com/invoice/12345 in A4 format
with landscape orientation and include background graphics"

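This is particularly useful for generating printable versions of web reports, invoices, and dashboards. The PDF includes CSS styles, charts, and images. However, Puppeteer renders it with the page's print media styles by default, so the result can differ from the on-screen view. When an exact match matters, Puppeteer can emulate screen media with page.emulateMediaType('screen').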

PDF Generation Options

Puppeteer provides extensive control over PDF output. Here are the most useful options you can request through Claude:

| Option | Values | Use case |
|---|---|---|
| Format | A4, Letter, Legal, Tabloid | Standard document sizes for printing |
| Orientation | Portrait, Landscape | Wide tables and dashboards need landscape |
| Background graphics | On / Off | Include CSS backgrounds and colors in the PDF |
| Margins | Custom (e.g., 20mm top, 15mm sides) | Control whitespace around content |
| Header / Footer | Custom HTML templates | Add page numbers, dates, or company logos |
| Page range | e.g., 1-3 or 2 | Export only specific pages of a long document |

# PDF with custom margins and page numbers:
"Generate a PDF of https://example.com/report/annual-2025 in A4 portrait
format with 25mm margins on all sides, include background graphics,
and add page numbers at the bottom center"

# PDF of just the first section:
"Navigate to https://docs.example.com/api and generate a PDF of only
pages 1 through 3 in Letter format"

# Dashboard PDF in landscape:
"Generate a landscape PDF of https://dashboard.example.com/overview
with background graphics enabled so the charts render correctly"
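These requests correspond to fields of Puppeteer's page.pdf() options: format, landscape, printBackground, margin, displayHeaderFooter with headerTemplate/footerTemplate, and pageRanges. The TypeScript sketch below translates the annual-report prompt above into such an options object. PdfRequest and toPdfOptions are hypothetical names, and the options type is declared locally as a subset so the sketch runs without puppeteer. Whether the MCP server exposes every field depends on the server you run.

```typescript
// Subset of Puppeteer's PDFOptions, declared locally for a dependency-free sketch.
interface PdfOptionsSubset {
  format: "A4" | "Letter" | "Legal" | "Tabloid";
  landscape: boolean;
  printBackground: boolean;
  margin?: { top: string; right: string; bottom: string; left: string };
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  pageRanges?: string;
}

// Hypothetical, plain-language request like the prompts above.
interface PdfRequest {
  format: PdfOptionsSubset["format"];
  orientation: "portrait" | "landscape";
  background: boolean;
  marginMm?: number; // same margin on all sides
  pageNumbers?: boolean; // centered "n / total" footer
  pages?: string; // e.g. "1-3"
}

function toPdfOptions(req: PdfRequest): PdfOptionsSubset {
  const options: PdfOptionsSubset = {
    format: req.format,
    landscape: req.orientation === "landscape",
    printBackground: req.background,
  };
  if (req.marginMm !== undefined) {
    const m = `${req.marginMm}mm`;
    options.margin = { top: m, right: m, bottom: m, left: m };
  }
  if (req.pageNumbers) {
    options.displayHeaderFooter = true;
    // Set a blank header explicitly so only the footer shows.
    options.headerTemplate = "<span></span>";
    // Chrome fills elements with the pageNumber and totalPages classes at print time.
    options.footerTemplate =
      '<div style="width:100%;text-align:center;font-size:10px">' +
      '<span class="pageNumber"></span> / <span class="totalPages"></span></div>';
  }
  if (req.pages) options.pageRanges = req.pages;
  return options;
}
```

The footer is drawn inside the bottom margin, so it needs enough margin to be visible. The 25mm in the prompt leaves room for it.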

Use Case 4: Fill and Submit a Form

Puppeteer can interact with forms, filling in fields and clicking buttons just like a human user:

# Form automation prompt:
"Go to https://example.com/contact, fill in the name field with
'Test User', the email field with 'test@example.com', and the
message field with 'This is an automated test'. Then click the
Submit button and tell me what the confirmation page says."

This is useful for testing form submissions, automating repetitive data entry, and verifying that forms work correctly after code changes. Claude can handle multi-step forms, dropdowns, and checkboxes. CAPTCHAs, however, are designed to stop exactly this kind of automation and will usually block it (see the anti-bot section below).
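Each step of that prompt becomes one browser action. The reference Puppeteer MCP server exposes tools such as puppeteer_navigate, puppeteer_fill, and puppeteer_click for this; check the tool list your server version advertises. The TypeScript sketch below builds the call sequence for the contact-form prompt as plain data. buildFormPlan is a hypothetical helper, and the CSS selectors in any real plan would depend on the actual page.

```typescript
// One MCP tool invocation: the tool name plus its arguments.
interface ToolCall {
  tool: "puppeteer_navigate" | "puppeteer_fill" | "puppeteer_click";
  args: Record<string, string>;
}

// Builds navigate -> fill each field -> click submit, in that order.
function buildFormPlan(
  url: string,
  fields: Record<string, string>, // CSS selector -> value to type
  submitSelector: string,
): ToolCall[] {
  const calls: ToolCall[] = [{ tool: "puppeteer_navigate", args: { url } }];
  for (const [selector, value] of Object.entries(fields)) {
    calls.push({ tool: "puppeteer_fill", args: { selector, value } });
  }
  calls.push({ tool: "puppeteer_click", args: { selector: submitSelector } });
  return calls;
}
```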

Use Case 5: Monitor a Website for Changes

You can use Puppeteer MCP to check a website and report on its current state:

# Monitoring prompt:
"Navigate to https://status.example.com and tell me if there are
any incidents or degraded services listed on the page"

# Price monitoring:
"Go to https://store.example.com/product/12345 and tell me what
the current price is. Is there a sale or discount shown?"

# Content change detection:
"Navigate to https://example.com/changelog and extract the most
recent 3 entries. What are the dates and summaries?"

While Puppeteer MCP does not run continuously in the background, you can use it for on-demand checks whenever you want to know the current state of a web page. For continuous monitoring, you would need to set up a separate scheduled task that triggers Claude periodically.
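Because each check is a one-off, detecting a change means comparing the latest result with one you saved earlier. The TypeScript sketch below does that comparison for the changelog prompt. It assumes each entry is stored as { date, summary } and counts an entry as new when its date-plus-summary key was not present before. Storage is left out, and the function names are hypothetical.

```typescript
interface ChangelogEntry {
  date: string; // as shown on the page, e.g. "2026-05-18"
  summary: string;
}

// Stable key for an entry. Whitespace is normalized so that cosmetic
// re-wrapping on the page does not count as a change.
function entryKey(entry: ChangelogEntry): string {
  return `${entry.date.trim()}|${entry.summary.replace(/\s+/g, " ").trim()}`;
}

// Returns the entries in `current` that did not appear in `previous`.
function findNewEntries(
  previous: ChangelogEntry[],
  current: ChangelogEntry[],
): ChangelogEntry[] {
  const seen = new Set(previous.map(entryKey));
  return current.filter((entry) => !seen.has(entryKey(entry)));
}
```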

Scheduling Scraping and Monitoring Jobs

For recurring tasks, combine Puppeteer MCP with automation tools outside of Claude. Here are practical approaches:

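  • Cron + Claude Code CLI: Write a shell script that runs Claude Code in non-interactive print mode, for example claude -p "Navigate to https://status.example.com and check for incidents" --allowedTools "mcp__puppeteer", and schedule it with cron. The --allowedTools flag pre-approves the Puppeteer server's tools, because no one is present to answer permission prompts. Run the script from the directory where you added the server (claude mcp add defaults to local scope), or register the server with --scope user. This gives you periodic monitoring without manual intervention.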
  • Node.js scheduler: Use node-cron or node-schedule in a lightweight Node.js script that programmatically calls the Puppeteer MCP server. Store results in a SQLite database for trend analysis.
  • GitHub Actions: Set up a scheduled workflow that runs a Claude Code command to scrape and commit results to a repository. This gives you version-controlled snapshots of web page data over time.
# Example crontab (check every hour); cron's minimal PATH must include node and npx:
PATH=/usr/local/bin:/usr/bin:/bin
0 * * * * /usr/local/bin/claude -p "Navigate to https://competitor.example.com/pricing and extract all plan names and prices as JSON" --allowedTools "mcp__puppeteer" >> /var/log/price-monitor.log 2>&1

Dealing with Anti-Bot Detection

Many websites use anti-bot measures that can block or mislead automated browsers. Puppeteer is detectable by default because it sets certain JavaScript properties (like navigator.webdriver = true) and uses recognizable browser fingerprints. Here is how to handle common scenarios:

Common Detection Methods and Workarounds

| Detection method | What it checks | Workaround |
|---|---|---|
| WebDriver flag | navigator.webdriver is true | Stealth plugins (headed mode alone does not clear the flag) |
| User agent string | Default headless Chrome user agent | Ask Claude to set a standard browser user agent |
| Rate limiting | Too many requests in a short period | Add delays between page navigations |
| CAPTCHA | reCAPTCHA, hCaptcha, Turnstile | Use headed mode and solve manually, or skip the site |
| JavaScript challenges | Cloudflare JS challenge page | Wait longer for the challenge to resolve automatically |
| IP reputation | Known datacenter or VPN IP ranges | Use a residential proxy or your home IP |

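For the rate-limiting row, "add delays" usually means a base pause plus some random jitter, so requests do not arrive on a fixed beat. The TypeScript sketch below computes such a delay. The 3-second base and the plus or minus 30% jitter are illustrative assumptions, not values any site publishes. The random source can be swapped out so the behavior is testable.

```typescript
// Returns a delay in milliseconds between baseMs * (1 - jitter) and
// baseMs * (1 + jitter), rounded to a whole number.
function politeDelayMs(
  baseMs = 3000,
  jitter = 0.3,
  random: () => number = Math.random, // must return a value in [0, 1)
): number {
  if (baseMs < 0 || jitter < 0 || jitter > 1) {
    throw new Error("baseMs must be >= 0 and jitter within [0, 1]");
  }
  const spread = baseMs * jitter;
  // Map random() from [0, 1) onto [-spread, +spread).
  return Math.round(baseMs - spread + random() * 2 * spread);
}
```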

For sites protected by Cloudflare or similar services, tell Claude to wait for the challenge to clear before extracting content:

# Handling Cloudflare challenge pages:
"Navigate to https://protected-site.example.com. If you see a
Cloudflare challenge or 'checking your browser' page, wait up to
15 seconds for it to resolve before extracting the page content."

Important: always respect a website's robots.txt and terms of service. Anti-bot measures exist for a reason, and bypassing them may violate the site's terms or applicable laws. Use these techniques only for legitimate purposes like testing your own sites or accessing publicly available data.
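You can automate a robots.txt check before a scraping run. The TypeScript sketch below is deliberately simplified. It reads only the User-agent: * group and matches Allow and Disallow paths by prefix. It picks the longest matching rule, with Allow winning ties, as RFC 9309 specifies. It ignores wildcards (*, $) and agent-specific groups, so treat it as an illustration rather than a compliant parser. The function names are hypothetical.

```typescript
interface Rule {
  allow: boolean;
  path: string;
}

// Collects Allow/Disallow rules from groups addressed to every crawler (*).
function parseWildcardGroup(robotsTxt: string): Rule[] {
  const rules: Rule[] = [];
  let inStarGroup = false;
  let lastLineWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      inStarGroup = lastLineWasAgent ? inStarGroup || value === "*" : value === "*";
      lastLineWasAgent = true;
      continue;
    }
    lastLineWasAgent = false;
    if (!inStarGroup || value === "") continue; // an empty Disallow blocks nothing
    if (field === "disallow") rules.push({ allow: false, path: value });
    if (field === "allow") rules.push({ allow: true, path: value });
  }
  return rules;
}

// True if `path` may be fetched: the longest matching rule wins, ties go to Allow.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  let best: Rule | null = null;
  for (const rule of parseWildcardGroup(robotsTxt)) {
    if (!path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}
```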

Advanced Configuration

Headless vs Headed Mode

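Puppeteer itself defaults to headless mode (no visible browser window). The reference MCP server accepts Puppeteer launch options as a JSON string in the PUPPETEER_LAUNCH_OPTIONS environment variable. You can use it to request a visible browser explicitly when you need to see what Puppeteer is doing, which is useful for debugging:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_LAUNCH_OPTIONS": "{\"headless\": false}"
      }
    }
  }
}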

In headed mode, a Chrome window will open and you can watch Claude navigate, click, and type in real time. This is helpful for understanding what the AI is doing and for debugging issues with complex web pages.

Custom Chrome Path

If you want Puppeteer to use your existing Chrome installation instead of downloading Chromium:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_EXECUTABLE_PATH": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
      }
    }
  }
}

Proxy Configuration

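To route Puppeteer traffic through a proxy (useful for accessing geo-restricted content or avoiding rate limiting), pass Chromium's --proxy-server flag in the browser launch options. Arguments placed after the package name in "args" go to the MCP server process, not to Chromium, so they have no effect there:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_LAUNCH_OPTIONS": "{\"args\": [\"--proxy-server=http://proxy.example.com:8080\"]}"
      }
    }
  }
}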
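Some server versions read Chromium launch options from a PUPPETEER_LAUNCH_OPTIONS environment variable; the reference server's README documents one, but confirm this for your version. In that case the value is a JSON string nested inside the JSON config, which is easy to mis-escape by hand. The TypeScript sketch below generates the entry by applying JSON.stringify twice. The helper names are hypothetical and the proxy URL is a placeholder.

```typescript
// Small subset of Puppeteer's launch options.
interface LaunchOptions {
  headless?: boolean;
  executablePath?: string;
  args?: string[];
}

// Returns the "puppeteer" entry for mcpServers, with launch options
// serialized into the PUPPETEER_LAUNCH_OPTIONS environment variable.
function puppeteerServerEntry(launch: LaunchOptions) {
  return {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-puppeteer"],
    env: {
      // Inner stringify: the env value must itself be a JSON string.
      PUPPETEER_LAUNCH_OPTIONS: JSON.stringify(launch),
    },
  };
}

// Outer stringify: produces the text to paste into the config file.
function configText(launch: LaunchOptions): string {
  return JSON.stringify({ mcpServers: { puppeteer: puppeteerServerEntry(launch) } }, null, 2);
}
```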

Error Handling: Timeouts and Navigation Failures

Browser automation is inherently fragile. Pages fail to load, elements disappear, and network connections drop. Understanding common failure modes will save you significant debugging time.

Navigation Timeouts

The default navigation timeout in Puppeteer is 30 seconds. If a page takes longer than that to reach the load event, the operation fails. Common causes include:

  • Heavy third-party scripts: Analytics, ad networks, and social widgets can delay page load significantly. Ask Claude to wait for the domcontentloaded event instead of the full load event for faster results.
  • Stuck network requests: A single slow resource (large image, unresponsive API) can hold up the entire page. Tell Claude to navigate with a shorter timeout and retry if it fails.
  • Redirect chains: Multiple redirects (HTTP to HTTPS, www to non-www, marketing redirects) each add latency. If you know the final URL, navigate directly to it.
# Handle slow pages:
"Navigate to https://slow-site.example.com with a 60-second timeout.
If the page does not load within 60 seconds, take a screenshot of
whatever has loaded so far and report what you can see."

# Wait for specific content instead of full page load:
"Navigate to https://dashboard.example.com. Don't wait for the full
page to load - just wait until the div.main-content element appears,
then extract the data from it."
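The "shorter timeout, then retry" advice is a common pattern in scripted automation. The TypeScript sketch below implements it with exponential backoff. withRetry is a hypothetical helper, and the attempt count and delays are illustrative. The sleep function is passed in so the logic can be tested without real timers.

```typescript
interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // wait before the 2nd try; doubles after each failure
  sleep: (ms: number) => Promise<void>;
}

// Runs `task` until it succeeds or the attempts are used up,
// rethrowing the last error if every attempt fails.
async function withRetry<T>(
  task: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < opts.attempts) {
        await opts.sleep(opts.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

In a real Puppeteer script, task might wrap page.goto(url, { timeout: 15000, waitUntil: "domcontentloaded" }), and sleep would wrap setTimeout.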

Element Not Found Errors

When Claude tries to interact with an element that does not exist on the page, the operation fails. This commonly happens when:

  • The page layout changed since you last used the prompt
  • The element is loaded asynchronously and has not appeared yet
  • The element is inside an iframe that Puppeteer cannot see by default
  • The selector has a typo or uses a class name that was minified in production
# Robust scraping with fallbacks:
"Navigate to https://store.example.com/product/123. Try to extract
the price from span.price-current. If that element doesn't exist,
try span.product-price or div.price-display. If none of those work,
take a screenshot so I can see the current page layout."
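The fallback prompt encodes a simple rule: try each selector in order and use the first one that yields text. The TypeScript sketch below implements that rule with the page lookup passed in as a function. queryText is a hypothetical hook; in Puppeteer it might wrap page.$(), which returns null when nothing matches, followed by a textContent read. Passing the lookup in lets the sketch run without a browser.

```typescript
// Looks up an element's text; resolves to null when nothing matches.
type QueryText = (selector: string) => Promise<string | null>;

interface FallbackResult {
  selector: string | null; // which selector matched, or null if none did
  text: string | null;
}

// Tries selectors in order and returns the first non-empty match.
async function firstMatch(selectors: string[], queryText: QueryText): Promise<FallbackResult> {
  for (const selector of selectors) {
    const text = await queryText(selector);
    if (text !== null && text.trim() !== "") {
      return { selector, text: text.trim() };
    }
  }
  // No match: the caller decides what to do next, e.g. take a screenshot.
  return { selector: null, text: null };
}
```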

Network Errors

DNS failures, SSL certificate errors, and connection resets can all prevent navigation. Tell Claude how to handle these gracefully:

# Handle potential network issues:
"Try to navigate to https://example.com/api-status. If the page
fails to load due to a network error, report the specific error
message you received so I can diagnose the issue."
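When navigation fails, Chromium's error text usually carries a net::ERR_* code, for example net::ERR_NAME_NOT_RESOLVED for a DNS failure, and Puppeteer includes it in the error message. The TypeScript sketch below maps a few common codes to the categories above. It also recognizes Puppeteer's own "Navigation timeout of N ms exceeded" message. The mapping is illustrative and far from exhaustive, and the function names are hypothetical.

```typescript
type NetworkFailure = "dns" | "tls" | "connection" | "timeout" | "unknown";

// Extracts the first net::ERR_* code from an error message, if present.
function netErrorCode(message: string): string | null {
  const match = /net::(ERR_[A-Z0-9_]+)/.exec(message);
  return match ? match[1] : null;
}

// Maps a navigation error message to a coarse failure category.
function classifyNavigationError(message: string): NetworkFailure {
  // Puppeteer's own navigation timeout carries no net:: code.
  if (/Navigation timeout of \d+ ms exceeded/.test(message)) return "timeout";
  const code = netErrorCode(message);
  if (code === null) return "unknown";
  if (code === "ERR_NAME_NOT_RESOLVED") return "dns";
  if (code.startsWith("ERR_CERT_") || code.startsWith("ERR_SSL_")) return "tls";
  if (code === "ERR_TIMED_OUT" || code === "ERR_CONNECTION_TIMED_OUT") return "timeout";
  if (code.startsWith("ERR_CONNECTION_") || code === "ERR_INTERNET_DISCONNECTED") return "connection";
  return "unknown";
}
```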

Comparison: Web Scraping MCP Servers

Puppeteer is not the only way to access web content through MCP. Here is how it compares to alternatives:

| Feature | Puppeteer | Playwright | Firecrawl | Brave Search |
|---|---|---|---|---|
| JavaScript rendering | Yes | Yes | Yes (cloud) | No |
| Screenshots | Yes | Yes | Yes | No |
| Form interaction | Yes | Yes | No | No |
| PDF generation | Yes | Yes | No | No |
| Runs locally | Yes | Yes | Cloud API | Cloud API |
| Multi-browser support | Chrome (the library also supports Firefox) | Chromium, Firefox, WebKit | N/A | N/A |
| Cost | Free | Free | Paid API | Free tier |
| Best for | Full browser control | Cross-browser testing | Large-scale crawling | Quick web search |


When to choose Puppeteer: You need to interact with a specific page (click, type, scroll), take screenshots, generate PDFs, or scrape JavaScript-rendered content. It runs locally and is completely free.

When NOT to use Puppeteer: If you just need to search the web, use Brave Search. If you need to crawl hundreds of pages at scale, Firecrawl or Exa will be faster and more reliable. If you need cross-browser testing, Playwright also drives Firefox and WebKit (the engine behind Safari). Puppeteer is also overkill for fetching simple API responses or static HTML pages.

Troubleshooting

Chromium Download Fails

On first run, Puppeteer downloads a Chromium binary (~170 MB). If this fails due to network issues or corporate firewalls, set PUPPETEER_SKIP_DOWNLOAD=true and provide a custom Chrome path via PUPPETEER_EXECUTABLE_PATH.

Page Times Out

Some pages take a long time to load because of heavy JavaScript or slow network requests, and Puppeteer's default navigation timeout is 30 seconds. If pages consistently time out, the site may be blocking automated browsers. Try a different user agent, or add a delay between navigation steps.

spawn ENOENT Error

This means Node.js or npx is not found. See our spawn ENOENT troubleshooting guide for detailed fixes on every operating system.

Blank Screenshots

If Puppeteer returns a blank or white screenshot, the page likely has not finished rendering. This is common with single-page applications that render content after the initial page load event fires. Tell Claude to wait for a specific element to appear before capturing the screenshot. Another cause is CSS animations or transitions that have not completed - adding a small delay (1-2 seconds) after navigation usually resolves this.

Memory Issues with Long Sessions

Chromium uses significant RAM, typically 200-400 MB per instance. If you are running multiple scraping tasks in a single Claude session, memory usage can grow over time as browser tabs accumulate. If you notice slowdowns, start a new Claude conversation to get a fresh Puppeteer instance. On machines with limited RAM (4 GB or less), close other memory-intensive applications while using Puppeteer MCP.

