Web Scraping with Puppeteer MCP Server - Setup & Use Cases

Set up the Puppeteer MCP server for AI-powered web scraping. Covers data extraction, screenshots, PDF generation, and comparison with Firecrawl and Brave Search.

What Is the Puppeteer MCP Server?

The Puppeteer MCP server gives AI assistants direct control over a headless Chrome browser. Instead of just fetching HTML like a simple HTTP client, Puppeteer can render JavaScript, interact with dynamic pages, fill out forms, take screenshots, generate PDFs, and extract data from single-page applications that don't work with traditional scraping tools.

This makes it one of the most powerful MCP servers for web automation. You can ask your AI assistant to "go to this website, screenshot it, and extract all product prices" and it will use Puppeteer to do exactly that - navigating the page, waiting for dynamic content to load, and pulling structured data.

This guide covers setup, common use cases, anti-detection strategies, and how Puppeteer compares to other web-related MCP servers like Brave Search and Firecrawl.

Setup and Configuration

Basic Setup

The Puppeteer MCP server runs via npx and automatically downloads a compatible Chromium binary. Add it to your client configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}

Advanced Configuration

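For more control, pass Chrome launch options through environment variables. Supported variable names vary between server versions (the reference implementation's README documents a single JSON-valued PUPPETEER_LAUNCH_OPTIONS variable), so confirm them against the README of the version you install: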

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_CHROME_PORT": "9222",
        "PUPPETEER_HEADLESS": "true",
        "PUPPETEER_LAUNCH_ARGS": "--no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage"
      }
    }
  }
}

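The --no-sandbox flag is usually needed only when Chrome runs as root or in an environment where its sandbox cannot start, which is common in Docker containers and CI runners. It disables a security boundary, so leave it off on a normal desktop Linux install. The --disable-dev-shm-usage flag makes Chrome write shared memory files to /tmp instead of /dev/shm, which prevents crashes in containers where /dev/shm is small.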
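
Hand-escaping a JSON string inside a JSON config is error-prone. As a rough sketch, the flags above can be turned into a launch-options object and serialized in code. The PUPPETEER_LAUNCH_OPTIONS variable name is an assumption to verify against the README of your installed server version, and parseLaunchArgs and buildLaunchEnv are hypothetical helper names, not part of the server.

```typescript
// Subset of Puppeteer launch options used in this sketch.
interface LaunchOptions {
  headless: boolean;
  args: string[];
}

// Split a space-separated flag string into individual Chrome arguments.
function parseLaunchArgs(raw: string): string[] {
  return raw.split(/\s+/).filter((flag) => flag.length > 0);
}

// Build the "env" entry for an MCP client config.
// PUPPETEER_LAUNCH_OPTIONS is an assumed variable name; verify it first.
function buildLaunchEnv(rawArgs: string, headless: boolean): Record<string, string> {
  const options: LaunchOptions = { headless, args: parseLaunchArgs(rawArgs) };
  return { PUPPETEER_LAUNCH_OPTIONS: JSON.stringify(options) };
}

const launchEnv = buildLaunchEnv(
  "--no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage",
  true,
);
```

Serializing launchEnv with JSON.stringify produces a value with the inner quotes already escaped, ready to paste into the "env" block.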

Using an Existing Chrome Instance

If you want Puppeteer to connect to an already-running Chrome browser (useful for maintaining login sessions), start Chrome with remote debugging enabled:

# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222

Then configure the MCP server to connect to that port:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_CHROME_PORT": "9222"
      }
    }
  }
}
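
Before pointing the server at port 9222, it can help to confirm that Chrome is actually listening there. Chrome started with --remote-debugging-port serves a small JSON document at /json/version that includes a webSocketDebuggerUrl field. The sketch below queries it; the helper names are hypothetical, and it assumes a runtime with a global fetch (Node 18 or later).

```typescript
// Build the URL of Chrome's DevTools discovery endpoint for a given port.
function devtoolsVersionUrl(port: number, host: string = "127.0.0.1"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return `http://${host}:${port}/json/version`;
}

// Ask Chrome for its WebSocket debugger URL.
// Resolves to null when nothing usable answers on the port.
async function findDebuggerUrl(port: number): Promise<string | null> {
  try {
    const response = await fetch(devtoolsVersionUrl(port));
    if (!response.ok) return null;
    const info = (await response.json()) as { webSocketDebuggerUrl?: string };
    return info.webSocketDebuggerUrl ?? null;
  } catch {
    return null; // e.g. connection refused: Chrome is not listening here
  }
}
```

A null result from findDebuggerUrl(9222) usually means Chrome was started without the debugging flag or on a different port.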

Common Use Cases

1. Data Extraction

The most common use case is extracting structured data from websites. Unlike simple HTTP requests, Puppeteer renders JavaScript and waits for dynamic content, making it ideal for modern SPAs.

Example prompts you can use with your AI assistant:

  • "Go to [URL] and extract all product names, prices, and ratings into a table"
  • "Navigate to [URL], click the 'Load More' button 3 times, then extract all article titles and dates"
  • "Visit [URL] and extract the data from the chart/table on the page"
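
Text pulled from a rendered page is usually raw ("$1,299.00", "4.5 out of 5") and needs normalizing before it is useful as a table. The following is a minimal post-processing sketch, assuming US-style price formatting; the RawProduct shape and function names are hypothetical.

```typescript
// Row as scraped: every field is still display text.
interface RawProduct {
  name: string;
  price: string;
  rating: string;
}

// Row after normalization; null marks a value that could not be parsed.
interface Product {
  name: string;
  price: number | null;
  rating: number | null;
}

// Return the first decimal number in the text, or null if there is none.
function firstNumber(text: string): number | null {
  const match = text.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Parse a US-style price such as "$1,299.00" (commas are thousands separators).
function parsePrice(text: string): number | null {
  return firstNumber(text.replace(/,/g, ""));
}

// Clean up whitespace in names and convert price and rating text to numbers.
function normalizeProducts(rows: RawProduct[]): Product[] {
  return rows.map((row) => ({
    name: row.name.trim().replace(/\s+/g, " "),
    price: parsePrice(row.price),
    rating: firstNumber(row.rating),
  }));
}
```

Prices in other locales (for example "1.299,00 €") would need a different parser.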

2. Screenshots

Puppeteer can take full-page screenshots or capture specific elements. This is useful for documentation, bug reporting, visual regression testing, and design review.

  • "Take a screenshot of [URL]"
  • "Screenshot the hero section of [URL] at mobile width (375px)"
  • "Take screenshots of [URL] at 3 different viewport sizes: mobile, tablet, desktop"
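
A multi-viewport request like the last prompt boils down to a loop over viewport sizes. The sketch below shows one way this might look. PageLike is a minimal structural stand-in modeled on the goto, setViewport, and screenshot methods of Puppeteer's Page, so the snippet stays self-contained; the tablet size, file naming scheme, and function names are assumptions.

```typescript
interface Viewport {
  name: string;
  width: number;
  height: number;
}

interface ScreenshotJob {
  viewport: Viewport;
  path: string;
}

// Example sizes; the tablet and mobile heights are assumptions.
const VIEWPORTS: Viewport[] = [
  { name: "mobile", width: 375, height: 667 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "desktop", width: 1920, height: 1080 },
];

// Turn a URL into a filesystem-safe slug.
function slugify(url: string): string {
  return url
    .replace(/^https?:\/\//, "")
    .replace(/[^a-zA-Z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Decide which files to produce before touching the browser.
function planScreenshots(url: string, viewports: Viewport[] = VIEWPORTS): ScreenshotJob[] {
  const slug = slugify(url);
  return viewports.map((viewport) => ({
    viewport,
    path: `${slug}-${viewport.name}-${viewport.width}x${viewport.height}.png`,
  }));
}

// Minimal structural subset of Puppeteer's Page used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  setViewport(viewport: { width: number; height: number }): Promise<void>;
  screenshot(options: { path: string; fullPage: boolean }): Promise<unknown>;
}

// Resize, load, and capture once per planned job; returns the written paths.
async function captureAll(page: PageLike, url: string): Promise<string[]> {
  const jobs = planScreenshots(url);
  for (const job of jobs) {
    await page.setViewport({ width: job.viewport.width, height: job.viewport.height });
    await page.goto(url); // reload so responsive layouts settle at this size
    await page.screenshot({ path: job.path, fullPage: true });
  }
  return jobs.map((job) => job.path);
}
```

Separating the plan from the capture keeps the file naming testable without launching a browser.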

3. PDF Generation

Convert web pages to PDF documents. Useful for archiving content, generating reports, or creating printable versions of web pages.

  • "Convert [URL] to a PDF"
  • "Generate a PDF of this invoice page with A4 paper size"

4. Form Interaction and Testing

Puppeteer can fill out forms, click buttons, and interact with page elements. This makes it useful for testing web applications or automating repetitive web tasks.

  • "Go to [URL], fill in the search form with 'MCP servers', and extract the results"
  • "Navigate to the login page, enter test credentials, and verify the dashboard loads"

5. Monitoring and Auditing

Use Puppeteer to check website status, verify content, or audit pages for issues:

  • "Check if [URL] loads correctly and report any console errors"
  • "Visit [URL] and tell me if the pricing has changed from [old price]"

Anti-Detection Tips

Many websites detect and block headless browsers. If you're using Puppeteer for legitimate scraping, these techniques help avoid false positives:

  • Use a realistic user agent: Headless Chrome's default user agent contains "HeadlessChrome", which many sites block. Configure a standard Chrome user agent string.
  • Set a realistic viewport: Puppeteer's default viewport (800x600) is a red flag. Use common resolutions like 1920x1080 or 1440x900.
  • Add delays between actions: Real users don't click buttons 1 ms apart. Add small random delays between interactions to mimic human behavior.
  • Handle cookies and headers: Accept cookie banners and send the standard HTTP headers (such as Accept-Language) that real browsers send.
  • Respect robots.txt: Always check and honor the site's robots.txt. Automated scraping that ignores robots.txt may violate terms of service and applicable laws.
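
Two of the tips above, the user agent and the random delays, are easy to express as small helpers. This is an illustrative sketch only: the function names and the default 200 to 900 ms range are assumptions, and the random source is injectable so the delay logic can be tested deterministically.

```typescript
// Replace the headless marker so the string matches regular Chrome.
function normalizeUserAgent(userAgent: string): string {
  return userAgent.replace("HeadlessChrome/", "Chrome/");
}

// Pick a whole-millisecond delay in [minMs, maxMs].
// rng must return a value in [0, 1), like Math.random.
function randomDelayMs(
  minMs: number,
  maxMs: number,
  rng: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return Math.floor(minMs + rng() * (maxMs - minMs + 1));
}

// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Wait a random, human-scale interval between two page interactions.
async function humanPause(minMs: number = 200, maxMs: number = 900): Promise<void> {
  await sleep(randomDelayMs(minMs, maxMs));
}
```

A script would call humanPause() between clicks or keystrokes, and pass the result of normalizeUserAgent to whatever mechanism sets the browser's user agent.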

Important: Always ensure your web scraping activities comply with the target website's terms of service and applicable laws. Use Puppeteer responsibly and ethically.
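
Honoring robots.txt can be checked in code before navigating. The sketch below is a deliberately simplified reading of the Robots Exclusion Protocol: it only considers groups addressed to "*", uses plain prefix matching (no "*" or "$" wildcards), and resolves conflicts by longest match with Allow winning ties. The function names are hypothetical, and a production crawler should use a complete parser.

```typescript
interface Rule {
  allow: boolean;
  path: string;
}

// Collect Allow/Disallow rules from groups that apply to every crawler ("*").
function parseRobotsForAll(robotsTxt: string): Rule[] {
  const rules: Rule[] = [];
  let groupAgents: string[] = [];
  let inRules = false; // true once the current group has started listing rules

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      if (inRules) {
        // A User-agent line after rules starts a new group.
        groupAgents = [];
        inRules = false;
      }
      groupAgents.push(value.toLowerCase());
    } else if (field === "allow" || field === "disallow") {
      inRules = true;
      // An empty value (e.g. "Disallow:") restricts nothing, so skip it.
      if (groupAgents.includes("*") && value !== "") {
        rules.push({ allow: field === "allow", path: value });
      }
    }
  }
  return rules;
}

// Longest matching prefix wins; Allow wins ties; no match means allowed.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  let best: Rule | null = null;
  for (const rule of parseRobotsForAll(robotsTxt)) {
    if (!path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}
```

The robots.txt body itself would be fetched from the site root (for example https://example.com/robots.txt) before calling isPathAllowed with the path you intend to visit.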

Comparison: Puppeteer vs Brave Search vs Firecrawl

There are several MCP servers for web-related tasks. Here's how they compare:

Feature              | Puppeteer MCP          | Brave Search MCP              | Firecrawl MCP
---------------------|------------------------|-------------------------------|---------------------
Type                 | Browser automation     | Search engine API             | Cloud scraping API
JavaScript rendering | Yes (full Chrome)      | No                            | Yes (cloud)
Screenshots          | Yes                    | No                            | Yes
Form interaction     | Yes                    | No                            | No
Runs locally         | Yes                    | API calls                     | Cloud only
Cost                 | Free (local resources) | API key (free tier available) | Paid subscription
Best for             | Full page interaction  | Web search queries            | Large-scale scraping

When to use Puppeteer: When you need to interact with pages (click, type, scroll), take screenshots, handle JavaScript-heavy sites, or work without an API key. It's the most versatile option but uses the most local resources.

When to use Brave Search: When you need to search the web and get text results. It's fast, lightweight, and doesn't require a browser. Great for research and fact-checking.

When to use Firecrawl: When you need to scrape many pages at scale with built-in anti-detection. It runs in the cloud so your local machine isn't impacted. Best for bulk data extraction.

You can also use Exa Search for semantic search capabilities that complement Puppeteer's scraping. For database integration with scraped data, see our database MCP servers guide.

Performance Considerations

Puppeteer launches a full Chrome browser, which means significant resource usage:

  • Memory: Each Chrome instance uses 100-300 MB of RAM. If you're running other MCP servers too, monitor total memory usage.
  • CPU: Page rendering is CPU-intensive. Complex pages with heavy JavaScript may spike CPU usage temporarily.
  • Disk: Chromium binary is ~170 MB. Screenshots and PDFs consume additional disk space.
  • Startup time: Chrome takes 2-5 seconds to launch. First requests will be slower than subsequent ones since the browser stays running.

On systems with limited resources (8 GB RAM or less), consider closing the Puppeteer server when not actively using it, or use Brave Search for simple web queries that don't require a full browser.

Troubleshooting

  • Chrome fails to launch on Linux: Install the required system libraries with apt install -y libnss3 libatk-bridge2.0-0 libdrm2 libxkbcommon0 libgbm1 (Debian/Ubuntu). If Chrome runs as root or inside a container, also add the --no-sandbox launch argument.
  • Timeout errors: Puppeteer operations can be slow. See our timeout troubleshooting guide for configuration options.
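  • ENOENT errors: Either npx is not on the PATH your MCP client uses, or the Chromium binary did not download correctly. For the second case, delete the npx cache (~/.npm/_npx) and Puppeteer's browser cache (~/.cache/puppeteer by default) and retry. See spawn ENOENT fix.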
  • Pages not rendering: Some sites require specific viewport sizes or user agents. Ask the AI to set these before navigating.
