What Is the Puppeteer MCP Server?
The Puppeteer MCP server gives AI assistants direct control over a headless Chrome browser. Instead of just fetching HTML like a simple HTTP client, Puppeteer can render JavaScript, interact with dynamic pages, fill out forms, take screenshots, generate PDFs, and extract data from single-page applications that don't work with traditional scraping tools.
This makes it one of the most powerful MCP servers for web automation. You can ask your AI assistant to "go to this website, screenshot it, and extract all product prices" and it will use Puppeteer to do exactly that: navigate to the page, wait for dynamic content to load, and pull out structured data.
This guide covers setup, common use cases, anti-detection strategies, and how Puppeteer compares to other web-related MCP servers like Brave Search and Firecrawl.
Setup and Configuration
Basic Setup
The Puppeteer MCP server runs via npx and automatically downloads a compatible Chromium binary. Add it to your client configuration:
{
"mcpServers": {
"puppeteer": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-puppeteer"]
}
}
}
Advanced Configuration
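For more control, pass Chrome launch options through environment variables. Variable names differ between server versions (the reference server's README documents a single JSON-valued PUPPETEER_LAUNCH_OPTIONS variable), so confirm the names below against the README of the version you install: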
{
"mcpServers": {
"puppeteer": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-puppeteer"],
"env": {
"PUPPETEER_CHROME_PORT": "9222",
"PUPPETEER_HEADLESS": "true",
"PUPPETEER_LAUNCH_ARGS": "--no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage"
}
}
}
}
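The --no-sandbox flag is usually needed when Chrome runs as root, which is the common case in Docker containers and some Linux CI environments. It turns off Chrome's process sandbox, so add it only where Chrome refuses to start without it. The --disable-dev-shm-usage flag makes Chrome write shared-memory files to /tmp instead of /dev/shm, which prevents crashes in environments where /dev/shm is small, such as default Docker containers.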
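The flag choices described above can be sketched as a small helper. This is an illustration only: `LaunchEnvironment` and `buildLaunchArgs` are hypothetical names, not part of Puppeteer or the MCP server, and the joined string is meant as a candidate value for the launch-arguments variable in the config above.

```typescript
// Hypothetical description of where Chrome will run (not a library type).
interface LaunchEnvironment {
  sandboxUnavailable: boolean; // e.g. running as root or inside a container
  smallSharedMemory: boolean;  // e.g. a container with a small /dev/shm
}

// Returns the Chrome command-line flags suggested for that environment.
function buildLaunchArgs(env: LaunchEnvironment): string[] {
  const args: string[] = [];
  if (env.sandboxUnavailable) {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  if (env.smallSharedMemory) {
    args.push("--disable-dev-shm-usage");
  }
  return args;
}

const dockerArgs = buildLaunchArgs({ sandboxUnavailable: true, smallSharedMemory: true });
console.log(dockerArgs.join(" "));
```

On a desktop machine where Chrome starts normally, both fields would be false and no extra flags are added, which keeps the sandbox enabled.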
Using an Existing Chrome Instance
If you want Puppeteer to connect to an already-running Chrome browser (useful for maintaining login sessions), start Chrome with remote debugging enabled:
# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222
Then configure the MCP server to connect to the same port:
{
"mcpServers": {
"puppeteer": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-puppeteer"],
"env": {
"PUPPETEER_CHROME_PORT": "9222"
}
}
}
}
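Before pointing the server at the port, it can help to confirm that Chrome is actually listening. Chrome started with --remote-debugging-port serves a small JSON document at http://127.0.0.1:9222/json/version that includes a webSocketDebuggerUrl field. The sketch below only validates such a response body; `parseVersionResponse` is a hypothetical helper, the sample values are made up, and fetching the URL is left to whatever HTTP client you prefer.

```typescript
// The two fields this sketch reads from Chrome's /json/version response.
interface VersionInfo {
  Browser: string;
  webSocketDebuggerUrl: string;
}

// Parses a response body and checks that it looks like Chrome's payload.
function parseVersionResponse(body: string): VersionInfo {
  const data = JSON.parse(body) as Partial<VersionInfo>;
  if (typeof data.Browser !== "string" || typeof data.webSocketDebuggerUrl !== "string") {
    throw new Error("Response does not look like a Chrome /json/version payload");
  }
  return { Browser: data.Browser, webSocketDebuggerUrl: data.webSocketDebuggerUrl };
}

// Sample body with made-up values, in the shape Chrome returns.
const sampleBody = JSON.stringify({
  Browser: "Chrome/120.0.0.0",
  webSocketDebuggerUrl: "ws://127.0.0.1:9222/devtools/browser/example-id",
});
const versionInfo = parseVersionResponse(sampleBody);
console.log(versionInfo.webSocketDebuggerUrl);
```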
Common Use Cases
1. Data Extraction
The most common use case is extracting structured data from websites. Unlike simple HTTP requests, Puppeteer renders JavaScript and waits for dynamic content, making it ideal for modern SPAs.
Example prompts you can use with your AI assistant:
- "Go to [URL] and extract all product names, prices, and ratings into a table"
- "Navigate to [URL], click the 'Load More' button 3 times, then extract all article titles and dates"
- "Visit [URL] and extract the data from the chart/table on the page"
2. Screenshots
Puppeteer can take full-page screenshots or capture specific elements. This is useful for documentation, bug reporting, visual regression testing, and design review.
- "Take a screenshot of [URL]"
- "Screenshot the hero section of [URL] at mobile width (375px)"
- "Take screenshots of [URL] at 3 different viewport sizes: mobile, tablet, desktop"
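A multi-viewport request like the last prompt boils down to a list of (viewport, output file) pairs. The sketch below plans those pairs without touching a browser. The 375px mobile width comes from the prompt above; the tablet and desktop sizes, the heights, and the names `ViewportPreset` and `planScreenshots` are assumptions made for illustration.

```typescript
interface ViewportPreset {
  label: string;
  width: number;
  height: number;
}

// Mobile width is from the prompt above; every other number is an assumed preset.
const VIEWPORT_PRESETS: ViewportPreset[] = [
  { label: "mobile", width: 375, height: 667 },
  { label: "tablet", width: 768, height: 1024 },
  { label: "desktop", width: 1440, height: 900 },
];

interface ScreenshotJob {
  url: string;
  viewport: ViewportPreset;
  fileName: string;
}

// Builds one screenshot job per viewport, with a descriptive file name.
function planScreenshots(url: string, presets: ViewportPreset[]): ScreenshotJob[] {
  return presets.map((viewport) => ({
    url,
    viewport,
    fileName: `${viewport.label}-${viewport.width}x${viewport.height}.png`,
  }));
}

const screenshotJobs = planScreenshots("https://example.com", VIEWPORT_PRESETS);
for (const job of screenshotJobs) {
  console.log(job.fileName);
}
```

Naming files after the viewport makes it easy to compare runs later, for example in visual regression checks.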
3. PDF Generation
Convert web pages to PDF documents. Useful for archiving content, generating reports, or creating printable versions of web pages.
- "Convert [URL] to a PDF"
- "Generate a PDF of this invoice page with A4 paper size"
4. Form Interaction and Testing
Puppeteer can fill out forms, click buttons, and interact with page elements. This makes it useful for testing web applications or automating repetitive web tasks.
- "Go to [URL], fill in the search form with 'MCP servers', and extract the results"
- "Navigate to the login page, enter test credentials, and verify the dashboard loads"
5. Monitoring and Auditing
Use Puppeteer to check website status, verify content, or audit pages for issues:
- "Check if [URL] loads correctly and report any console errors"
- "Visit [URL] and tell me if the pricing has changed from [old price]"
Anti-Bot Detection and Evasion
Many websites detect and block headless browsers using JavaScript fingerprinting, behavioral analysis, and header inspection. If you are using Puppeteer for legitimate purposes (testing your own sites, authorized data collection, or reading publicly available information), these techniques reduce the chance of being flagged as a bot by mistake:
| Detection Method | What It Checks | Countermeasure |
|---|---|---|
| User agent string | "HeadlessChrome" in the UA | Set a standard Chrome UA string |
| Viewport size | Default 800x600 headless viewport | Use 1920x1080 or 1440x900 |
| navigator.webdriver | JS property set to true in headless | Override with evaluateOnNewDocument |
| Click timing | Instantaneous 0ms clicks | Add random 100-500ms delays |
| Missing plugins | Headless reports 0 browser plugins | Inject fake plugin array via JS |
- Handle cookies and headers: Accept cookie banners and send standard HTTP headers that real browsers send, including Accept-Language and Accept-Encoding.
- Respect robots.txt: Always check and honor the site's robots.txt. Automated scraping that ignores robots.txt may violate terms of service and applicable laws.
- Rate limit requests: Do not scrape hundreds of pages per minute from a single domain. Add delays between page navigations to avoid triggering rate limits.
Important: Always ensure your web scraping activities comply with the target website's terms of service and applicable laws. Use Puppeteer responsibly and ethically.
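Two rows of the table above are simple enough to sketch without a browser: checking whether a user agent string gives away headless mode, and picking a randomized delay in the 100-500 ms range before a click. Both function names are hypothetical, and the random source is injectable only so the behavior can be checked deterministically.

```typescript
// True if the user agent string contains the "HeadlessChrome" marker.
function looksHeadless(userAgent: string): boolean {
  return userAgent.indexOf("HeadlessChrome") !== -1;
}

// Returns an integer delay in [minMs, maxMs], inclusive on both ends.
// rng must return a number in [0, 1), like Math.random.
function randomDelayMs(
  minMs: number,
  maxMs: number,
  rng: () => number = Math.random
): number {
  if (minMs > maxMs) {
    throw new Error("minMs must not exceed maxMs");
  }
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

const clickDelay = randomDelayMs(100, 500);
console.log(`wait ${clickDelay} ms before the next click`);
```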
CSS Selector Strategies
When asking the AI to extract data from web pages, providing good selector hints improves extraction reliability. Prefer stable selectors over fragile ones:
- Data attributes first: Elements with data-testid, data-id, or data-name are the most stable because they survive design changes.
- ARIA roles: Selectors like [role="navigation"], [role="main"], and [aria-label="Search"] are semantic and stable.
- Avoid positional selectors: div:nth-child(3) > span breaks when the layout changes. Prefer class names or semantic attributes.
- Text content matching: For buttons and links, matching by visible text is often more stable than class-based selectors.
- Wait for dynamic content: SPAs load content after the initial page load. Ask the AI to wait for elements to appear before extracting data.
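The preference order above can be turned into a rough ranking when you have several candidate selectors for the same element. This is a heuristic sketch with hypothetical names (`classifySelector`, `pickMostStable`); the regular expressions are deliberately simple, and any selector containing a positional pseudo-class is ranked last even if it also uses a data attribute.

```typescript
type SelectorKind = "data-attribute" | "aria" | "other" | "positional";

// Lower rank means more stable, following the guidance above.
const STABILITY_RANK: Record<SelectorKind, number> = {
  "data-attribute": 0,
  aria: 1,
  other: 2,
  positional: 3,
};

// Classifies a CSS selector with simple pattern checks (a heuristic only).
function classifySelector(selector: string): SelectorKind {
  if (/:nth-(child|of-type)\(/.test(selector)) return "positional";
  if (/\[data-[\w-]+/.test(selector)) return "data-attribute";
  if (/\[(role|aria-[\w-]+)/.test(selector)) return "aria";
  return "other";
}

// Returns the candidate judged most stable; throws on an empty list.
function pickMostStable(candidates: string[]): string {
  if (candidates.length === 0) {
    throw new Error("no candidate selectors given");
  }
  const sorted = candidates.slice().sort(
    (a, b) => STABILITY_RANK[classifySelector(a)] - STABILITY_RANK[classifySelector(b)]
  );
  return sorted[0];
}

console.log(
  pickMostStable(["div:nth-child(3) > span", ".price", '[data-testid="price"]'])
);
```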
PDF Generation
Puppeteer excels at converting web pages to high-quality PDFs. The AI can generate PDFs with custom page sizes, margins, headers, and footers. Common use cases include archiving web content, generating reports from dashboards, and creating printable invoices. When asking the AI to generate PDFs, be specific: "Generate a PDF of this page with A4 paper size and 1-inch margins" or "Create a landscape PDF of this dashboard with the navigation bar hidden." Generated PDFs typically range from 100 KB for text-heavy pages to 5 MB for image-heavy pages.
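If a request like the A4 example ends up as a call to Puppeteer's page.pdf(), it maps onto options roughly like the ones below. The field names (format, landscape, margin, printBackground) are real Puppeteer PDF options; the `PdfOptionsSketch` type and `buildPdfOptions` helper are hypothetical, and whether the MCP server exposes these options directly depends on the server version.

```typescript
interface PdfMargin {
  top: string;
  right: string;
  bottom: string;
  left: string;
}

// A local subset of Puppeteer's PDF options, declared here so the sketch
// does not need the puppeteer package installed.
interface PdfOptionsSketch {
  format: "A4" | "Letter";
  landscape: boolean;
  margin: PdfMargin;
  printBackground: boolean;
}

// Builds options with the same margin on all four sides.
function buildPdfOptions(
  format: "A4" | "Letter",
  marginSize: string,
  landscape: boolean = false
): PdfOptionsSketch {
  return {
    format,
    landscape,
    margin: { top: marginSize, right: marginSize, bottom: marginSize, left: marginSize },
    printBackground: true, // keep background colors and images in the PDF
  };
}

const invoicePdf = buildPdfOptions("A4", "1in");
console.log(JSON.stringify(invoicePdf));
```

Margins accept CSS-style units such as "1in" or "20mm", which is why the prompt wording "1-inch margins" translates directly.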
Scheduled and Recurring Scraping
While the Puppeteer MCP server does not include a built-in scheduler, you can combine it with other tools for recurring scraping workflows. Use the Memory MCP server to store previous values and compare on each check. Ask the AI "Visit [URL] and check if the pricing has changed from what we stored last week." For large-scale or very frequent scraping, consider Firecrawl MCP instead, which handles rate limiting and proxy rotation at the infrastructure level.
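The comparison step in such a workflow is plain arithmetic once the stored and current price strings are available. A minimal sketch follows, assuming US-style formatting (comma as thousands separator, dot as decimal point); `parsePriceCents` and `detectPriceChange` are hypothetical names, and prices are compared in integer cents to avoid floating-point noise.

```typescript
// Extracts the first number from a price string and returns it in cents,
// or null if no number is found. Assumes "1,299.00"-style formatting.
function parsePriceCents(text: string): number | null {
  const match = text.replace(/,/g, "").match(/\d+(\.\d+)?/);
  return match ? Math.round(parseFloat(match[0]) * 100) : null;
}

interface PriceChange {
  changed: boolean;
  previousCents: number;
  currentCents: number;
  deltaCents: number;
}

// Compares a stored price string with a freshly scraped one.
function detectPriceChange(stored: string, current: string): PriceChange {
  const previousCents = parsePriceCents(stored);
  const currentCents = parsePriceCents(current);
  if (previousCents === null || currentCents === null) {
    throw new Error("could not parse one of the prices");
  }
  const deltaCents = currentCents - previousCents;
  return { changed: deltaCents !== 0, previousCents, currentCents, deltaCents };
}

const priceCheck = detectPriceChange("$19.99", "$24.99");
console.log(priceCheck.changed, priceCheck.deltaCents);
```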
Comparison: Puppeteer vs Brave Search vs Firecrawl
There are several MCP servers for web-related tasks. Here's how they compare:
| Feature | Puppeteer MCP | Brave Search MCP | Firecrawl MCP |
|---|---|---|---|
| Type | Browser automation | Search engine API | Cloud scraping API |
| JavaScript rendering | Yes (full Chrome) | No | Yes (cloud) |
| Screenshots | Yes | No | Yes |
| Form interaction | Yes | No | No |
| Runs locally | Yes | API calls | Cloud only |
| Cost | Free (local resources) | API key (free tier available) | Paid subscription |
| Best for | Full page interaction | Web search queries | Large-scale scraping |
When to use Puppeteer: When you need to interact with pages (click, type, scroll), take screenshots, handle JavaScript-heavy sites, or work without an API key. It's the most versatile option but uses the most local resources.
When to use Brave Search: When you need to search the web and get text results. It's fast, lightweight, and doesn't require a browser. Great for research and fact-checking.
When to use Firecrawl: When you need to scrape many pages at scale with built-in anti-detection. It runs in the cloud so your local machine isn't impacted. Best for bulk data extraction.
You can also use Exa Search for semantic search capabilities that complement Puppeteer's scraping. For database integration with scraped data, see our database MCP servers guide.
Performance Considerations
Puppeteer launches a full Chrome browser, which means significant resource usage:
- Memory: Each Chrome instance uses 100-300 MB of RAM. If you're running other MCP servers too, monitor total memory usage.
- CPU: Page rendering is CPU-intensive. Complex pages with heavy JavaScript may spike CPU usage temporarily.
- Disk: The Chromium binary is about 170 MB. Screenshots and PDFs consume additional disk space.
- Startup time: Chrome takes 2-5 seconds to launch. First requests will be slower than subsequent ones since the browser stays running.
On systems with limited resources (8 GB RAM or less), consider closing the Puppeteer server when not actively using it, or use Brave Search for simple web queries that don't require a full browser.
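The memory figures above give a quick way to estimate how many Chrome instances a machine can carry. The sketch below uses the upper end of the 100-300 MB range as a conservative default; `maxChromeInstances` is a hypothetical helper, and the 2048 MB of free memory is an assumed example value.

```typescript
// Estimates how many Chrome instances fit into the available memory.
// perInstanceMb defaults to 300, the upper end of the range quoted above.
function maxChromeInstances(availableMb: number, perInstanceMb: number = 300): number {
  if (availableMb < 0 || perInstanceMb <= 0) {
    throw new Error("memory values must be positive");
  }
  return Math.floor(availableMb / perInstanceMb);
}

// Example: an 8 GB machine with roughly 2 GB free for browser automation.
const conservative = maxChromeInstances(2048);
const optimistic = maxChromeInstances(2048, 100);
console.log(`between ${conservative} and ${optimistic} instances`);
```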
Troubleshooting
- Chrome fails to launch on Linux: Install the required system dependencies with apt install -y libnss3 libatk-bridge2.0-0 libdrm2 libxkbcommon0 libgbm1. Use the --no-sandbox launch argument in your env config.
- Timeout errors: Puppeteer operations can be slow, especially on first page load. See our timeout troubleshooting guide for configuration options.
- ENOENT errors: The Chromium binary may not have downloaded correctly. Delete the node_modules cache and the _npx cache directory, then retry. See spawn ENOENT fix.
- Pages not rendering correctly: Some sites require specific viewport sizes or user agents. Ask the AI to set a realistic viewport (1920x1080) and user agent before navigating to the target URL.
- Docker issues: Running Puppeteer in Docker requires additional flags: --no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage. Add these to PUPPETEER_LAUNCH_ARGS in your server env config.
- Windows-specific issues: On Windows, Puppeteer may fail if the Chromium download path contains spaces. See our Windows MCP setup guide for path configuration.
Combining Puppeteer with Other MCP Servers
Puppeteer becomes even more powerful when combined with other MCP servers in your configuration:
- Puppeteer + Memory: Scrape a page, extract key data, and store it in the knowledge graph for cross-session access. "Visit our competitor's pricing page and save the current prices to memory for future comparison."
- Puppeteer + Brave Search: Search the web for relevant pages, then use Puppeteer to visit and scrape specific results. "Search for React component libraries, visit the top 3 results, and compare their feature lists."
- Puppeteer + Filesystem: Take screenshots or generate PDFs and save them to your local project directory. "Screenshot our staging site at 3 viewport sizes and save the images to the test/screenshots folder."
- Puppeteer + Database: Scrape structured data from web pages and insert it into your database. "Extract the product catalog from this page and insert each product into the products table."
For configuring multiple servers together, see our multiple server configuration guide. For memory considerations when running Puppeteer alongside other servers, see our how many servers guide.
Ethical Web Scraping Guidelines
When using Puppeteer MCP for web scraping, follow these ethical guidelines:
- Respect robots.txt: Check the target site's robots.txt before scraping. Most sites document which paths and user agents are allowed or disallowed.
- Honor rate limits: Do not send more than a few requests per minute to any single domain. Excessive requests can overload servers and may result in IP bans.
- Check terms of service: Many websites explicitly prohibit automated scraping in their terms of service. Review these before scraping commercial sites.
- Identify yourself: When possible, use a user agent string that includes your contact information or project name so site operators can reach you if needed.
- Cache results: Avoid re-scraping the same page repeatedly. Cache results locally and only re-scrape when data freshness requires it.
- Avoid scraping personal data: Be especially careful with pages that contain personal information. Comply with GDPR, CCPA, and other data protection regulations applicable to your jurisdiction.
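The robots.txt guideline can be checked mechanically before a scrape. The sketch below is a deliberately simplified reader: it only honors Disallow rules in the `User-agent: *` group and treats each rule as a path prefix. It ignores Allow lines, wildcards, and agent-specific groups, so treat it as a first-pass filter and not a complete parser. The function names and the sample file are hypothetical.

```typescript
// Collects Disallow prefixes from the "User-agent: *" group(s).
function disallowedPrefixes(robotsTxt: string): string[] {
  const prefixes: string[] = [];
  let applies = false;      // are we inside a group that targets "*"?
  let inAgentRun = false;   // was the previous directive a User-agent line?
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      const matches = value === "*";
      // Consecutive User-agent lines belong to the same group.
      applies = inAgentRun ? applies || matches : matches;
      inAgentRun = true;
    } else {
      inAgentRun = false;
      if (field === "disallow" && applies && value !== "") {
        prefixes.push(value);
      }
    }
  }
  return prefixes;
}

// True if no collected Disallow prefix matches the start of the path.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  return !disallowedPrefixes(robotsTxt).some((prefix) => path.indexOf(prefix) === 0);
}

const sampleRobots = [
  "User-agent: *",
  "Disallow: /admin",
  "Disallow: /cart # checkout flow",
  "",
  "User-agent: BadBot",
  "Disallow: /",
].join("\n");
console.log(isPathAllowed(sampleRobots, "/products/1"));
```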