MCPMark
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
What is MCPMark?
MCPMark is a benchmark harness rather than an MCP server. It runs models and agents against real MCP servers and scores how well they complete tool-use tasks.
MCPMark is listed under the Analytics category on MCPgee, a directory of 33,000+ MCP servers.
How to Set Up and Use MCPMark
MCPMark is a comprehensive stress-testing benchmark suite designed to evaluate the real-world tool-use capabilities of LLM agents against actual MCP servers. It covers five MCP service categories — Notion, GitHub, Filesystem, Postgres, and Playwright — with 127 standard tasks and 50 easier tasks, enabling rigorous pass@k and pass^k measurements. Teams use MCPMark to compare models, validate agent frameworks, and identify failure modes in MCP tool-calling behavior before production deployment.
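The two metrics named above can be illustrated with a small sketch. It assumes the common definitions: pass@k counts a task as solved if at least one of its k runs succeeds, and pass^k counts it only if all k runs succeed. The task names and outcomes are made up for illustration, and MCPMark's own aggregator may compute these differently.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved in at least one of their k runs."""
    if not results:
        return 0.0
    return sum(any(runs) for runs in results.values()) / len(results)


def pass_hat_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved in every one of their k runs."""
    if not results:
        return 0.0
    return sum(all(runs) for runs in results.values()) / len(results)


# Hypothetical outcomes for three tasks, k = 4 runs each (illustrative data only).
example = {
    "task_a": [True, True, True, True],
    "task_b": [False, True, False, False],
    "task_c": [False, False, False, False],
}

print(f"pass@4 = {pass_at_k(example):.2f}")   # two of three tasks solved at least once
print(f"pass^4 = {pass_hat_k(example):.2f}")  # one of three tasks solved every time
```

The gap between the two numbers is a rough indicator of how consistent an agent is: a model can look strong on pass@k while failing most tasks on some runs.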
Prerequisites
- Python 3.10 or higher installed
- Git to clone the repository
- Playwright installed (`playwright install`) for browser-based tasks
- A model provider API key (for example, OpenAI) plus service credentials for each MCP category you want to benchmark (Notion, GitHub, Postgres as applicable)
- Docker (optional) for containerized evaluation runs
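A quick preflight check for the prerequisites above might look like the following sketch. It only tests whether each tool is on `PATH`; the function name `check_prerequisites` is hypothetical and not part of MCPMark. Note that the `playwright` command typically appears only after the Python dependencies are installed.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which prerequisites appear to be available on this machine."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "git": shutil.which("git") is not None,
        "playwright": shutil.which("playwright") is not None,
        "docker (optional)": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```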
Clone the MCPMark repository
MCPMark is installed from source. Clone the repository and enter the project directory before installing dependencies.
git clone https://github.com/eval-sys/mcpmark.git
cd mcpmark
Install Python dependencies
Install MCPMark in editable mode so the pipeline module is importable. Then install Playwright browsers for the browser automation task category.
pip install -e .
playwright install
Configure service credentials
Create a `.mcp_env` file at the repository root with credentials for each MCP service you want to benchmark. Only the credentials for services you intend to test are required.
# .mcp_env
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
# Notion
SOURCE_NOTION_API_KEY=secret_...
EVAL_NOTION_API_KEY=secret_...
EVAL_PARENT_PAGE_TITLE=MCPMark Eval
# GitHub
GITHUB_TOKENS=ghp_...
GITHUB_EVAL_ORG=my-eval-org
# Postgres
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USERNAME=mcpmark
POSTGRES_PASSWORD=secret
# Playwright
PLAYWRIGHT_BROWSER=chromium
PLAYWRIGHT_HEADLESS=true
Run a benchmark evaluation
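Before launching a run, it can help to confirm that `.mcp_env` contains the keys for the service you are about to test. The sketch below parses the file and lists missing keys per service. The key names come from the sample `.mcp_env` above; treating every listed key as required is an assumption, and the helper names are hypothetical rather than part of MCPMark.

```python
from pathlib import Path
from typing import Dict, List

# Keys per service, copied from the sample .mcp_env. Treating all of them as
# required is an assumption, not MCPMark's own validation rule.
SERVICE_KEYS: Dict[str, List[str]] = {
    "notion": ["SOURCE_NOTION_API_KEY", "EVAL_NOTION_API_KEY", "EVAL_PARENT_PAGE_TITLE"],
    "github": ["GITHUB_TOKENS", "GITHUB_EVAL_ORG"],
    "postgres": ["POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_USERNAME", "POSTGRES_PASSWORD"],
    "playwright": ["PLAYWRIGHT_BROWSER", "PLAYWRIGHT_HEADLESS"],
    "filesystem": [],  # described in this guide as needing no external accounts
}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(service: str, env: Dict[str, str]) -> List[str]:
    """Return the keys for `service` that are absent or empty in `env`."""
    return [key for key in SERVICE_KEYS[service] if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".mcp_env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    for service in SERVICE_KEYS:
        print(service, "missing:", missing_keys(service, env) or "none")
```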
Execute the pipeline for a specific MCP service and task. The filesystem category requires no external accounts and is the easiest starting point. Use --k to specify the number of runs per task for statistical stability.
python -m pipeline \
--mcp filesystem \
--k 1 \
--models gpt-4o \
--tasks all
Aggregate and review results
After running evaluations, use the aggregator to compute pass@k scores, success rates per task, and overall benchmark results. Re-running the pipeline automatically skips already-completed tasks.
python -m src.aggregators.aggregate_results --exp-name my_experiment
Run with Docker for reproducibility
For CI/CD integration or to ensure a clean environment, use the Docker build scripts. The containerized version has been validated on macOS and Linux.
./build-docker.sh
MCPMark Examples
Client configuration
MCPMark is a standalone evaluation framework and does not run as an MCP server itself, so it has no `mcpServers` entry to add to an AI client. Run it from the command line or a CI job, and supply credentials through environment variables or the `.mcp_env` file.
OPENAI_API_KEY=sk-... python -m pipeline --mcp filesystem --k 1 --models claude-3-5-sonnet-latest
Prompts to try
MCPMark is a command-line benchmark tool. Use these commands to evaluate different models and services.
- Run filesystem benchmark: `python -m pipeline --mcp filesystem --k 3 --models claude-3-5-sonnet-latest --tasks all`
- Run Notion tasks only: `python -m pipeline --mcp notion --k 1 --models gpt-4o --tasks all`
- Run a specific task: `python -m pipeline --mcp filesystem --k 1 --models gpt-4o --tasks file_property/size_classification`
- Enable conversation compaction: `python -m pipeline --mcp github --k 1 --models gpt-4o --compaction-token 4000`
Troubleshooting MCPMark
Playwright browser tasks fail with browser not found errors
Run `playwright install` after installing dependencies to download the browser binaries. Specify the browser type in `.mcp_env` with `PLAYWRIGHT_BROWSER=chromium` (or `firefox` or `webkit`).
Benchmark re-runs repeat already-completed tasks
MCPMark's auto-resume feature only skips tasks when using the same --exp-name across runs. Always pass a consistent --exp-name flag: `python -m pipeline --exp-name my_run --mcp filesystem ...`
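One way to avoid drifting experiment names is to build every invocation from a single constant. The sketch below only assembles and prints the command lines; it uses the flags shown in this guide (`--exp-name`, `--mcp`, `--k`, `--models`, `--tasks`), and the helper `build_pipeline_command` is a hypothetical name, not part of MCPMark.

```python
import shlex
from typing import List


def build_pipeline_command(
    exp_name: str, mcp: str, models: str, k: int = 1, tasks: str = "all"
) -> List[str]:
    """Build the argv for one pipeline run, always including --exp-name."""
    return [
        "python", "-m", "pipeline",
        "--exp-name", exp_name,
        "--mcp", mcp,
        "--k", str(k),
        "--models", models,
        "--tasks", tasks,
    ]


EXP_NAME = "my_run"  # reuse this exact value on every re-run

for service in ["filesystem", "github"]:
    cmd = build_pipeline_command(EXP_NAME, service, "gpt-4o")
    # Print only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```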
GitHub API rate limits causing task failures
Provide multiple GitHub tokens in `GITHUB_TOKENS` as a comma-separated list. MCPMark rotates through them to avoid hitting per-token rate limits.
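The rotation idea can be sketched as a simple round-robin over the parsed list. This illustrates the concept only and is not MCPMark's actual implementation; the fallback tokens are placeholders so the example runs without real credentials.

```python
import itertools
import os
from typing import Iterator, List


def parse_tokens(raw: str) -> List[str]:
    """Split a comma-separated token string, dropping blanks and whitespace."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def token_cycle(tokens: List[str]) -> Iterator[str]:
    """Return an iterator that yields the tokens in round-robin order, forever."""
    if not tokens:
        raise ValueError("GITHUB_TOKENS is empty")
    return itertools.cycle(tokens)


# Placeholder fallback values; real tokens would come from GITHUB_TOKENS.
tokens = parse_tokens(os.environ.get("GITHUB_TOKENS", "ghp_first,ghp_second"))
rotation = token_cycle(tokens)
for request_number in range(1, 4):
    token = next(rotation)
    # Report the token's position rather than its value to avoid leaking secrets.
    print(f"request {request_number} uses token #{tokens.index(token) + 1}")
```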
Frequently Asked Questions about MCPMark
What is MCPMark?
MCPMark is a comprehensive, stress-testing benchmark for evaluating model and agent capabilities in real-world MCP use. It is not an MCP server itself; it runs agents against real MCP servers and scores the results.
How do I install MCPMark?
MCPMark is installed from source. Clone the GitHub repository, run `pip install -e .` and `playwright install`, then add your credentials to a `.mcp_env` file at the repository root.
Which AI clients work with MCPMark?
MCPMark is not installed into an AI client. It is a command-line harness that evaluates models (the examples in this guide use gpt-4o and claude-3-5-sonnet-latest) against MCP servers for Notion, GitHub, Filesystem, Postgres, and Playwright.
Is MCPMark free to use?
Yes, MCPMark is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.