MCPMark

v1.0.0 | Analytics | stable

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Tags: agentic, benchmark, eval-sys, mcp, mcp-servers
Stars: 420
Downloads: 0
Weekly: 0
Rating: 0/5

What is MCPMark?

MCPMark is a benchmark for the Model Context Protocol (MCP) that measures how well models and agents use MCP tools on real-world tasks. It is a standalone evaluation framework and does not run as an MCP server itself.

MCPMark is listed under the Analytics category on MCPgee, a directory of 33,000+ MCP servers.

Features

  • MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Use Cases

Benchmark LLM agent tool-use capabilities on real-world tasks.
Stress-test MCP server implementations comprehensively.

Maintainer: eval-sys
License: Apache-2.0
Language: Python
Version: v1.0.0
Updated: May 20, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

git clone https://github.com/eval-sys/mcpmark.git
cd mcpmark
pip install -e .

Configuration

Configuration Details

Config File

.mcp_env (at the repository root)

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use MCPMark

MCPMark is a comprehensive stress-testing benchmark suite designed to evaluate the real-world tool-use capabilities of LLM agents against actual MCP servers. It covers five MCP service categories — Notion, GitHub, Filesystem, Postgres, and Playwright — with 127 standard tasks and 50 easier tasks, enabling rigorous pass@k and pass^k measurements. Teams use MCPMark to compare models, validate agent frameworks, and identify failure modes in MCP tool-calling behavior before production deployment.
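
The pass@k and pass^k metrics mentioned above can be illustrated with a short sketch. It assumes the usual definitions: a task counts toward pass@k if at least one of its k runs succeeds, and toward pass^k only if all k runs succeed. The `results` dictionary and the task names are hypothetical, and MCPMark's own aggregator may store and compute results differently.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks with at least one successful run."""
    if not results:
        return 0.0
    return sum(any(runs) for runs in results.values()) / len(results)


def pass_hat_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where every run succeeded."""
    if not results:
        return 0.0
    return sum(all(runs) for runs in results.values()) / len(results)


# Hypothetical outcomes for three tasks, k = 4 runs each.
results = {
    "task_a": [True, True, True, True],
    "task_b": [False, True, False, False],
    "task_c": [False, False, False, False],
}

print(f"pass@4 = {pass_at_k(results):.2f}")   # pass@4 = 0.67
print(f"pass^4 = {pass_hat_k(results):.2f}")  # pass^4 = 0.33
```

The gap between the two numbers is a rough indicator of how consistent a model is: `task_b` succeeds once in four runs, so it raises pass@4 but not pass^4.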

Prerequisites

  • Python 3.10 or higher installed
  • Git to clone the repository
  • Playwright installed (`playwright install`) for browser-based tasks
  • A model API key (for example OpenAI) and service credentials for each MCP category you want to benchmark (Notion, GitHub, Postgres as applicable)
  • Docker (optional) for containerized evaluation runs
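
A quick way to check the prerequisites listed above is a small script like the following. This is only a sketch: the function name `check_prerequisites` is made up for illustration, and it only tests whether each command-line tool is on `PATH`, not whether it works. The `playwright` command normally becomes available only after the Python dependencies are installed.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a mapping of prerequisite name -> whether it looks available."""
    return {
        "python": sys.version_info >= min_python,
        "git": shutil.which("git") is not None,
        "playwright": shutil.which("playwright") is not None,
        "docker (optional)": shutil.which("docker") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{'ok' if ok else 'missing':8} {name}")
```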

1. Clone the MCPMark repository

MCPMark is installed from source. Clone the repository and enter the project directory before installing dependencies.

git clone https://github.com/eval-sys/mcpmark.git
cd mcpmark

2. Install Python dependencies

Install MCPMark in editable mode so the pipeline module is importable. Then install Playwright browsers for the browser automation task category.

pip install -e .
playwright install

3. Configure service credentials

Create a `.mcp_env` file at the repository root with credentials for each MCP service you want to benchmark. Only the credentials for services you intend to test are required.

# .mcp_env
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1

# Notion
SOURCE_NOTION_API_KEY=secret_...
EVAL_NOTION_API_KEY=secret_...
EVAL_PARENT_PAGE_TITLE=MCPMark Eval

# GitHub
GITHUB_TOKENS=ghp_...
GITHUB_EVAL_ORG=my-eval-org

# Postgres
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USERNAME=mcpmark
POSTGRES_PASSWORD=secret

# Playwright
PLAYWRIGHT_BROWSER=chromium
PLAYWRIGHT_HEADLESS=true
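
Before starting a long run, it can help to confirm that the credentials for the chosen service are present. The sketch below parses `KEY=VALUE` text in the format shown above and reports missing keys. The key names come from the example file; the grouping in `REQUIRED_KEYS` and the assumption that each key is mandatory are illustrative and not taken from MCPMark's code.

```python
from typing import Dict, List

# Keys taken from the example file above; whether MCPMark treats each one as
# mandatory is an assumption made for this sketch.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "notion": ["SOURCE_NOTION_API_KEY", "EVAL_NOTION_API_KEY", "EVAL_PARENT_PAGE_TITLE"],
    "github": ["GITHUB_TOKENS", "GITHUB_EVAL_ORG"],
    "postgres": ["POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_USERNAME", "POSTGRES_PASSWORD"],
    "playwright": ["PLAYWRIGHT_BROWSER", "PLAYWRIGHT_HEADLESS"],
    "filesystem": [],
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Dict[str, str], service: str) -> List[str]:
    """List the keys for `service` that are absent or empty."""
    return [k for k in REQUIRED_KEYS[service] if not env.get(k)]


sample = """
# GitHub
GITHUB_TOKENS=ghp_example
POSTGRES_HOST=localhost
"""
env = parse_env(sample)
print(missing_keys(env, "github"))      # ['GITHUB_EVAL_ORG']
print(missing_keys(env, "filesystem"))  # []
```

To check a real file, the text could be read with `pathlib.Path(".mcp_env").read_text()` and passed to `parse_env`.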

4. Run a benchmark evaluation

Execute the pipeline for a specific MCP service and task. The filesystem category requires no external accounts and is the easiest starting point. Use --k to specify the number of runs per task for statistical stability.

python -m pipeline \
  --mcp filesystem \
  --k 1 \
  --models gpt-4o \
  --tasks all
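
When the same invocation needs to be repeated for several services or models, the command can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of MCPMark; it only uses flags that appear elsewhere in this guide (`--mcp`, `--k`, `--models`, `--tasks`, `--exp-name`).

```python
import shlex
from typing import List, Optional


def build_pipeline_command(
    mcp: str,
    models: str,
    k: int = 1,
    tasks: str = "all",
    exp_name: Optional[str] = None,
) -> List[str]:
    """Build the argv list for one pipeline run using flags shown in this guide."""
    if k < 1:
        raise ValueError("k must be at least 1")
    cmd = ["python", "-m", "pipeline", "--mcp", mcp, "--k", str(k),
           "--models", models, "--tasks", tasks]
    if exp_name:
        cmd += ["--exp-name", exp_name]
    return cmd


cmd = build_pipeline_command("filesystem", "gpt-4o", k=3, exp_name="my_experiment")
print(shlex.join(cmd))
# To execute from the repository root: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a task path or experiment name contains spaces.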

5. Aggregate and review results

After running evaluations, use the aggregator to compute pass@k scores, per-task success rates, and overall benchmark results for a given experiment name. Re-running the pipeline with the same --exp-name skips already-completed tasks.

python -m src.aggregators.aggregate_results --exp-name my_experiment

6. Run with Docker for reproducibility

For CI/CD integration or to ensure a clean environment, use the Docker build scripts. The containerized version has been validated on macOS and Linux.

./build-docker.sh

MCPMark Examples

Client configuration

MCPMark is a standalone evaluation framework and does not run as an MCP server itself. Configure it in your CI environment using environment variables.

export OPENAI_API_KEY=sk-...
python -m pipeline --mcp filesystem --k 1 --models gpt-4o

Commands to try

MCPMark is a command-line benchmark tool. Use these commands to evaluate different models and services.

- Run filesystem benchmark: `python -m pipeline --mcp filesystem --k 3 --models claude-3-5-sonnet-latest --tasks all`
- Run Notion tasks only: `python -m pipeline --mcp notion --k 1 --models gpt-4o --tasks all`
- Run a specific task: `python -m pipeline --mcp filesystem --k 1 --models gpt-4o --tasks file_property/size_classification`
- Enable conversation compaction: `python -m pipeline --mcp github --k 1 --models gpt-4o --compaction-token 4000`

Troubleshooting MCPMark

Playwright browser tasks fail with browser not found errors

Run `playwright install` after installing dependencies to download the browser binaries. Specify the browser type in .mcp_env with `PLAYWRIGHT_BROWSER=chromium` (or firefox/webkit).

Benchmark re-runs repeat already-completed tasks

MCPMark's auto-resume feature only skips tasks when using the same --exp-name across runs. Always pass a consistent --exp-name flag: `python -m pipeline --exp-name my_run --mcp filesystem ...`

GitHub API rate limits causing task failures

Provide multiple GitHub tokens in `GITHUB_TOKENS` as a comma-separated list. MCPMark rotates through them to avoid hitting per-token rate limits.
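
The idea of rotating through a comma-separated token list can be sketched as a simple round-robin iterator. This illustrates the general technique only; the function names are hypothetical and MCPMark's actual rotation logic may differ (for example, it might switch tokens only after a rate-limit response).

```python
import itertools
from typing import Iterator, List


def parse_tokens(raw: str) -> List[str]:
    """Split a comma-separated GITHUB_TOKENS value, dropping empty entries."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def token_cycle(raw: str) -> Iterator[str]:
    """Return an iterator that yields tokens in round-robin order indefinitely."""
    tokens = parse_tokens(raw)
    if not tokens:
        raise ValueError("GITHUB_TOKENS is empty")
    return itertools.cycle(tokens)


# Placeholder token values, not real credentials.
rotation = token_cycle("ghp_aaa, ghp_bbb,ghp_ccc")
picked = [next(rotation) for _ in range(4)]
print(picked)  # ['ghp_aaa', 'ghp_bbb', 'ghp_ccc', 'ghp_aaa']
```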

Frequently Asked Questions about MCPMark

What is MCPMark?

MCPMark is a comprehensive stress-testing benchmark for the Model Context Protocol (MCP), designed to evaluate model and agent capabilities in real-world MCP use. It is a standalone evaluation framework, not an MCP server: it runs LLM agents against real MCP servers and scores the results.

How do I install MCPMark?

Follow the installation instructions on the MCPMark GitHub repository. Clone the repo, install dependencies with `pip install -e .`, and add your service credentials to a `.mcp_env` file at the repository root.

Which AI clients work with MCPMark?

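MCPMark is a command-line benchmark tool and is not added to an AI client as a server. It evaluates models through their APIs (the examples in this guide use gpt-4o and claude-3-5-sonnet-latest) against Notion, GitHub, Filesystem, Postgres, and Playwright MCP services.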

Is MCPMark free to use?

Yes, MCPMark is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
