MCPMark

Name: MCPMark MCP Server
Author: eval-sys

v1.0.0 • Analytics • stable

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Tags: agentic, benchmark, eval-sys, mcp, mcp-servers

Stars: 420

What is MCPMark?

MCPMark is a comprehensive, stress-testing benchmark for the Model Context Protocol (MCP), designed to evaluate model and agent capabilities in real-world MCP use. It is a standalone evaluation framework rather than an MCP server: it runs models against real MCP servers and scores how well they complete tasks.

This server falls under the Analytics category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

Comprehensive, stress-testing MCP benchmark covering five service categories: Notion, GitHub, Filesystem, Postgres, and Playwright.

Use Cases

Benchmark LLM agent tool-use capabilities on real-world tasks.

Stress-test MCP server implementations comprehensively.

Maintainer: eval-sys
License: Apache-2.0
Language: Python
Version: v1.0.0
Updated: May 20, 2026
Status: healthy
Maintenance: active

Works with: Claude, OpenAI, Windows, macOS, Linux

Installation

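Manual Installation

MCPMark is a Python project installed from source rather than through `npx`. Clone the repository and install it in editable mode:

git clone https://github.com/eval-sys/mcpmark.git
cd mcpmark
pip install -e .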

Configuration

Configuration Details

Config File

`.mcp_env` at the repository root (holds the credentials for each service being benchmarked)

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use MCPMark

MCPMark is a comprehensive stress-testing benchmark suite designed to evaluate the real-world tool-use capabilities of LLM agents against actual MCP servers. It covers five MCP service categories — Notion, GitHub, Filesystem, Postgres, and Playwright — with 127 standard tasks and 50 easier tasks, enabling rigorous pass@k and pass^k measurements. Teams use MCPMark to compare models, validate agent frameworks, and identify failure modes in MCP tool-calling behavior before production deployment.
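The pass@k and pass^k measurements mentioned above can be illustrated with a small sketch. It assumes the commonly used definitions (pass@k counts a task as solved if at least one of k runs succeeds; pass^k requires all k runs to succeed) and a hypothetical in-memory results layout. Apart from the one task path taken from this page, the task names are made up, and this is not MCPMark's own aggregation code.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success among the first k runs."""
    if not results:
        return 0.0
    solved = sum(1 for runs in results.values() if any(runs[:k]))
    return solved / len(results)


def pass_hat_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks where all of the first k runs succeed (pass^k)."""
    if not results:
        return 0.0
    solved = sum(
        1 for runs in results.values() if len(runs) >= k and all(runs[:k])
    )
    return solved / len(results)


# Hypothetical per-task outcomes for k = 3 runs (True means the run passed).
example = {
    "file_property/size_classification": [True, False, True],
    "hypothetical/task_b": [True, True, True],
    "hypothetical/task_c": [False, False, False],
    "hypothetical/task_d": [False, True, False],
}

print("pass@3:", pass_at_k(example, 3))   # 0.75
print("pass^3:", pass_hat_k(example, 3))  # 0.25
```

The gap between the two numbers is the point of reporting both: a model can solve most tasks at least once while solving few of them reliably.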

Prerequisites

Python 3.10 or higher installed
Git to clone the repository
Playwright installed (`playwright install`) for browser-based tasks
Service credentials for each MCP category you want to benchmark (OpenAI, Notion, GitHub, Postgres as applicable)
Docker (optional) for containerized evaluation runs

Clone the MCPMark repository

MCPMark is installed from source. Clone the repository and enter the project directory before installing dependencies.

git clone https://github.com/eval-sys/mcpmark.git
cd mcpmark

Install Python dependencies

Install MCPMark in editable mode so the pipeline module is importable. Then install Playwright browsers for the browser automation task category.

pip install -e .
playwright install

Configure service credentials

Create a `.mcp_env` file at the repository root with credentials for each MCP service you want to benchmark. Only the credentials for services you intend to test are required.

# .mcp_env
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1

# Notion
SOURCE_NOTION_API_KEY=secret_...
EVAL_NOTION_API_KEY=secret_...
EVAL_PARENT_PAGE_TITLE=MCPMark Eval

# GitHub
GITHUB_TOKENS=ghp_...
GITHUB_EVAL_ORG=my-eval-org

# Postgres
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USERNAME=mcpmark
POSTGRES_PASSWORD=secret

# Playwright
PLAYWRIGHT_BROWSER=chromium
PLAYWRIGHT_HEADLESS=true
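Before a long run it can help to check that the credentials for the chosen service are actually present. The following is a minimal sketch using only the standard library; the per-service key lists are copied from the example file above, and treating every listed key as required is an assumption rather than MCPMark's documented behavior. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Keys per service, taken from the example .mcp_env above.
# Assumption: each listed key must be non-empty for that service.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "filesystem": [],
    "notion": ["SOURCE_NOTION_API_KEY", "EVAL_NOTION_API_KEY", "EVAL_PARENT_PAGE_TITLE"],
    "github": ["GITHUB_TOKENS", "GITHUB_EVAL_ORG"],
    "postgres": ["POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_USERNAME", "POSTGRES_PASSWORD"],
    "playwright": ["PLAYWRIGHT_BROWSER", "PLAYWRIGHT_HEADLESS"],
}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Dict[str, str], service: str) -> List[str]:
    """Return the keys for `service` that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS[service] if not env.get(key)]


if __name__ == "__main__":
    path = Path(".mcp_env")
    env = parse_env_text(path.read_text()) if path.exists() else {}
    for service in REQUIRED_KEYS:
        print(f"{service}: missing {missing_keys(env, service)}")
```

Run from the repository root, this prints one line per service listing any keys that still need values.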

Run a benchmark evaluation

Execute the pipeline for a specific MCP service and task. The filesystem category requires no external accounts and is the easiest starting point. Use --k to specify the number of runs per task for statistical stability.

python -m pipeline \
  --exp-name my_experiment \
  --mcp filesystem \
  --k 1 \
  --models gpt-4o \
  --tasks all
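When the benchmark is launched from a script or CI job, the same invocation can be assembled as an argument list. This is a sketch only: `build_pipeline_cmd` is a hypothetical helper, and it uses just the flags that appear on this page (`--mcp`, `--k`, `--models`, `--tasks`, `--exp-name`).

```python
import shlex
import sys
from typing import List, Optional


def build_pipeline_cmd(
    mcp: str,
    models: str,
    k: int = 1,
    tasks: str = "all",
    exp_name: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `python -m pipeline` without running it."""
    cmd = [
        sys.executable, "-m", "pipeline",
        "--mcp", mcp,
        "--k", str(k),
        "--models", models,
        "--tasks", tasks,
    ]
    if exp_name:
        # A stable experiment name lets later runs and the aggregator
        # refer to the same set of results.
        cmd += ["--exp-name", exp_name]
    return cmd


cmd = build_pipeline_cmd("filesystem", "gpt-4o", k=1)
print(shlex.join(cmd))

# To execute from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when a task path or experiment name contains unusual characters.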

Aggregate and review results

After running evaluations, use the aggregator to compute pass@k scores, per-task success rates, and overall benchmark results for a given experiment name. Re-running the pipeline with the same --exp-name skips tasks that have already completed.

python -m src.aggregators.aggregate_results --exp-name my_experiment

Run with Docker for reproducibility

For CI/CD integration or to ensure a clean environment, use the Docker build scripts. The containerized version has been validated on macOS and Linux.

./build-docker.sh

MCPMark Examples

Client configuration

MCPMark is a standalone evaluation framework and does not run as an MCP server itself, so there is no `mcpServers` entry to add to an AI client. Run it from a shell or CI job instead, with credentials supplied through `.mcp_env` or environment variables, including the API key for the provider of the model being evaluated. For example:

python -m pipeline --mcp filesystem --k 1 --models claude-3-5-sonnet-latest

Commands to try

MCPMark is a command-line benchmark tool. Use these commands to evaluate different models and services.

- Run filesystem benchmark: `python -m pipeline --mcp filesystem --k 3 --models claude-3-5-sonnet-latest --tasks all`
- Run Notion tasks only: `python -m pipeline --mcp notion --k 1 --models gpt-4o --tasks all`
- Run a specific task: `python -m pipeline --mcp filesystem --k 1 --models gpt-4o --tasks file_property/size_classification`
- Enable conversation compaction: `python -m pipeline --mcp github --k 1 --models gpt-4o --compaction-token 4000`

Troubleshooting MCPMark

Playwright browser tasks fail with browser not found errors

Run `playwright install` after installing dependencies to download the browser binaries. Specify the browser type in .mcp_env with `PLAYWRIGHT_BROWSER=chromium` (or firefox/webkit).

Benchmark re-runs repeat already-completed tasks

MCPMark's auto-resume feature only skips tasks when using the same --exp-name across runs. Always pass a consistent --exp-name flag: `python -m pipeline --exp-name my_run --mcp filesystem ...`

GitHub API rate limits causing task failures

Provide multiple GitHub tokens in `GITHUB_TOKENS` as a comma-separated list. MCPMark rotates through them to avoid hitting per-token rate limits.
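The rotation idea can be pictured as a simple round-robin over the comma-separated list. The sketch below only illustrates that idea with hypothetical helper names and placeholder tokens; MCPMark's actual rotation logic may differ.

```python
import itertools
from typing import Iterator, List


def parse_tokens(raw: str) -> List[str]:
    """Split a comma-separated GITHUB_TOKENS value into clean tokens."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def token_cycle(raw: str) -> Iterator[str]:
    """Yield tokens in round-robin order, forever."""
    tokens = parse_tokens(raw)
    if not tokens:
        raise ValueError("GITHUB_TOKENS is empty")
    return itertools.cycle(tokens)


# Placeholder tokens; real values would come from .mcp_env.
rotation = token_cycle("ghp_aaa, ghp_bbb,ghp_ccc")
first_four = [next(rotation) for _ in range(4)]
print(first_four)  # ['ghp_aaa', 'ghp_bbb', 'ghp_ccc', 'ghp_aaa']
```

With three tokens, each one handles roughly a third of the requests, which is why adding tokens helps when a single token keeps hitting its limit.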

Frequently Asked Questions about MCPMark

What is MCPMark?

MCPMark is a comprehensive, stress-testing benchmark designed to evaluate model and agent capabilities in real-world Model Context Protocol (MCP) use. It is not an MCP server itself: it runs LLM agents against real MCP servers (Notion, GitHub, Filesystem, Postgres, and Playwright) and measures how reliably they complete tasks.

How do I install MCPMark?

Follow the installation instructions on the MCPMark GitHub repository. Clone the repo, install the Python dependencies with `pip install -e .`, and put your service credentials in a `.mcp_env` file at the repository root.

Which AI clients work with MCPMark?

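MCPMark is a command-line benchmark rather than a server you add to an AI client, so it is not configured in Claude Desktop, Cursor, VS Code, or similar tools. It evaluates models directly; this listing names Claude and OpenAI models, and the examples on this page use `gpt-4o` and `claude-3-5-sonnet-latest`.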

Is MCPMark free to use?

Yes, MCPMark is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.


MCPMark Alternatives — Similar Analytics Servers

Looking for alternatives to MCPMark? Here are other popular analytics servers you can use with Claude, Cursor, and VS Code.

OpenMetadata

★ 14.0k

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

Superset

★ 10.9k

An MCP server that provides AI assistants with full access to Apache Superset instances, enabling interaction with dashboards, charts, datasets, databases, and SQL execution capabilities.

Horizon

★ 4.4k

📡 Your own AI-powered news radar. Generates daily briefings in English and Chinese.

MCP Server Chart

★ 4.1k

Enables generation of 25+ types of charts and data visualizations using AntV, including bar charts, line charts, maps, mind maps, and specialized diagrams like fishbone and sankey charts. Supports both statistical charts and geographic visualizations

Muapi CLI

★ 997

Official CLI for muapi.ai — generate images, videos & audio from the terminal. MCP server, 14 AI models, npm + pip installable.

Weather MCP Server

★ 907

Weather Data Fetcher MCP server built with Node.js, MCP SDK, and Zod. Provides weather details like temperature and forecast for cities such as Noida and Delhi via a registered tool. Simplifies API integration, enabling structured responses for clien



