MCP Bench

Name: MCP Bench MCP Server
Author: Accenture

v1.0.0 • Analytics • stable

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Tags: mcp-bench, mcp, ai-integration

Stars: 484

What is MCP Bench?

MCP Bench is listed as a Model Context Protocol (MCP) server for AI assistants and clients such as Claude, Cursor, and VS Code. It benchmarks tool-using LLM agents with complex real-world tasks via MCP servers.

This server falls under the Analytics category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

Benchmarking tool-using LLM agents with complex real-world tasks via MCP servers

Use Cases

Benchmark tool-using LLM agents on real-world complex tasks.

Maintainer: Accenture
License: MIT
Language: Python
Version: v1.0.0
Updated: May 18, 2026
Status: healthy
Maintenance: active

Works with: Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx mcp-bench

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use MCP Bench

MCP-Bench is an open-source benchmarking framework from Accenture that evaluates the performance of tool-using LLM agents on complex, real-world tasks delivered through MCP servers. It provides 28 domain-specific MCP tool servers — spanning biomedical data, cryptocurrency analytics, geographic mapping, academic paper search, weather forecasting, and more — and uses an LLM-as-judge approach (with o4-mini as the default judge) to assess task completion, tool usage correctness, and planning effectiveness. AI researchers and enterprise teams evaluating which LLM or agent architecture to deploy for tool-augmented workflows can use MCP-Bench to run reproducible, multi-dimensional comparisons across models like GPT-5, o3, and Gemini 2.5 Pro.

Prerequisites

Python 3.10 with Conda (miniconda or anaconda) installed
Git for cloning the repository
An OpenRouter API key (OPENROUTER_API_KEY) for accessing models
Or Azure OpenAI credentials: AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT
Bash shell for running the MCP server installer script

Clone the repository

Clone the MCP-Bench repository from Accenture's GitHub organization.

git clone https://github.com/Accenture/mcp-bench.git
cd mcp-bench

Create and activate a Conda environment

Create a dedicated Python 3.10 environment to isolate MCP-Bench dependencies.

conda create -n mcpbench python=3.10
conda activate mcpbench

Install the MCP servers

Run the provided installer script from the mcp_servers directory to set up all 28 domain-specific MCP tool servers.

cd mcp_servers && bash ./install.sh && cd ..

Set your API credentials

Export your LLM provider credentials. OpenRouter is the primary supported provider for accessing a wide range of models.

export OPENROUTER_API_KEY=your_openrouter_api_key
# Or for Azure OpenAI:
export AZURE_OPENAI_API_KEY=your_azure_key
export AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
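
Before starting a long benchmark run, it can help to confirm that one complete set of credentials is visible to the process. The following is a minimal sketch, not part of MCP-Bench itself: the function name detect_provider is hypothetical, and the only names taken from the guide are the three environment variables.

```python
import os


def detect_provider(env):
    """Return which provider's credentials are present in env, or None.

    env is any mapping of environment variable names to values.
    OpenRouter is checked first because the guide lists it as the
    primary provider; Azure needs both the key and the endpoint.
    """
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    return None


if __name__ == "__main__":
    provider = detect_provider(os.environ)
    if provider is None:
        print("No complete set of credentials found in this shell.")
    else:
        print(f"Found credentials for: {provider}")
```

Running this in the same shell where the variables were exported shows whether the benchmark runner would inherit them.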

Run the benchmark against a model

Execute the benchmark runner against a specific model using the single-server task file. Replace 'gpt-oss-20b' with your target model identifier.

python run_benchmark.py --models gpt-oss-20b --tasks-file tasks/mcpbench_tasks_single_runner_format.json
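
To run the same task file against several models in turn, the command above can be wrapped in a small script. This is a sketch under stated assumptions: it uses only the --models and --tasks-file flags shown in this guide, calls the runner once per model, and assumes it is started from the repository root. The helper names build_command and run_all are hypothetical.

```python
import subprocess

# Task file path taken from the command shown in this guide.
TASKS_FILE = "tasks/mcpbench_tasks_single_runner_format.json"


def build_command(model, tasks_file=TASKS_FILE):
    """Build the argument list for one benchmark run of a single model."""
    return ["python", "run_benchmark.py", "--models", model, "--tasks-file", tasks_file]


def run_all(models, dry_run=True):
    """Run (or, with dry_run=True, only print) one command per model."""
    commands = [build_command(model) for model in models]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop on the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default so nothing is executed until the commands look right.
    run_all(["gpt-oss-20b", "o3"], dry_run=True)
```

Switching dry_run to False executes the runs sequentially, which keeps API usage and logs for each model separate.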

Review results

Inspect the output scores for each evaluation dimension: rule-based schema understanding, LLM-judged task completion, tool usage accuracy, and planning effectiveness. Compare results across multiple models.

MCP Bench Examples

Client configuration

Claude Desktop config for connecting to the MCP-Bench evaluation server.

{
  "mcpServers": {
    "mcp-bench": {
      "command": "npx",
      "args": ["mcp-bench"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key"
      }
    }
  }
}
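
If claude_desktop_config.json already lists other servers, the entry should be merged in rather than pasted over the file. The sketch below shows one way to do that. The helper names add_server and update_config_file are hypothetical, the server entry is copied from the configuration above, and the path passed to update_config_file is an assumption that depends on your operating system and client.

```python
import json

# Server entry copied from the listing above; the API key is a placeholder.
MCP_BENCH_ENTRY = {
    "command": "npx",
    "args": ["mcp-bench"],
    "env": {"OPENROUTER_API_KEY": "your_openrouter_api_key"},
}


def add_server(config, name, entry):
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path, name="mcp-bench", entry=MCP_BENCH_ENTRY):
    """Load the JSON config at path, add the entry, and write it back.

    A missing file is treated as an empty config.
    """
    try:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    except FileNotFoundError:
        config = {}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(add_server(config, name, entry), f, indent=2)
```

Because add_server copies the dictionaries it changes, existing server entries are preserved and the original config object is left untouched.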

Prompts to try

Commands and queries for running and analyzing benchmarks with MCP-Bench.

- "Run the benchmark on gpt-4o using the single-server task set"
- "python run_benchmark.py --models o3 --tasks-file tasks/mcpbench_tasks_single_runner_format.json"
- "Compare benchmark results between gemini-2.5-pro and gpt-oss-120b"
- "Run benchmarks on multi-server tasks to test cross-domain tool coordination"
- "Show me the top scoring models on the biomedical and geographic tool categories"

Troubleshooting MCP Bench

install.sh fails with missing pip packages or permission errors

Ensure the mcpbench Conda environment is activated before running the installer ('conda activate mcpbench'). If individual MCP server installs fail, check the install.sh script for per-server requirements and install them manually.
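
A quick way to check which environment is active is to read the CONDA_DEFAULT_ENV variable, which conda sets on activation. The following is a small sketch with hypothetical helper names; the environment name mcpbench comes from the setup steps in this guide.

```python
import os


def active_conda_env(env=None):
    """Return the name of the active Conda environment, or None if unset."""
    env = os.environ if env is None else env
    return env.get("CONDA_DEFAULT_ENV")


def is_mcpbench_active(env=None, expected="mcpbench"):
    """True when the active Conda environment matches the expected name."""
    return active_conda_env(env) == expected


if __name__ == "__main__":
    if is_mcpbench_active():
        print("mcpbench is active; safe to run install.sh")
    else:
        print(f"Active environment: {active_conda_env()!r}; run 'conda activate mcpbench' first")
```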

Benchmark runner fails with 'Invalid API key' or 401 errors

Verify OPENROUTER_API_KEY or your Azure credentials are exported in the active shell. For OpenRouter, ensure your account has credits and the model you specified is available at openrouter.ai/models.

LLM judge returns inconsistent scores between runs

This is expected behavior with LLM-as-judge evaluation — scores have some variance. Run multiple benchmark passes and average results for reliable comparison. The framework uses o4-mini as the default judge model.
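
Averaging across passes can be done with the standard library once the per-run scores have been collected. This sketch assumes you have already extracted one numeric score per run for a given model and dimension; it does not read MCP-Bench's output format, and the function name summarize is hypothetical.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of per-run scores.

    A single run has no measurable spread, so its deviation is reported as 0.0.
    """
    if not scores:
        raise ValueError("at least one score is required")
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    # Illustrative numbers only, not real benchmark results.
    runs = [1.0, 2.0, 3.0]
    average, spread = summarize(runs)
    print(f"mean={average:.2f} stdev={spread:.2f} over {len(runs)} runs")
```

Reporting the spread next to the mean makes it easier to judge whether a difference between two models is larger than the run-to-run variance of the judge.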

Frequently Asked Questions about MCP Bench

What is MCP Bench?

MCP Bench is a benchmarking framework for tool-using LLM agents that run complex real-world tasks via Model Context Protocol (MCP) servers. MCP connects AI assistants to external tools and data sources through a standardized interface.

How do I install MCP Bench?

Follow the installation instructions on the MCP Bench GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with MCP Bench?

MCP Bench works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is MCP Bench free to use?

Yes, MCP Bench is open source and available under the MIT license. You can use it freely in both personal and commercial projects.

MCP Bench Alternatives — Similar Analytics Servers

Looking for alternatives to MCP Bench? Here are other popular analytics servers you can use with Claude, Cursor, and VS Code.

OpenMetadata

★ 14.0k

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

Superset

★ 10.9k

An MCP server that provides AI assistants with full access to Apache Superset instances, enabling interaction with dashboards, charts, datasets, databases, and SQL execution capabilities.

Horizon

★ 4.4k

📡 Your own AI-powered news radar. Generates daily briefings in English & Chinese.

MCP Server Chart

★ 4.1k

Enables generation of 25+ types of charts and data visualizations using AntV, including bar charts, line charts, maps, mind maps, and specialized diagrams like fishbone and sankey charts. Supports both statistical charts and geographic visualizations

Muapi CLI

★ 997

Official CLI for muapi.ai — generate images, videos & audio from the terminal. MCP server, 14 AI models, npm + pip installable.

Weather MCP Server

★ 907

Weather Data Fetcher MCP server built with Node.js, MCP SDK, and Zod. Provides weather details like temperature and forecast for cities such as Noida and Delhi via a registered tool. Simplifies API integration, enabling structured responses for clien

Set Up MCP Bench in Your Editor

Choose your AI client for step-by-step setup instructions.

Claude Desktop: macOS & Windows app
Claude Code: CLI & terminal
Cursor: AI-first code editor
VS Code: GitHub Copilot MCP
Windsurf: Codeium AI editor
Cline: VS Code extension

Quick Config Preview

{
  "mcpServers": {
    "mcp-bench": {
      "command": "npx",
      "args": ["-y", "mcp-bench"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json
