MCP Bench

v1.0.0 · Analytics · stable

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Tags: mcp-bench, mcp, ai-integration
Stars: 484

What is MCP Bench?

MCP Bench (MCP-Bench) benchmarks tool-using LLM agents on complex real-world tasks delivered through Model Context Protocol (MCP) servers. The listing describes it as usable from AI assistants and clients such as Claude, Cursor, and VS Code.


This server falls under the Analytics category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Use Cases

Benchmark tool-using LLM agents on real-world complex tasks.
Maintainer: Accenture

License: MIT
Language: Python
Version: v1.0.0
Updated: May 18, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx mcp-bench

Configuration

Configuration Details

Config File: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use MCP Bench

MCP-Bench is an open-source benchmarking framework from Accenture that evaluates tool-using LLM agents on complex, real-world tasks delivered through MCP servers. It provides 28 domain-specific MCP tool servers (spanning biomedical data, cryptocurrency analytics, geographic mapping, academic paper search, weather forecasting, and more) and uses an LLM-as-judge approach, with o4-mini as the default judge, to assess task completion, tool usage correctness, and planning effectiveness. AI researchers and enterprise teams deciding which LLM or agent architecture to deploy for tool-augmented workflows can use MCP-Bench to run reproducible, multi-dimensional comparisons across models such as GPT-5, o3, and Gemini 2.5 Pro.

Prerequisites

  • Python 3.10 with Conda (miniconda or anaconda) installed
  • Git for cloning the repository
  • An OpenRouter API key (OPENROUTER_API_KEY) for accessing models, or Azure OpenAI credentials (AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT)
  • Bash shell for running the MCP server installer script
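The prerequisites above can be checked before starting. The following is a minimal sketch, not part of MCP-Bench itself: the helper names missing_tools and python_matches are hypothetical, and it only assumes that git, conda, and bash should be discoverable on PATH and that the active interpreter should be Python 3.10.

```python
import shutil
import sys

# Tools named in the prerequisites list; adjust if your setup differs.
REQUIRED_TOOLS = ("git", "conda", "bash")
EXPECTED_PYTHON = (3, 10)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_matches(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """Return True when the major.minor version equals `expected`."""
    return tuple(version_info[:2]) == expected


# Report the state of the current machine (hypothetical helper output).
print("Missing tools:", missing_tools() or "none")
print("Python 3.10 active:", python_matches())
```

Run it once outside the Conda environment to check the tools, and again after activating the environment to confirm the interpreter version.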
1. Clone the repository

Clone the MCP-Bench repository from Accenture's GitHub organization.

git clone https://github.com/Accenture/mcp-bench.git
cd mcp-bench
2. Create and activate a Conda environment

Create a dedicated Python 3.10 environment to isolate MCP-Bench dependencies.

conda create -n mcpbench python=3.10
conda activate mcpbench
3. Install the MCP servers

Run the provided installer script from the mcp_servers directory to set up all 28 domain-specific MCP tool servers.

cd mcp_servers && bash ./install.sh && cd ..
4. Set your API credentials

Export your LLM provider credentials. OpenRouter is the primary supported provider for accessing a wide range of models.

export OPENROUTER_API_KEY=your_openrouter_api_key
# Or for Azure OpenAI:
export AZURE_OPENAI_API_KEY=your_azure_key
export AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
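
Before launching a run, it can help to confirm which provider the current shell is configured for. This is a small sketch under the assumption that the three variable names above are the only ones that matter; detect_provider is a hypothetical helper, not an MCP-Bench function, and it does not validate the key with the provider.

```python
import os


def detect_provider(env):
    """Return 'openrouter', 'azure', or None based on which variables are set.

    OpenRouter is checked first because the guide treats it as the primary
    provider; Azure needs both the key and the endpoint.
    """
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    return None


provider = detect_provider(os.environ)
print("Configured provider:", provider or "none found")
```

If the script prints "none found", the exports were most likely made in a different shell session than the one running the benchmark.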
5. Run the benchmark against a model

Execute the benchmark runner against a specific model using the single-server task file. Replace 'gpt-oss-20b' with your target model identifier.

python run_benchmark.py --models gpt-oss-20b --tasks-file tasks/mcpbench_tasks_single_runner_format.json
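
To compare several models, the same command can be issued once per model. The sketch below wraps it with Python's subprocess module; it uses only the --models and --tasks-file flags shown above, passes one model per invocation (whether the runner accepts several models at once is not assumed here), and the helper names build_command and run_models are hypothetical.

```python
import subprocess
import sys

# Task file path taken from the command above.
TASKS_FILE = "tasks/mcpbench_tasks_single_runner_format.json"


def build_command(model, tasks_file=TASKS_FILE, python=sys.executable):
    """Build the argument list for one benchmark run of `model`."""
    return [python, "run_benchmark.py", "--models", model, "--tasks-file", tasks_file]


def run_models(models, runner=subprocess.run):
    """Run one benchmark invocation per model and return {model: exit code}."""
    results = {}
    for model in models:
        completed = runner(build_command(model), check=False)
        results[model] = completed.returncode
    return results


# Example call, to be made from the repository root with credentials exported:
# run_models(["gpt-oss-20b", "o3"])
```

A non-zero exit code for a model signals that its run failed and should be inspected before comparing scores.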
6. Review results

Inspect the output scores for each evaluation dimension: rule-based schema understanding, LLM-judged task completion, tool usage accuracy, and planning effectiveness. Compare results across multiple models.

MCP Bench Examples

Client configuration

Claude Desktop config for connecting to the MCP-Bench evaluation server.

{
  "mcpServers": {
    "mcp-bench": {
      "command": "npx",
      "args": ["mcp-bench"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key"
      }
    }
  }
}
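
If claude_desktop_config.json already lists other servers, the entry has to be merged rather than pasted over the file. The following sketch shows one way to do that in memory. The entry values are copied from the listing's config as-is, add_server is a hypothetical helper, and "other-server" is made-up sample data; reading and writing the real file is left out because its location depends on the platform.

```python
import json

# Entry copied from the listing's client configuration above.
MCP_BENCH_ENTRY = {
    "command": "npx",
    "args": ["mcp-bench"],
    "env": {"OPENROUTER_API_KEY": "your_openrouter_api_key"},
}


def add_server(config, name="mcp-bench", entry=MCP_BENCH_ENTRY):
    """Return a copy of `config` with `entry` registered under mcpServers[name].

    Existing servers are preserved; the input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_server(existing), indent=2))
```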

Prompts to try

Commands and queries for running and analyzing benchmarks with MCP-Bench.

- "Run the benchmark on gpt-4o using the single-server task set"
- "python run_benchmark.py --models o3 --tasks-file tasks/mcpbench_tasks_single_runner_format.json"
- "Compare benchmark results between gemini-2.5-pro and gpt-oss-120b"
- "Run benchmarks on multi-server tasks to test cross-domain tool coordination"
- "Show me the top scoring models on the biomedical and geographic tool categories"

Troubleshooting MCP Bench

install.sh fails with missing pip packages or permission errors

Ensure the mcpbench Conda environment is activated before running the installer ('conda activate mcpbench'). If individual MCP server installs fail, check the install.sh script for per-server requirements and install them manually.

Benchmark runner fails with 'Invalid API key' or 401 errors

Verify OPENROUTER_API_KEY or your Azure credentials are exported in the active shell. For OpenRouter, ensure your account has credits and the model you specified is available at openrouter.ai/models.

LLM judge returns inconsistent scores between runs

This is expected behavior with LLM-as-judge evaluation — scores have some variance. Run multiple benchmark passes and average results for reliable comparison. The framework uses o4-mini as the default judge model.
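
Averaging over passes can be done with Python's statistics module. The sketch below is illustrative only: the scores and model names are hypothetical, and it does not parse MCP-Bench's actual output format, which is not described here.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of per-run scores."""
    if not scores:
        raise ValueError("need at least one score")
    # A single run has no measurable spread, so report 0.0 in that case.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Hypothetical task-completion scores from three passes per model.
runs = {
    "model-a": [0.62, 0.58, 0.66],
    "model-b": [0.71, 0.69, 0.70],
}
for model, scores in runs.items():
    avg, spread = summarize(scores)
    print(f"{model}: mean={avg:.3f} stdev={spread:.3f}")
```

Reporting the spread next to the mean makes it easier to tell whether a gap between two models exceeds the run-to-run variance of the judge.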

Frequently Asked Questions about MCP Bench

What is MCP Bench?

MCP Bench (MCP-Bench) is a benchmark for tool-using LLM agents on complex real-world tasks, run via MCP servers. The Model Context Protocol (MCP) connects AI assistants to external tools and data sources through a standardized interface.

How do I install MCP Bench?

Follow the installation instructions on the MCP Bench GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with MCP Bench?

MCP Bench works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is MCP Bench free to use?

Yes, MCP Bench is open source and available under the MIT license. You can use it freely in both personal and commercial projects.


