MCP Bench
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
What is MCP Bench?
MCP Bench is listed on MCPgee as a Model Context Protocol (MCP) server. It is a benchmarking framework for tool-using LLM agents that runs complex real-world tasks through MCP servers.
This server falls under the Analytics category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Benchmarking tool-using LLM agents with complex real-world tasks via MCP servers
Installation
Manual Installation
npx mcp-bench
Configuration
Configuration Details
claude_desktop_config.json
How to Set Up and Use MCP Bench
MCP-Bench is an open-source benchmarking framework from Accenture that evaluates the performance of tool-using LLM agents on complex, real-world tasks delivered through MCP servers. It provides 28 domain-specific MCP tool servers — spanning biomedical data, cryptocurrency analytics, geographic mapping, academic paper search, weather forecasting, and more — and uses an LLM-as-judge approach (with o4-mini as the default judge) to assess task completion, tool usage correctness, and planning effectiveness. AI researchers and enterprise teams evaluating which LLM or agent architecture to deploy for tool-augmented workflows can use MCP-Bench to run reproducible, multi-dimensional comparisons across models like GPT-5, o3, and Gemini 2.5 Pro.
Prerequisites
- Python 3.10 with Conda (miniconda or anaconda) installed
- Git for cloning the repository
- An OpenRouter API key (OPENROUTER_API_KEY) for accessing models
- Or Azure OpenAI credentials: AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT
- Bash shell for running the MCP server installer script
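The prerequisites above can be checked before starting. The following is a minimal sketch and not part of MCP-Bench: the helper names `missing_tools` and `is_python_310` are hypothetical, and the script only checks that the listed tools are on the PATH and that the running interpreter is Python 3.10.

```python
import shutil
import sys

# Hypothetical helper, not part of MCP-Bench: the tool names come from
# the prerequisites list above.
REQUIRED_TOOLS = ("conda", "git", "bash")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_python_310(version_info=sys.version_info):
    """True when the given interpreter version is Python 3.10.x."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    if not is_python_310():
        print("Expected Python 3.10, found", sys.version.split()[0])
    if not missing and is_python_310():
        print("Prerequisites look fine.")
```

Run it with the interpreter you intend to use for the benchmark, since the version check applies to whichever Python executes the script.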
Clone the repository
Clone the MCP-Bench repository from Accenture's GitHub organization.
git clone https://github.com/Accenture/mcp-bench.git
cd mcp-bench
Create and activate a Conda environment
Create a dedicated Python 3.10 environment to isolate MCP-Bench dependencies.
conda create -n mcpbench python=3.10
conda activate mcpbench
Install the MCP servers
Run the provided installer script from the mcp_servers directory to set up all 28 domain-specific MCP tool servers.
cd mcp_servers && bash ./install.sh && cd ..
Set your API credentials
Export your LLM provider credentials. OpenRouter is the primary supported provider for accessing a wide range of models.
export OPENROUTER_API_KEY=your_openrouter_api_key
# Or for Azure OpenAI:
export AZURE_OPENAI_API_KEY=your_azure_key
export AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
Run the benchmark against a model
Execute the benchmark runner against a specific model using the single-server task file. Replace 'gpt-oss-20b' with your target model identifier.
python run_benchmark.py --models gpt-oss-20b --tasks-file tasks/mcpbench_tasks_single_runner_format.json
Review results
Inspect the output scores for each evaluation dimension: rule-based schema understanding, LLM-judged task completion, tool usage accuracy, and planning effectiveness. Compare results across multiple models.
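To compare several models, the run command from the previous step can be repeated once per model. The sketch below is one possible way to script that, assuming it is launched from the repository root inside the activated environment. The helper names `build_command` and `run_models` are hypothetical, and only the `--models` and `--tasks-file` flags shown in the run step are used.

```python
import subprocess
import sys

# Task file path as given in the run step.
TASKS_FILE = "tasks/mcpbench_tasks_single_runner_format.json"


def build_command(model, tasks_file=TASKS_FILE):
    """Build the run_benchmark.py invocation for one model."""
    return [sys.executable, "run_benchmark.py",
            "--models", model, "--tasks-file", tasks_file]


def run_models(models, dry_run=True):
    """Run the benchmark once per model; with dry_run, only print the commands."""
    commands = [build_command(model) for model in models]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Example identifiers; substitute the models you want to compare.
    run_models(["gpt-oss-20b", "gpt-oss-120b"])
```

The default is a dry run so the commands can be inspected first. Pass `dry_run=False` to execute them one after another.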
MCP Bench Examples
Client configuration
Claude Desktop config for connecting to the MCP-Bench evaluation server.
{
  "mcpServers": {
    "mcp-bench": {
      "command": "npx",
      "args": ["mcp-bench"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key"
      }
    }
  }
}
Prompts to try
Commands and queries for running and analyzing benchmarks with MCP-Bench.
- "Run the benchmark on gpt-4o using the single-server task set"
- "python run_benchmark.py --models o3 --tasks-file tasks/mcpbench_tasks_single_runner_format.json"
- "Compare benchmark results between gemini-2.5-pro and gpt-oss-120b"
- "Run benchmarks on multi-server tasks to test cross-domain tool coordination"
- "Show me the top scoring models on the biomedical and geographic tool categories"
Troubleshooting MCP Bench
install.sh fails with missing pip packages or permission errors
Ensure the mcpbench Conda environment is activated before running the installer ('conda activate mcpbench'). If individual MCP server installs fail, check the install.sh script for per-server requirements and install them manually.
Benchmark runner fails with 'Invalid API key' or 401 errors
Verify OPENROUTER_API_KEY or your Azure credentials are exported in the active shell. For OpenRouter, ensure your account has credits and the model you specified is available at openrouter.ai/models.
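A quick way to confirm that the variables are visible to the process is to inspect the environment from Python. This is a minimal sketch with a hypothetical helper name, `configured_provider`. The variable names are the ones listed in the prerequisites, and the check only confirms that they are set, not that the keys are valid.

```python
import os

# Variable names come from the prerequisites; the helper is a
# hypothetical sketch and not part of MCP-Bench.
OPENROUTER_VARS = ("OPENROUTER_API_KEY",)
AZURE_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def configured_provider(env=None):
    """Return 'openrouter', 'azure', or None based on which variables are set."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in OPENROUTER_VARS):
        return "openrouter"
    if all(env.get(name) for name in AZURE_VARS):
        return "azure"
    return None


if __name__ == "__main__":
    provider = configured_provider()
    if provider is None:
        print("No complete set of credentials found in this shell.")
    else:
        print("Credentials found for:", provider)
```

If the script reports no credentials, the export was most likely done in a different shell session than the one running the benchmark.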
LLM judge returns inconsistent scores between runs
This is expected behavior with LLM-as-judge evaluation — scores have some variance. Run multiple benchmark passes and average results for reliable comparison. The framework uses o4-mini as the default judge model.
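Averaging across passes can be sketched as follows. How MCP-Bench writes its scores is not described here, so the hypothetical function `summarize_runs` simply takes a list of per-run scores for one model and one dimension. The numbers in the example are placeholders for illustration, not real results.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) for one model's per-run scores."""
    if not scores:
        raise ValueError("need at least one score")
    # stdev needs at least two values; report zero spread for a single run.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real benchmark results.
    placeholder_scores = [0.60, 0.70, 0.65]
    avg, spread = summarize_runs(placeholder_scores)
    print(f"mean={avg:.3f} stdev={spread:.3f}")
```

Reporting the spread next to the mean makes it easier to tell whether a gap between two models is larger than the judge's run-to-run variance.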
Frequently Asked Questions about MCP Bench
What is MCP Bench?
MCP Bench is a framework for benchmarking tool-using LLM agents with complex real-world tasks via MCP servers. The Model Context Protocol (MCP) connects AI assistants to external tools and data sources through a standardized interface.
How do I install MCP Bench?
Follow the installation instructions on the MCP Bench GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
Which AI clients work with MCP Bench?
MCP Bench works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is MCP Bench free to use?
Yes, MCP Bench is open source and available under the MIT license. You can use it freely in both personal and commercial projects.
MCP Bench Alternatives — Similar Analytics Servers
Looking for alternatives to MCP Bench? Here are other popular analytics servers you can use with Claude, Cursor, and VS Code.
OpenMetadata
★ 14.0k
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Superset
★ 10.9k
An MCP server that provides AI assistants with full access to Apache Superset instances, enabling interaction with dashboards, charts, datasets, databases, and SQL execution capabilities.
Horizon
★ 4.4k
Your own AI-powered news radar. Generates daily briefings in English and Chinese.
MCP Server Chart
★ 4.1k
Enables generation of 25+ types of charts and data visualizations using AntV, including bar charts, line charts, maps, mind maps, and specialized diagrams like fishbone and sankey charts. Supports both statistical charts and geographic visualizations
Muapi CLI
★ 997
Official CLI for muapi.ai: generate images, videos, and audio from the terminal. MCP server, 14 AI models, npm + pip installable.
Weather MCP Server
★ 907
Weather Data Fetcher MCP server built with Node.js, MCP SDK, and Zod. Provides weather details like temperature and forecast for cities such as Noida and Delhi via a registered tool. Simplifies API integration, enabling structured responses for clien