MCP Evals
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
What is MCP Evals?
MCP Evals is a Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. It helps ensure your MCP server's tools are working correctly and performing well. It is a CLI/CI tool that evaluates MCP servers, not an MCP server itself.
This server falls under the Developer Tools category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring
Installation
NPM
npx -y mcp-evals
Manual Installation
npx -y mcp-evals
How to Set Up and Use MCP Evals
MCP Evals is a Node.js package and GitHub Action that automates quality evaluation of MCP server tool implementations using LLM-based scoring. It runs your tool against predefined scenarios and grades responses on accuracy, completeness, relevance, clarity, and reasoning — producing a numeric score that can gate CI/CD pipelines. Teams building MCP servers use it to catch regressions, validate new tools before release, and monitor tool quality over time with optional Prometheus/Grafana/Jaeger observability.
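To make the idea of a numeric score gating a pipeline concrete, here is a minimal sketch of how five per-criterion grades could be combined and compared against a threshold. The `RubricScores` type, the 1 to 5 scale, the simple average, and the threshold value are all assumptions made for illustration; they are not taken from the mcp-evals package, which may compute and report scores differently.

```typescript
// Hypothetical shape of one judged result: the five criteria named above,
// each ASSUMED to be scored from 1 to 5.
interface RubricScores {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
}

// Average the five criteria into a single number (an illustrative choice).
function overallScore(s: RubricScores): number {
  const values = [s.accuracy, s.completeness, s.relevance, s.clarity, s.reasoning];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// True when every scenario's average meets the threshold.
function passesGate(results: RubricScores[], threshold: number): boolean {
  return results.every((r) => overallScore(r) >= threshold);
}

const sample: RubricScores = { accuracy: 5, completeness: 4, relevance: 5, clarity: 4, reasoning: 2 };
console.log(overallScore(sample));      // 4
console.log(passesGate([sample], 3.5)); // true
```

In a CI job, a script along these lines would exit with a non-zero status when `passesGate` returns false, which is what causes the pipeline step to fail.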
Prerequisites
- Node.js 20 or later
- npm or yarn
- An API key for either OpenAI (OPENAI_API_KEY) or Anthropic (ANTHROPIC_API_KEY) to power the LLM judge
- A working MCP server implementation to evaluate
- Optional: Docker for running the Prometheus/Grafana/Jaeger monitoring stack
Install the mcp-evals package
Add mcp-evals as a dev dependency in your MCP server project, or install it globally.
npm install --save-dev mcp-evals
Set your LLM provider API key
Export the API key for the model you want to use as the LLM judge. The default model is gpt-4.
# For OpenAI (default):
export OPENAI_API_KEY=sk-...
# For Anthropic:
export ANTHROPIC_API_KEY=sk-ant-...
Create an evaluation configuration file
Write an evals.ts (TypeScript) or evals.yaml (YAML) file that defines the scenarios and expected behavior for each tool you want to evaluate.
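As a rough illustration of what such a file captures, the sketch below lists two scenarios and checks them for obvious mistakes before any judge calls are made. The `EvalScenario` type and the `findProblems` helper are hypothetical and declared locally so the example runs on its own; a real evals.ts would use the types and grading helpers exported by the mcp-evals package, as described in its documentation.

```typescript
// Hypothetical local type for illustration; the real mcp-evals package
// exports its own types, which should be used in an actual evals.ts.
interface EvalScenario {
  name: string;        // label shown next to the score
  description: string; // expected behavior the judge grades against
  prompt: string;      // request sent to the tool under test
}

const evals: EvalScenario[] = [
  {
    name: "Weather tool",
    description: "Returns accurate location data for New York",
    prompt: "What is the weather in New York?",
  },
  {
    name: "Search tool",
    description: "Returns relevant results for 'machine learning'",
    prompt: "Search for 'machine learning'.",
  },
];

// Report empty fields or duplicate names before spending judge calls.
function findProblems(scenarios: EvalScenario[]): string[] {
  const problems: string[] = [];
  const seen: string[] = [];
  for (const s of scenarios) {
    if (s.name.trim() === "" || s.description.trim() === "" || s.prompt.trim() === "") {
      problems.push(`Scenario "${s.name}" has an empty field`);
    }
    if (seen.indexOf(s.name) !== -1) {
      problems.push(`Duplicate scenario name: ${s.name}`);
    }
    seen.push(s.name);
  }
  return problems;
}

console.log(findProblems(evals)); // []
```

Keeping each scenario's expected behavior short and specific tends to make the judge's scores easier to interpret.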
Run the evaluation via CLI
Execute the evaluator by passing the path to your evals config and your MCP server implementation. Results are printed to stdout with a score per scenario.
npx mcp-eval path/to/evals.ts path/to/server.ts
Add to GitHub Actions CI (optional)
Add the mcp-evals GitHub Action to your workflow so evaluations run automatically on every pull request.
- uses: mclenhard/[email protected]
  with:
    evals_path: ./evals.ts
    server_path: ./src/server.ts
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Enable observability (optional)
Initialize the metrics module in your test harness to emit Prometheus metrics and OpenTelemetry traces for long-running eval suites.
import { metrics } from 'mcp-evals';
metrics.initialize(9090, {
  enableTracing: true,
  otelEndpoint: 'http://localhost:4318/v1/traces'
});
MCP Evals Examples
Client configuration (running the evaluator)
MCP Evals is a CLI/CI tool, not an MCP server itself. This shows how to run it as part of a package.json script.
{
  "scripts": {
    "eval": "npx mcp-eval evals.ts src/server.ts"
  },
  "devDependencies": {
    "mcp-evals": "latest"
  }
}
Prompts and CLI commands to try
Typical commands and scenarios for evaluating an MCP server's tools with mcp-evals.
- "npx mcp-eval evals.ts server.ts" — run all evaluation scenarios and print scores
- "npx mcp-eval evals.ts server.ts --model claude-3-5-sonnet-20241022" — use a specific Anthropic model as judge
- "Check if the weather tool returns accurate location data for New York."
- "Verify that the search tool returns relevant results for 'machine learning'."
Troubleshooting MCP Evals
Evaluation fails with 'Invalid API key' error
Make sure OPENAI_API_KEY or ANTHROPIC_API_KEY is exported in your shell or set as a GitHub Actions secret. The default model is gpt-4, so an OpenAI key is needed unless you explicitly specify an Anthropic model.
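A small preflight check can surface a missing key before the evaluation starts. The sketch below is illustrative only: `requiredKeyName` and `missingKey` are hypothetical helpers, and the rule that model names beginning with "claude" need the Anthropic key is an assumption rather than documented mcp-evals behavior.

```typescript
type Env = Record<string, string | undefined>;

// ASSUMPTION: model names starting with "claude" use Anthropic;
// every other model name is treated as an OpenAI model.
function requiredKeyName(model: string): string {
  return model.indexOf("claude") === 0 ? "ANTHROPIC_API_KEY" : "OPENAI_API_KEY";
}

// Return the name of the missing variable, or null when the key is present.
function missingKey(model: string, env: Env): string | null {
  const keyName = requiredKeyName(model);
  const value = env[keyName];
  return value !== undefined && value.trim() !== "" ? null : keyName;
}

// Only an Anthropic key is set, but the default judge model is gpt-4.
const missing = missingKey("gpt-4", { ANTHROPIC_API_KEY: "placeholder" });
console.log(missing); // OPENAI_API_KEY
```

In a Node.js script, `process.env` would be passed as the second argument; it is taken as a parameter here so the check stays easy to test.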
GitHub Action fails with 'server_path not found'
Ensure the server_path in your workflow points to the compiled or source entry point of your MCP server, relative to the repository root. Run `npm run build` in a prior step if the server needs compilation.
Scores are consistently low even for correct tool responses
Review your evals.ts scenario definitions. The LLM judge scores against the expected behavior you specify — if the expected output is too strict or ambiguously worded, scores will be artificially low. Iterate on the scenario descriptions.
Frequently Asked Questions about MCP Evals
What is MCP Evals?
MCP Evals is a Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. It helps ensure your MCP server's tools are working correctly and performing well. It is a CLI/CI tool, not an MCP server itself.
How do I install MCP Evals?
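Add it to your MCP server project as a dev dependency with npm install --save-dev mcp-evals, then run evaluations with npx mcp-eval path/to/evals.ts path/to/server.ts. Because MCP Evals is a CLI/CI tool rather than an MCP server, it is run from the command line or a CI workflow instead of being registered in an AI client's JSON config file.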
Which AI clients work with MCP Evals?
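MCP Evals runs from the command line or in CI rather than inside an AI client. It evaluates MCP servers, which can in turn be used from MCP-compatible clients such as Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.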
Is MCP Evals free to use?
Yes, MCP Evals is open source and available under the MIT license. You can use it freely in both personal and commercial projects.