MCP Evals

v2.0.1 | Developer Tools | stable

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Tags: ai, evals, mcp, evaluation, github-actions
128 Stars | 0 Downloads | 0 Weekly | 0/5

What is MCP Evals?

MCP Evals is a Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. It helps ensure your MCP server's tools are working correctly and performing well. It is a CLI/CI tool, not an MCP server itself.

This tool is listed under the Developer Tools category on MCPgee, a directory of 33,000+ MCP servers.

Features

  • A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring

Use Cases

Evaluate MCP tool implementations
Score tool performance with LLMs
Validate MCP server functionality
Maintainer: mclenhard

License: MIT
Language: TypeScript
Version: v2.0.1
Updated: May 15, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

NPM

npx -y mcp-evals


Configuration

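MCP Evals is a CLI/CI tool rather than an MCP server, so it is not registered in claude_desktop_config.json. It is configured through an evals file (evals.ts or evals.yaml) and the OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable.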

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use MCP Evals

MCP Evals is a Node.js package and GitHub Action that automates quality evaluation of MCP server tool implementations using LLM-based scoring. It runs your tool against predefined scenarios and grades responses on accuracy, completeness, relevance, clarity, and reasoning — producing a numeric score that can gate CI/CD pipelines. Teams building MCP servers use it to catch regressions, validate new tools before release, and monitor tool quality over time with optional Prometheus/Grafana/Jaeger observability.

Prerequisites

  • Node.js 20 or later
  • npm or yarn
  • An API key for either OpenAI (OPENAI_API_KEY) or Anthropic (ANTHROPIC_API_KEY) to power the LLM judge
  • A working MCP server implementation to evaluate
  • Optional: Docker for running the Prometheus/Grafana/Jaeger monitoring stack

1. Install the mcp-evals package

Add mcp-evals as a dev dependency in your MCP server project, or install it globally.

npm install --save-dev mcp-evals

2. Set your LLM provider API key

Export the API key for the model you want to use as the LLM judge. The default model is gpt-4.

# For OpenAI (default):
export OPENAI_API_KEY=sk-...

# For Anthropic:
export ANTHROPIC_API_KEY=sk-ant-...
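
A missing key is easier to diagnose before any evaluation starts than halfway through a run. The sketch below is a hypothetical pre-flight check, not part of mcp-evals: the names Provider, hasKey, and requireKey are made up for illustration, and only the two environment variable names come from the text above.

```typescript
// Hypothetical pre-flight helper (not an mcp-evals API): verify that the
// API key for the chosen judge provider is present before running evals.
type Provider = "openai" | "anthropic";
type Env = Record<string, string | undefined>;

// Environment variable expected for each provider, as named in this guide.
const KEY_NAME: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

// A key counts as present only if it is defined and not blank.
function hasKey(env: Env, provider: Provider): boolean {
  const value = env[KEY_NAME[provider]];
  return value !== undefined && value.trim() !== "";
}

// Throw a readable error instead of letting the run fail later.
function requireKey(env: Env, provider: Provider): void {
  if (!hasKey(env, provider)) {
    throw new Error(`${KEY_NAME[provider]} is not set; export it before running the evaluation`);
  }
}
```

In a Node.js script this could be called as requireKey(process.env, "openai") ahead of the evaluator; passing the environment in as an argument keeps the check easy to test.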

3. Create an evaluation configuration file

Write an evals.ts (TypeScript) or evals.yaml (YAML) file that defines the scenarios and expected behavior for each tool you want to evaluate.
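
To make "scenarios and expected behavior" concrete, here is a minimal sketch of what such definitions might carry. The Scenario interface, its field names, and validateScenarios are assumptions made for illustration; mcp-evals defines its own configuration types, so check the package README for the real shape before writing evals.ts. The tool names and expected-behavior wording are also illustrative.

```typescript
// Hypothetical shape for illustration only; mcp-evals defines its own
// configuration types, which may differ from this interface.
interface Scenario {
  name: string;     // label reported next to the score
  tool: string;     // MCP tool the scenario is meant to exercise
  prompt: string;   // request issued during the evaluation
  expected: string; // behavior the LLM judge grades the response against
}

const scenarios: Scenario[] = [
  {
    name: "Weather lookup",
    tool: "weather",
    prompt: "What is the weather in New York?",
    expected: "Reports current conditions and names New York as the location.",
  },
  {
    name: "Search relevance",
    tool: "search",
    prompt: "Find articles about machine learning.",
    expected: "Returns results whose titles or summaries concern machine learning.",
  },
];

// Catch authoring mistakes before spending judge tokens: blank fields and
// duplicate scenario names make results hard to interpret.
function validateScenarios(list: Scenario[]): string[] {
  const problems: string[] = [];
  const seenNames: string[] = [];
  for (const s of list) {
    if (s.name.trim() === "") problems.push("scenario with empty name");
    if (s.expected.trim() === "") problems.push(`"${s.name}" has no expected behavior`);
    if (seenNames.indexOf(s.name) !== -1) problems.push(`duplicate scenario name "${s.name}"`);
    seenNames.push(s.name);
  }
  return problems;
}
```

Keeping the expected behavior as one specific, checkable sentence per scenario gives the judge something unambiguous to grade against.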

4. Run the evaluation via CLI

Execute the evaluator by passing the path to your evals config and your MCP server implementation. Results are printed to stdout with a score per scenario.

npx mcp-eval path/to/evals.ts path/to/server.ts
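
Per-scenario scores become useful in CI once they are compared against a threshold. The following is a rough sketch of that gating logic, assuming the scores have already been collected into an array. ScenarioScore, failingScenarios, and exitCodeFor are hypothetical names, and the score scale and threshold are left to the caller because this guide does not state them.

```typescript
// Hypothetical gating logic (not an mcp-evals API): turn per-scenario
// scores into a process exit code that a CI job can act on.
interface ScenarioScore {
  name: string;
  score: number; // numeric score from the LLM judge; scale is an assumption
}

// Scenarios whose score falls below the required minimum.
function failingScenarios(results: ScenarioScore[], minScore: number): ScenarioScore[] {
  return results.filter((r) => r.score < minScore);
}

// 0 means pass, 1 means fail. An empty result set is treated as a failure
// so that a misconfigured run cannot pass silently.
function exitCodeFor(results: ScenarioScore[], minScore: number): number {
  if (results.length === 0) return 1;
  return failingScenarios(results, minScore).length === 0 ? 0 : 1;
}
```

A wrapper script could log the names returned by failingScenarios and then exit with the code from exitCodeFor, which is enough for most CI systems to mark the job red.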

5. Add to GitHub Actions CI (optional)

Add the mcp-evals GitHub Action to your workflow so evaluations run automatically on every pull request.

- uses: mclenhard/mcp-evals@<version>
  with:
    evals_path: ./evals.ts
    server_path: ./src/server.ts
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

6. Enable observability (optional)

Initialize the metrics module in your test harness to emit Prometheus metrics and OpenTelemetry traces for long-running eval suites.

import { metrics } from 'mcp-evals';
metrics.initialize(9090, {
  enableTracing: true,
  otelEndpoint: 'http://localhost:4318/v1/traces'
});

MCP Evals Examples

Client configuration (running the evaluator)

MCP Evals is a CLI/CI tool, not an MCP server itself. This shows how to run it as part of a package.json script.

{
  "scripts": {
    "eval": "npx mcp-eval evals.ts src/server.ts"
  },
  "devDependencies": {
    "mcp-evals": "latest"
  }
}

Prompts and CLI commands to try

Typical commands and scenarios for evaluating an MCP server's tools with mcp-evals.

- "npx mcp-eval evals.ts server.ts" — run all evaluation scenarios and print scores
- "npx mcp-eval evals.ts server.ts --model claude-3-5-sonnet-20241022" — use a specific Anthropic model as judge
- "Check if the weather tool returns accurate location data for New York."
- "Verify that the search tool returns relevant results for 'machine learning'."

Troubleshooting MCP Evals

Evaluation fails with 'Invalid API key' error

Make sure OPENAI_API_KEY or ANTHROPIC_API_KEY is exported in your shell or set as a GitHub Actions secret. The default model is gpt-4, so an OpenAI key is needed unless you explicitly specify an Anthropic model.

GitHub Action fails with 'server_path not found'

Ensure the server_path in your workflow points to the compiled or source entry point of your MCP server, relative to the repository root. Run `npm run build` in a prior step if the server needs compilation.

Scores are consistently low even for correct tool responses

Review your evals.ts scenario definitions. The LLM judge scores against the expected behavior you specify — if the expected output is too strict or ambiguously worded, scores will be artificially low. Iterate on the scenario descriptions.
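
When iterating on scenario descriptions, it can help to see which grading criterion is dragging the score down. The sketch below assumes the judge's output can be read as one numeric score per criterion, using the five criteria named earlier in this guide. The types and function names are hypothetical and not part of mcp-evals.

```typescript
// Hypothetical analysis helper (not an mcp-evals API): average judge scores
// per criterion across several runs and report the weakest one.
type Criterion = "accuracy" | "completeness" | "relevance" | "clarity" | "reasoning";
type RubricScores = Record<Criterion, number>;

const CRITERIA: Criterion[] = ["accuracy", "completeness", "relevance", "clarity", "reasoning"];

// Mean score per criterion; returns all zeros when there are no runs.
function meanByCriterion(runs: RubricScores[]): RubricScores {
  const means: RubricScores = { accuracy: 0, completeness: 0, relevance: 0, clarity: 0, reasoning: 0 };
  if (runs.length === 0) return means;
  for (const run of runs) {
    for (const c of CRITERIA) means[c] += run[c];
  }
  for (const c of CRITERIA) means[c] = means[c] / runs.length;
  return means;
}

// Criterion with the lowest mean; ties resolve to the earliest in CRITERIA.
function weakestCriterion(runs: RubricScores[]): Criterion {
  const means = meanByCriterion(runs);
  let worst: Criterion = "accuracy";
  for (const c of CRITERIA) {
    if (means[c] < means[worst]) worst = c;
  }
  return worst;
}
```

If completeness is consistently the weakest criterion while the tool output looks right, the expected behavior in the scenario is probably asking for more than the tool is meant to return.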

Frequently Asked Questions about MCP Evals

What is MCP Evals?

MCP Evals is a Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. It helps ensure your MCP server's tools are working correctly and performing well. It is not an MCP server itself: it runs from the command line or in CI against a server you supply.

How do I install MCP Evals?

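Install it as a dev dependency with npm install --save-dev mcp-evals, export OPENAI_API_KEY or ANTHROPIC_API_KEY, and run npx mcp-eval path/to/evals.ts path/to/server.ts. Because MCP Evals is a CLI/CI tool rather than an MCP server, it does not need an entry in an AI client's JSON config file (e.g., claude_desktop_config.json or .cursor/mcp.json).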

Which AI clients work with MCP Evals?

MCP Evals is not loaded into an AI client. It evaluates MCP server implementations from the command line or GitHub Actions, using an OpenAI or Anthropic model as the LLM judge.

Is MCP Evals free to use?

Yes, MCP Evals is open source and available under the MIT license. You can use it freely in both personal and commercial projects.


