What category is MCP Evals MCP Server?

MCP Evals is categorized under Developer Tools. Browse more servers in these categories on MCPgee.

MCP Evals

Name: MCP Evals MCP Server
Author: mclenhard

v2.0.1 • Developer Tools • stable

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Tags: ai, evals, mcp, evaluation, github-actions

Stars: 128

Rating: 0/5

What is MCP Evals?

MCP Evals is a Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. It helps ensure your MCP server's tools are working correctly and performing well.

This server falls under the Developer Tools category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring

Use Cases

Evaluate MCP tool implementations

Score tool performance with LLMs

Validate MCP server functionality

Maintainer: mclenhard
License: MIT
Language: TypeScript
Version: v2.0.1
Updated: May 15, 2026
Status: healthy
Maintenance: active
Works with: Claude, OpenAI, Windows, macOS, Linux

Installation

NPM

npx -y mcp-evals

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response time: < 200 ms
Throughput: Medium

Resource Usage

Memory usage: Low
CPU usage: Low

How to Set Up and Use MCP Evals

MCP Evals is a Node.js package and GitHub Action that automates quality evaluation of MCP server tool implementations using LLM-based scoring. It runs your tool against predefined scenarios and grades responses on accuracy, completeness, relevance, clarity, and reasoning — producing a numeric score that can gate CI/CD pipelines. Teams building MCP servers use it to catch regressions, validate new tools before release, and monitor tool quality over time with optional Prometheus/Grafana/Jaeger observability.
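
To make the idea of a score that gates CI concrete, here is a minimal sketch of how five per-criterion scores could be averaged and compared against a threshold. The CriteriaScores shape, the 1 to 5 scale, the plain average, and the function names are all assumptions made for illustration; they are not taken from the mcp-evals source, which may weight or report scores differently.

```typescript
// Hypothetical shape: the five criteria named above, one score each.
// The 1-5 scale and the field names are assumptions for this sketch.
interface CriteriaScores {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
}

// Average the five criteria into a single numeric score.
function overallScore(s: CriteriaScores): number {
  const values = [s.accuracy, s.completeness, s.relevance, s.clarity, s.reasoning];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// True only when every scenario reaches the threshold; a CI step could fail otherwise.
function passesGate(results: CriteriaScores[], threshold: number): boolean {
  return results.every((r) => overallScore(r) >= threshold);
}

const example: CriteriaScores = {
  accuracy: 5,
  completeness: 4,
  relevance: 5,
  clarity: 4,
  reasoning: 2,
};

console.log(overallScore(example)); // 4
console.log(passesGate([example], 4.5)); // false
```

In a pipeline, a script along these lines could exit with a non-zero status when passesGate returns false, so that a weak reasoning score, as in the example, blocks the merge even when the other criteria look good.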

Prerequisites

- Node.js 20 or later
- npm or yarn
- An API key for either OpenAI (OPENAI_API_KEY) or Anthropic (ANTHROPIC_API_KEY) to power the LLM judge
- A working MCP server implementation to evaluate
- Optional: Docker for running the Prometheus/Grafana/Jaeger monitoring stack

Install the mcp-evals package

Add mcp-evals as a dev dependency in your MCP server project, or install it globally.

npm install --save-dev mcp-evals

Set your LLM provider API key

Export the API key for the model you want to use as the LLM judge. The default model is gpt-4.

# For OpenAI (default):
export OPENAI_API_KEY=sk-...

# For Anthropic:
export ANTHROPIC_API_KEY=sk-ant-...

Create an evaluation configuration file

Write an evals.ts (TypeScript) or evals.yaml (YAML) file that defines the scenarios and expected behavior for each tool you want to evaluate.
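
The sketch below illustrates the kind of information a scenario usually needs to carry, plus a cheap local check to run before spending LLM calls. The Scenario interface, its field names, and findProblems are hypothetical and defined locally; they are not the types exported by mcp-evals, so consult the package's own documentation for the real evals.ts format.

```typescript
// Hypothetical scenario shape for illustration; not the mcp-evals exports.
interface Scenario {
  name: string;     // label shown next to the score
  prompt: string;   // request that exercises the tool
  expected: string; // behavior the LLM judge should grade against
}

const scenarios: Scenario[] = [
  {
    name: "weather-new-york",
    prompt: "What is the weather in New York?",
    expected: "Returns current conditions for New York, NY.",
  },
  {
    name: "search-ml",
    prompt: "Search for 'machine learning'.",
    expected: "Returns results relevant to machine learning.",
  },
];

// Local sanity checks: duplicate names and blank fields make results hard to read.
function findProblems(items: Scenario[]): string[] {
  const problems: string[] = [];
  const seen: Record<string, boolean> = {};
  for (const s of items) {
    if (seen[s.name]) problems.push(`duplicate name: ${s.name}`);
    seen[s.name] = true;
    if (s.prompt.trim() === "") problems.push(`empty prompt: ${s.name}`);
    if (s.expected.trim() === "") problems.push(`empty expected behavior: ${s.name}`);
  }
  return problems;
}

console.log(findProblems(scenarios)); // []
```

Keeping the expected behavior as an explicit field makes it easier to review and reword when scores come back lower than the tool's actual output deserves.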

Run the evaluation via CLI

Execute the evaluator by passing the path to your evals config and your MCP server implementation. Results are printed to stdout with a score per scenario.

npx mcp-eval path/to/evals.ts path/to/server.ts

Add to GitHub Actions CI (optional)

Add the mcp-evals GitHub Action to your workflow so evaluations run automatically on every pull request.

- uses: mclenhard/mcp-evals@vX.Y.Z  # replace vX.Y.Z with a released tag
  with:
    evals_path: ./evals.ts
    server_path: ./src/server.ts
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Enable observability (optional)

Initialize the metrics module in your test harness to emit Prometheus metrics and OpenTelemetry traces for long-running eval suites.

import { metrics } from 'mcp-evals';
metrics.initialize(9090, {
  enableTracing: true,
  otelEndpoint: 'http://localhost:4318/v1/traces'
});

MCP Evals Examples

Client configuration (running the evaluator)

MCP Evals is a CLI/CI tool, not an MCP server itself. This shows how to run it as part of a package.json script.

{
  "scripts": {
    "eval": "npx mcp-eval evals.ts src/server.ts"
  },
  "devDependencies": {
    "mcp-evals": "latest"
  }
}

Prompts and CLI commands to try

Typical commands and scenarios for evaluating an MCP server's tools with mcp-evals.

- "npx mcp-eval evals.ts server.ts" — run all evaluation scenarios and print scores
- "npx mcp-eval evals.ts server.ts --model claude-3-5-sonnet-20241022" — use a specific Anthropic model as judge
- "Check if the weather tool returns accurate location data for New York."
- "Verify that the search tool returns relevant results for 'machine learning'."

Troubleshooting MCP Evals

Evaluation fails with 'Invalid API key' error

Make sure OPENAI_API_KEY or ANTHROPIC_API_KEY is exported in your shell or set as a GitHub Actions secret. The default model is gpt-4, so an OpenAI key is needed unless you explicitly specify an Anthropic model.
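
A small preflight check can surface a missing key before any evaluation starts. The sketch below assumes that model names beginning with "claude" are Anthropic models and that every other model, including the default gpt-4, needs an OpenAI key; that heuristic and the helper names are illustrative and not part of mcp-evals.

```typescript
type Env = Record<string, string | undefined>;

// Assumption for this sketch: names starting with "claude" are Anthropic models;
// anything else, including the default gpt-4, needs an OpenAI key.
function requiredKey(model: string = "gpt-4"): string {
  return /^claude/i.test(model) ? "ANTHROPIC_API_KEY" : "OPENAI_API_KEY";
}

// Return an error message when the needed key is missing or blank, otherwise null.
function checkApiKey(env: Env, model?: string): string | null {
  const key = requiredKey(model);
  const value = env[key];
  return value && value.trim() !== "" ? null : `${key} is not set`;
}

// Only an Anthropic key is present, but the default judge model is gpt-4.
console.log(checkApiKey({ ANTHROPIC_API_KEY: "sk-ant-example" }));
// prints: OPENAI_API_KEY is not set

// With an Anthropic model selected, the same environment is sufficient.
console.log(checkApiKey({ ANTHROPIC_API_KEY: "sk-ant-example" }, "claude-3-5-sonnet-20241022"));
// prints: null
```

In a Node.js script the first argument would typically be process.env; it is passed in explicitly here so the check stays easy to test.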

GitHub Action fails with 'server_path not found'

Ensure the server_path in your workflow points to the compiled or source entry point of your MCP server, relative to the repository root. Run `npm run build` in a prior step if the server needs compilation.

Scores are consistently low even for correct tool responses

Review your evals.ts scenario definitions. The LLM judge scores against the expected behavior you specify — if the expected output is too strict or ambiguously worded, scores will be artificially low. Iterate on the scenario descriptions.

Frequently Asked Questions about MCP Evals

What is MCP Evals?

MCP Evals is a Node.js package and GitHub Action that evaluates MCP (Model Context Protocol) tool implementations using LLM-based scoring, so you can check that your MCP server's tools work correctly and perform well. It is a CLI/CI tool rather than an MCP server itself.

How do I install MCP Evals?

Add it to your project as a dev dependency with npm install --save-dev mcp-evals, then run evaluations with npx mcp-eval path/to/evals.ts path/to/server.ts.

Which AI clients work with MCP Evals?

MCP Evals works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is MCP Evals free to use?

Yes, MCP Evals is open source and available under the MIT license. You can use it freely in both personal and commercial projects.

MCP Evals Alternatives — Similar Developer Tools Servers

Looking for alternatives to MCP Evals? Here are other popular developer tools servers you can use with Claude, Cursor, and VS Code.

Ecc

★ 188.2k

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

Javaguide

★ 155.8k

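A Java interview and general backend interview guide covering computer science fundamentals, databases, distributed systems, high concurrency, system design, and AI application development.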

Gemini CLI

★ 104.5k

A secure MCP server that wraps the Google Gemini CLI, allowing clients to query Gemini models using local OAuth sessions without requiring an API key. It provides tools for model interaction and diagnostics with built-in protection against command in

Awesome MCP Servers

★ 87.3k

⭐ Curated list of Model Context Protocol (MCP) servers - tools that extend Claude Desktop, Cursor, Windsurf, and other MCP clients with custom capabilities.

MCP Servers

★ 86.0k

Model Context Protocol Servers

CC Switch

★ 77.5k

A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io

Quick Config Preview

{
  "mcpServers": {
    "mcp-evals": {
      "command": "npx",
      "args": ["-y", "mcp-evals"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json
