SRE Agent

Name: SRE Agent MCP Server
Author: martinimarcello00

v1.0.0 • Monitoring & Observability • stable

Autonomous agent for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows. Integrates LangChain, LangGraph, and MCP servers to enable automated SRE tasks in cloud-native environments.

Tags: aiops, automated-diagnosis-system, autonomous-agents, cloud-monitoring, cloud-native

What is SRE Agent?

SRE Agent is a Model Context Protocol (MCP) server that connects AI assistants like Claude, Cursor, and VS Code to an autonomous agent for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows. It integrates LangChain, LangGraph, and MCP servers to enable automated SRE tasks in cloud-native environments.

This server falls under the Monitoring & Observability category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

Autonomous agent for Kubernetes incident detection, diagnosis, and mitigation

Use Cases

Kubernetes incident detection and mitigation

Autonomous cloud-native problem diagnosis

Maintainer: martinimarcello00
License: MIT
Language: Jupyter Notebook
Version: v1.0.0
Updated: May 5, 2026
Status: healthy
Maintenance: active
Works with: Claude, OpenAI, Windows, macOS, Linux


Installation

Manual Installation

git clone https://github.com/martinimarcello00/SRE-agent.git
cd SRE-agent
poetry install

Configuration

Config File: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use SRE Agent

SRE Agent is an autonomous multi-agent system for automated Kubernetes incident response, combining LangChain, LangGraph, and a custom MCP server to diagnose and mitigate faults in cloud-native environments. It implements a Divide and Conquer strategy with parallel RCA Worker agents guided by a topology-aware Planner and a hybrid Triage Agent grounded in the Four Golden Signals (Latency, Errors, Traffic, Saturation). The custom MCP server standardizes access to observability tools including Prometheus, Jaeger, and the Kubernetes API, while reducing context window usage through pre-digested data summaries.
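To make the idea of grounding triage in the Four Golden Signals concrete, here is a minimal sketch of a deterministic threshold check. It is not the project's Triage Agent: the `GoldenSignals` fields, the `triage` function, and every threshold value are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenSignals:
    """One service's Four Golden Signals over a sampling window (hypothetical shape)."""
    latency_p99_ms: float   # Latency
    error_rate: float       # Errors, as a fraction of requests (0.0 to 1.0)
    requests_per_s: float   # Traffic
    cpu_saturation: float   # Saturation, as a fraction of the limit (0.0 to 1.0)


# Hypothetical thresholds, picked only to make the example runnable.
MAX_LATENCY_P99_MS = 500.0
MAX_ERROR_RATE = 0.05
MIN_REQUESTS_PER_S = 1.0
MAX_CPU_SATURATION = 0.90


def triage(signals: GoldenSignals) -> list[str]:
    """Return the names of the signals that breach their threshold."""
    breaches = []
    if signals.latency_p99_ms > MAX_LATENCY_P99_MS:
        breaches.append("latency")
    if signals.error_rate > MAX_ERROR_RATE:
        breaches.append("errors")
    if signals.requests_per_s < MIN_REQUESTS_PER_S:
        breaches.append("traffic")
    if signals.cpu_saturation > MAX_CPU_SATURATION:
        breaches.append("saturation")
    return breaches


degraded = GoldenSignals(latency_p99_ms=800.0, error_rate=0.12,
                         requests_per_s=40.0, cpu_saturation=0.35)
print(triage(degraded))
```

Because the check is a pure function of the measured values, the same inputs always flag the same signals, which is the property that lets a heuristic of this kind constrain what an LLM is asked to explain.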

Prerequisites

Python 3.13 or higher and Poetry package manager installed
Docker and Kind (Kubernetes in Docker) for spinning up test clusters
Make utility for AIOpsLab commands
OpenAI API key (used for GPT-based LLM reasoning in the agent)
AIOpsLab framework available for fault injection benchmarks (optional but needed for automated experiments)
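The prerequisites above can be checked with a short script before installing. This is a sketch that assumes the tools are invoked as `poetry`, `docker`, `kind`, and `make` on your PATH; the function name `missing_prerequisites` is hypothetical, and the optional AIOpsLab framework is not checked.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 13)
# Assumed executable names for the tools listed in the prerequisites.
REQUIRED_TOOLS = ("poetry", "docker", "kind", "make")


def missing_prerequisites(version=None, which=shutil.which):
    """Return human-readable descriptions of unmet prerequisites.

    `version` and `which` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < REQUIRED_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.13+ required")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty result means the interpreter version and all four tools were found; it does not verify that Docker is running or that the tool versions are compatible.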

Clone the repository

Clone the SRE-agent repository to your local machine.

git clone https://github.com/martinimarcello00/SRE-agent.git
cd SRE-agent

Install dependencies with Poetry

Use Poetry to install all project dependencies including LangChain, LangGraph, and the custom MCP server package.

poetry install

Configure environment variables

Copy the example environment file and add your API keys. At minimum, set your OpenAI API key for the LLM reasoning components.

cp .env.example .env
# Edit .env and set:
# OPENAI_API_KEY=your_openai_api_key_here
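Before starting the agent, it can help to confirm that the key was actually filled in. The sketch below parses simple KEY=VALUE lines itself rather than relying on any particular loader, since the source does not say how the project reads .env; `parse_env` and `missing_keys` are hypothetical helper names.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=("OPENAI_API_KEY",)) -> list[str]:
    """Return the required keys that are absent or still empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    print("Missing:", missing_keys(text) or "nothing")
```

A commented-out line such as `# OPENAI_API_KEY=...` is treated as missing, which catches the common mistake of editing the comment instead of adding a real assignment.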

Start the MCP server for observability tools

The custom MCP server in the MCP-server/ directory exposes Prometheus metrics, Jaeger traces, and the Kubernetes API as MCP tools. Start it before running the agent.

cd MCP-server
poetry run python -m mcp_server

Run the SRE agent interactively

Launch the multi-agent system using LangGraph Studio for a visual development experience, or run the agent script directly for a specific experiment scenario.

# Option A: LangGraph Studio (visual dev UI)
cd sre-agent
poetry run langgraph dev

# Option B: Direct script (run from the repository root)
poetry run python sre-agent/sre-agent.py

Run automated experiments (optional)

Use the automated_experiment.py script to run batch experiments: it provisions a Kind cluster, injects faults via AIOpsLab, runs the agent, evaluates results, and cleans up.

poetry run python automated_experiment.py
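The provision, inject, run, evaluate, and clean-up sequence described above can be sketched as a function that guarantees clean-up even when a step fails. This is not the contents of automated_experiment.py: `run_experiment` and its five callables are hypothetical stand-ins for whatever the script does at each stage.

```python
from typing import Callable


def run_experiment(
    provision: Callable[[], None],
    inject_fault: Callable[[], None],
    run_agent: Callable[[], str],
    evaluate: Callable[[str], bool],
    cleanup: Callable[[], None],
) -> bool:
    """Run one experiment; cleanup() is called even if an earlier step raises."""
    try:
        provision()                # e.g. create the test cluster
        inject_fault()             # e.g. trigger the fault scenario
        diagnosis = run_agent()    # the agent's reported root cause
        return evaluate(diagnosis)
    finally:
        cleanup()                  # always tear the environment down


log = []
passed = run_experiment(
    provision=lambda: log.append("provision"),
    inject_fault=lambda: log.append("inject"),
    run_agent=lambda: log.append("agent") or "pod crash loop",
    evaluate=lambda diagnosis: diagnosis == "pod crash loop",
    cleanup=lambda: log.append("cleanup"),
)
print(passed, log)
```

Placing `provision()` inside the `try` block means a cluster that was only partly created is still torn down, at the cost of `cleanup` having to tolerate a missing cluster.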

SRE Agent Examples

Client configuration

MCP client configuration for the SRE Agent's custom observability MCP server running locally over stdio.

{
  "mcpServers": {
    "sre-observability": {
      "command": "poetry",
      "args": ["run", "python", "-m", "mcp_server"],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here"
      }
    }
  }
}
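If the client config file already lists other servers, the entry above needs to be merged in rather than pasted over the file. The sketch below shows one way to do that on the parsed JSON; `add_server` is a hypothetical helper and `other-server` is a made-up existing entry.

```python
import json


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# The same entry as the configuration shown above.
SRE_ENTRY = {
    "command": "poetry",
    "args": ["run", "python", "-m", "mcp_server"],
    "env": {"OPENAI_API_KEY": "your_openai_api_key_here"},
}

# Hypothetical pre-existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_server(existing, "sre-observability", SRE_ENTRY)
print(json.dumps(updated, indent=2))
```

The function copies the dictionaries it changes, so the original config stays untouched until you decide to write the result back to disk.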

Prompts to try

Example interactions for driving automated Kubernetes incident diagnosis through the SRE agent.

- "Detect any anomalies in the Hotel Reservation service cluster and report the root cause"
- "Query Prometheus for error rate spikes in the last 10 minutes across all services"
- "Check Jaeger traces for high-latency requests in the payment service"
- "List all unhealthy pods in the default namespace and suggest mitigation steps"
- "Run a full RCA on the current cluster topology starting from the Four Golden Signals"
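For a sense of what the Prometheus prompt translates to, here is a sketch that builds a request URL for Prometheus's instant-query HTTP endpoint, `/api/v1/query`. The base URL, the metric name `http_requests_total`, and its `status` and `service` labels are assumptions; the queries the SRE Agent's MCP server actually issues are not documented here.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical metric and label names; substitute the ones your services export.
ERROR_RATE_10M = (
    'sum by (service) (rate(http_requests_total{status=~"5.."}[10m]))'
    " / sum by (service) (rate(http_requests_total[10m]))"
)

url = instant_query_url("http://localhost:9090", ERROR_RATE_10M)
print(url)
```

The expression divides the per-service rate of 5xx responses by the per-service rate of all responses over a 10-minute window, giving an error ratio per service.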

Troubleshooting SRE Agent

Poetry install fails with Python version mismatch

This project requires Python 3.13+. Install the correct version and set it for the project with 'poetry env use python3.13' before running 'poetry install'.

MCP server cannot connect to Prometheus or Kubernetes API

Ensure your Kind cluster is running ('kind get clusters') and that kubectl is configured to point to it ('kubectl config current-context'). Prometheus must be deployed in the cluster and accessible at the configured endpoint in .env.
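A quick way to rule out the Prometheus side is to probe its built-in health endpoint, `/-/healthy`, from the machine running the MCP server. This is a sketch: the default address `http://localhost:9090` is an assumption (for example, a port-forward to the in-cluster service), and `is_healthy` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Return the URL of Prometheus's /-/healthy endpoint for a base URL."""
    return f"{base_url.rstrip('/')}/-/healthy"


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True only if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, refused connection, HTTP error, or malformed URL.
        return False


if __name__ == "__main__":
    # Assumed address; use the endpoint configured in your .env instead.
    print("Prometheus reachable:", is_healthy("http://localhost:9090"))
```

A `False` result here points to networking or deployment (service not exposed, wrong port, cluster down) rather than to the agent itself.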

Agent produces hallucinated root cause diagnoses

The Triage Agent uses deterministic heuristics on Four Golden Signals to ground diagnoses. If results seem unreliable, ensure Prometheus metrics are being scraped correctly and that the Datagraph topology file reflects the actual cluster service dependencies.

Frequently Asked Questions about SRE Agent

What is SRE Agent?

SRE Agent is a Model Context Protocol (MCP) server that provides an autonomous agent for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows. It integrates LangChain, LangGraph, and MCP servers to enable automated SRE tasks in cloud-native environments, and it connects AI assistants to external tools and data sources through a standardized interface.

How do I install SRE Agent?

Follow the installation instructions on the SRE Agent GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with SRE Agent?

SRE Agent works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is SRE Agent free to use?

Yes, SRE Agent is open source and available under the MIT license. You can use it freely in both personal and commercial projects.

SRE Agent Alternatives — Similar Monitoring & Observability Servers

Looking for alternatives to SRE Agent? Here are other popular monitoring & observability servers you can use with Claude, Cursor, and VS Code.

Netdata

★ 78.9k

Real-time infrastructure monitoring with metrics, logs, alerts, and ML-based anomaly detection.

Kubeshark

★ 11.9k

eBPF-powered network observability for Kubernetes. Indexes L4/L7 traffic with full K8s context, decrypts TLS without keys. Queryable by AI agents via MCP and humans via dashboard.

Mission Control

★ 4.9k

Self-hosted AI agent orchestration platform: dispatch tasks, run multi-agent workflows, monitor spend, and govern operations from one mission control dashboard.

Grafana

★ 3.0k

This MCP server enables natural-language querying of Grafana logs by automatically detecting log sources and service labels. It provides read-only access to log data with intelligent caching for efficient repeat queries.

Sentrux

★ 2.4k

Real-time architectural sensor that helps AI agents close the feedback loop, enabling recursive self-improvement of code quality. Pure Rust.

OpenInference

★ 986

OpenTelemetry Instrumentation for AI Observability

Browse More Monitoring & Observability MCP Servers

Explore all monitoring & observability servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.

Set Up SRE Agent in Your Editor

Choose your AI client for step-by-step setup instructions.

Claude Desktop: macOS & Windows app
Claude Code: CLI & terminal
Cursor: AI-first code editor
VS Code: GitHub Copilot MCP
Windsurf: Codeium AI editor
Cline: VS Code extension

Quick Config Preview

{
  "mcpServers": {
    "sre-agent": {
      "command": "poetry",
      "args": ["run", "python", "-m", "mcp_server"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json
