Best MCP Servers for DevOps (2026)

Discover the best MCP servers for DevOps workflows. Manage Docker containers, orchestrate Kubernetes clusters, plan Terraform infrastructure, monitor with Grafana, and automate CI/CD with GitHub - all from your AI editor.

Why DevOps Engineers Need MCP Servers

DevOps is inherently a multi-tool discipline. On any given day, a DevOps engineer might check container health in Docker, review pod status in Kubernetes, plan infrastructure changes in Terraform, monitor dashboards in Grafana, and investigate alerts in Datadog - all while managing CI/CD pipelines in GitHub. Context switching between these tools is one of the biggest productivity drains in the job.

MCP servers reduce this context switching by connecting your AI assistant directly to your infrastructure tools. Instead of opening five different dashboards, you can ask your AI to check the status of your staging deployment, review the Terraform plan for a new VPC, and investigate why a Kubernetes pod is crash-looping - all in a single conversation.

This guide covers nine essential MCP servers for DevOps, organized by workflow stage: container management, infrastructure as code, cloud providers, CI/CD, and observability. We include detailed workflows for incident response, deployment pipelines, infrastructure audits, and monitoring setup, with examples of the commands your AI can generate.

Docker MCP Server - Container Management

The Docker MCP server gives your AI direct access to your Docker environment. It can list running containers, inspect logs, start and stop services, and even help you build optimized Dockerfiles.

What It Does for DevOps

  • Lists, inspects, starts, and stops Docker containers
  • Reads container logs for debugging and troubleshooting
  • Analyzes Dockerfiles and suggests optimizations for size and security
  • Manages Docker Compose stacks for multi-service applications

Configuration Example

{
  "mcpServers": {
    "docker": {
      "command": "npx",
      "args": ["-y", "@anthropic/docker-mcp"]
    }
  }
}
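
Before pointing an editor at a configuration like this, it can help to sanity-check its shape. The following Python sketch assumes only the `mcpServers` layout shown in this guide (a `command` string, an `args` list, and an optional `env` object). The function name `validate_mcp_config` is hypothetical and is not part of any MCP tooling.

```python
import json


def validate_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in an mcpServers config (empty if none)."""
    config = json.loads(text)
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["missing or empty 'mcpServers' object"]

    problems = []
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(spec.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        args = spec.get("args")
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"{name}: 'args' must be a list of strings")
        if not isinstance(spec.get("env", {}), dict):
            problems.append(f"{name}: 'env' must be an object")
    return problems


# The Docker entry from this guide, used as sample input.
EXAMPLE = """
{
  "mcpServers": {
    "docker": {"command": "npx", "args": ["-y", "@anthropic/docker-mcp"]}
  }
}
"""

print(validate_mcp_config(EXAMPLE) or "config looks well-formed")
```

An empty result only means the structure is plausible. It does not confirm that the package exists or that the server will start.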

Real Usage Prompt

"Show me all running containers, check if the api-gateway container is healthy, and show the last 50 lines of logs from the payment-service container."

Kubernetes MCP Server - Cluster Operations

For teams running production workloads on Kubernetes, the Kubernetes MCP server brings cluster operations into your AI editor. Check pod status, describe deployments, read events, and troubleshoot issues without switching to a terminal or dashboard.

What It Does for DevOps

  • Lists pods, deployments, services, and other resources across namespaces
  • Describes resources with full detail for troubleshooting
  • Reads pod logs and cluster events to diagnose failures
  • Helps write and review Kubernetes manifests

Configuration Example

{
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "@anthropic/kubernetes-mcp"],
      "env": {
        "KUBECONFIG": "/Users/devops/.kube/config"
      }
    }
  }
}

Real Usage Prompt

"Check the status of all pods in the production namespace. If any pods are in CrashLoopBackOff, show me their logs and recent events."

Terraform MCP Server - Infrastructure as Code

The Terraform MCP server connects your AI to your Terraform configurations and state. It can review plans, suggest resource configurations, check for drift, and help you write secure, efficient infrastructure code.

What It Does for DevOps

  • Reviews Terraform plan output and explains proposed changes
  • Suggests resource configurations based on best practices
  • Detects security issues in infrastructure definitions
  • Helps write Terraform modules with proper variable management
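
One check a plan review often includes is whether a new VPC's CIDR range overlaps an existing one. The sketch below shows that check with Python's standard `ipaddress` module. The function name `find_overlaps` and the address ranges are hypothetical, and this is an illustration of the idea rather than how the Terraform MCP server itself works.

```python
import ipaddress


def find_overlaps(new_cidr: str, existing_cidrs: list[str]) -> list[str]:
    """Return the existing CIDR blocks that overlap with new_cidr."""
    new_net = ipaddress.ip_network(new_cidr)
    return [
        cidr for cidr in existing_cidrs
        if new_net.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical address ranges, for illustration only.
existing = ["10.0.0.0/16", "10.1.0.0/16", "172.16.0.0/20"]

print(find_overlaps("10.1.128.0/17", existing))  # falls inside 10.1.0.0/16
print(find_overlaps("10.2.0.0/16", existing))    # no overlap
```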

Real Usage Prompt

"Review this Terraform plan for our new VPC setup. Flag any security concerns, check that the CIDR ranges do not overlap with existing VPCs, and suggest cost optimizations."

Cloud Provider MCP Servers - AWS, GCP and Azure

Each major cloud provider has its own MCP server that connects your AI to cloud-specific services and APIs.

AWS MCP Server

The AWS MCP server provides access to AWS services including EC2, S3, Lambda, RDS, and more. It can check instance status, list S3 buckets, review Lambda function configurations, and help debug CloudWatch logs.

GCP MCP Server

The GCP MCP server connects to Google Cloud Platform services. It can manage Compute Engine instances, query BigQuery datasets, check Cloud Run service status, and review IAM policies.

Azure MCP Server

The Azure MCP server provides access to Azure services including Virtual Machines, Azure Functions, Cosmos DB, and Azure DevOps. It helps manage resources across subscriptions and resource groups.

Real Usage Prompt

"Check the health of our production EC2 instances in us-east-1, list any S3 buckets with public access enabled, and show me the last 5 Lambda invocation errors for the payment-processor function."

GitHub MCP Server - CI/CD Pipelines

The GitHub MCP server connects your AI to your repositories, pull requests, issues, and - critically - your CI/CD pipelines via GitHub Actions. It can check workflow runs, review PRs, and help debug failed builds.

What It Does for DevOps

  • Lists and inspects GitHub Actions workflow runs
  • Reviews pull requests with full diff context
  • Checks build and deployment status across branches
  • Helps debug failed CI/CD pipeline steps

Real Usage Prompt

"Check the status of the latest CI/CD runs on the main branch. If any have failed, show me the failing step logs and suggest a fix."

Grafana MCP Server - Monitoring and Dashboards

The Grafana MCP server brings your monitoring dashboards into your AI conversation. Query metrics, check alert status, and analyze trends without navigating complex dashboard UIs.

What It Does for DevOps

  • Queries Prometheus/Grafana metrics by service, time range, and metric name
  • Checks active alerts and their severity
  • Analyzes metric trends to identify anomalies
  • Lists dashboards and panels for quick reference

Real Usage Prompt

"Check if there are any active critical alerts in Grafana. Then show me the CPU and memory usage trends for the api-gateway service over the last 24 hours."

Datadog MCP Server - Observability

For teams using Datadog for observability, the Datadog MCP server provides access to metrics, traces, logs, and monitors directly from your AI editor.

What It Does for DevOps

  • Queries metrics and creates ad-hoc metric queries
  • Lists and inspects monitor status and alert history
  • Searches logs across services and time ranges
  • Analyzes APM traces for latency issues
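
Log searches like these often end in a grouping step: counting errors by message to find the most common failure. A minimal Python sketch of that step follows. The entry fields `level` and `message` are assumptions for illustration and are not Datadog's log schema.

```python
from collections import Counter


def top_errors(log_entries: list[dict], limit: int = 3) -> list[tuple[str, int]]:
    """Count ERROR-level entries by message and return the most common ones."""
    counts = Counter(
        entry["message"] for entry in log_entries if entry.get("level") == "ERROR"
    )
    return counts.most_common(limit)


# Hypothetical log entries, for illustration only.
sample = [
    {"level": "ERROR", "message": "connection pool exhausted"},
    {"level": "INFO", "message": "request completed"},
    {"level": "ERROR", "message": "connection pool exhausted"},
    {"level": "ERROR", "message": "upstream timeout"},
]

print(top_errors(sample))
# [('connection pool exhausted', 2), ('upstream timeout', 1)]
```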

Real Usage Prompt

"Show me all triggered monitors in Datadog. For any critical monitors, pull the related logs from the last hour and help me identify the root cause."

MCP Server Comparison for DevOps

Server     | Category      | Best For                 | Setup Difficulty
Docker     | Containers    | Local development        | Easy
Kubernetes | Orchestration | Production clusters      | Medium
Terraform  | IaC           | Infrastructure planning  | Medium
AWS        | Cloud         | AWS resources            | Medium
GCP        | Cloud         | GCP resources            | Medium
Azure      | Cloud         | Azure resources          | Medium
GitHub     | CI/CD         | Pipeline management      | Easy
Grafana    | Monitoring    | Metrics and alerts       | Medium
Datadog    | Observability | Full-stack observability | Medium

Incident Response Workflow

Incident response is where MCP servers deliver the most dramatic time savings. When a production alert fires at 2 AM, the last thing you want is to open six different dashboards while your brain is still booting up. With MCP servers, you can investigate the incident through a single AI conversation that queries all your infrastructure tools simultaneously.

Step 1: Assess the Alert

Start by understanding what triggered the alert. Use Grafana MCP or Datadog MCP to check the current alert status and recent metric changes.

"Show me all critical alerts in Grafana that fired in the last 30 minutes. For each alert, show the metric that triggered it, the threshold, and the current value."

Your AI might respond with something like: "There are 2 critical alerts: (1) api-gateway p99 latency at 4,200ms (threshold: 2,000ms), triggered 12 minutes ago. (2) payment-service error rate at 8.3% (threshold: 1%), triggered 8 minutes ago."
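
To decide which alert to chase first, one rough heuristic is to compare how far each metric is over its threshold. The sketch below applies that to the two example alerts. The ranking rule is an assumption for illustration, not something Grafana computes for you.

```python
def threshold_ratio(current: float, threshold: float) -> float:
    """How many times over its threshold a metric currently is."""
    return current / threshold


# (alert name, current value, threshold) taken from the example response.
alerts = [
    ("api-gateway p99 latency (ms)", 4200.0, 2000.0),
    ("payment-service error rate (%)", 8.3, 1.0),
]

# Worst offender first.
ranked = sorted(alerts, key=lambda a: threshold_ratio(a[1], a[2]), reverse=True)

for name, current, threshold in ranked:
    print(f"{name}: {threshold_ratio(current, threshold):.1f}x threshold")
```

By this measure the payment-service error rate (8.3x its threshold) is further out of bounds than the api-gateway latency (2.1x), which suggests starting with payment-service.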

Step 2: Check Infrastructure Health

Immediately check the health of the affected services using Kubernetes MCP. Your AI will generate and execute the appropriate kubectl commands behind the scenes.

"Check the status of all pods in the production namespace related to api-gateway and payment-service. Are any pods restarting? What do the recent events show?"

Behind the scenes, your AI runs commands equivalent to:

kubectl get pods -n production -l app=api-gateway
kubectl get pods -n production -l app=payment-service
kubectl get events -n production --sort-by='.lastTimestamp' --field-selector reason=BackOff
kubectl describe pod payment-service-7d8f6b5c4-x9k2m -n production

Step 3: Correlate with Recent Changes

Check GitHub MCP for recent deployments that might have caused the issue.

"Show me the last 5 merged PRs on the main branch of the api-gateway and payment-service repositories. Were any of them deployed in the last 2 hours?"

Step 4: Deep Dive into Logs

Use Datadog MCP to search logs for the root cause.

"Search Datadog logs for ERROR level entries from payment-service in the last 30 minutes. Group by error message and show the count for each. Include a sample stack trace for the most common error."

Step 5: Remediate

Based on the investigation, take action. If a recent deployment caused the issue, use Kubernetes MCP to initiate a rollback.

"Show me the rollout history for the payment-service deployment in the production namespace. What was the previous image version? Generate the kubectl command to roll back to the previous version."

Your AI generates:

kubectl rollout history deployment/payment-service -n production
kubectl rollout undo deployment/payment-service -n production
kubectl rollout status deployment/payment-service -n production

Deployment Pipeline Workflow

A well-structured deployment pipeline catches issues before they reach production. MCP servers let you build an AI-assisted deployment checklist that queries every layer of your stack before, during, and after deployment.

Pre-Deployment Checks

Before deploying, use multiple MCP servers to verify readiness across your entire stack.
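
Once the individual checks have been gathered, the go/no-go decision is a simple gate. Here is a minimal Python sketch of such a gate. The `Readiness` fields and the default limits (error rate below 0.1%, p99 latency below 500ms) are illustrative assumptions, so substitute your own service-level targets.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    """Results gathered from the pre-deployment checks."""
    ci_green: bool
    has_capacity: bool
    error_rate_pct: float
    p99_latency_ms: float
    drift_detected: bool


def blockers(r: Readiness) -> list[str]:
    """Return the reasons a deployment should not proceed (empty means go)."""
    reasons = []
    if not r.ci_green:
        reasons.append("CI checks are not green")
    if not r.has_capacity:
        reasons.append("insufficient cluster capacity for a rolling update")
    if r.error_rate_pct >= 0.1:
        reasons.append(f"error rate {r.error_rate_pct}% is not below 0.1%")
    if r.p99_latency_ms >= 500:
        reasons.append(f"p99 latency {r.p99_latency_ms}ms is not below 500ms")
    if r.drift_detected:
        reasons.append("Terraform drift detected")
    return reasons


# Hypothetical check results, for illustration only.
print(blockers(Readiness(True, True, 0.04, 310.0, False)))  # nothing blocking
print(blockers(Readiness(True, True, 0.25, 310.0, True)))   # two blockers
```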

"Run our pre-deployment checklist: (1) Check GitHub Actions - are all CI checks green on the release/v2.4.0 branch? (2) Check Kubernetes - do we have enough available resources in the production cluster for a rolling update? (3) Check Grafana - is the current error rate below 0.1% and p99 latency below 500ms? (4) Check Terraform - is there any infrastructure drift in the production workspace?"

Canary Deployment Monitoring

During a canary deployment, MCP servers let you monitor the canary in real time and compare its metrics against the stable version.
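
The comparison itself is simple arithmetic. The sketch below assumes a rule that flags the canary when its error rate exceeds a multiple of the stable rate (2x by default here). The function name and the request counts are hypothetical.

```python
def canary_exceeds(canary_errors: int, canary_requests: int,
                   stable_errors: int, stable_requests: int,
                   max_ratio: float = 2.0) -> bool:
    """True if the canary error rate is more than max_ratio times the stable rate."""
    if canary_requests == 0 or stable_requests == 0:
        raise ValueError("need traffic on both versions to compare")
    canary_rate = canary_errors / canary_requests
    stable_rate = stable_errors / stable_requests
    return canary_rate > max_ratio * stable_rate


# Stable: 50 errors in 10,000 requests (0.5%).
print(canary_exceeds(12, 1000, 50, 10000))  # canary at 1.2%, above 2 x 0.5%
print(canary_exceeds(8, 1000, 50, 10000))   # canary at 0.8%, within 2 x 0.5%
```

Note that when the stable version has no errors at all, any canary error trips this rule, so a real check would likely also require a minimum request count.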

"Monitor the canary deployment of payment-service v2.4.0 in the production namespace. Compare the canary pod's error rate and p99 latency against the stable pods over the last 15 minutes. Alert me if the canary's error rate exceeds 2x the stable rate."

Post-Deployment Verification

After deployment, run a comprehensive health check across all systems.

"Run post-deployment verification: (1) All pods in the production namespace are Running and Ready. (2) No new error-level logs in the last 5 minutes. (3) Grafana metrics show error rate and latency are within normal ranges. (4) No triggered alerts. Report any issues."

Infrastructure Audit Workflow

Regular infrastructure audits catch security vulnerabilities, cost inefficiencies, and configuration drift before they become problems. MCP servers make it possible to audit your entire infrastructure stack in a single AI conversation.

Security Audit

Use AWS MCP (or your cloud provider's server) to scan for common security misconfigurations.
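
One such misconfiguration is a security group that is open to 0.0.0.0/0 on ports other than 80 and 443. The Python sketch below shows the filtering logic on plain dictionaries. The field names (`GroupId`, `IpPermissions`, `FromPort`, `ToPort`, `IpRanges`, `CidrIp`) are assumed to follow the shape of the EC2 DescribeSecurityGroups response, and the sample groups are hypothetical.

```python
ALLOWED_PUBLIC_PORTS = {80, 443}


def open_rules(security_groups: list[dict]) -> list[tuple[str, str]]:
    """Return (group_id, ports) for world-open ingress rules outside 80/443."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            world_open = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
            )
            if not world_open:
                continue
            start, end = rule.get("FromPort"), rule.get("ToPort")
            if start is None or end is None:
                # No port range on the rule is treated as "all ports".
                findings.append((group["GroupId"], "all ports"))
            elif start != end or start not in ALLOWED_PUBLIC_PORTS:
                findings.append((group["GroupId"], f"{start}-{end}"))
    return findings


def _rule(port: int, cidr: str) -> dict:
    """Build a single-port TCP rule for the sample data."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}]}


# Hypothetical security groups, for illustration only.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [_rule(443, "0.0.0.0/0")]},
    {"GroupId": "sg-ssh", "IpPermissions": [_rule(22, "0.0.0.0/0")]},
    {"GroupId": "sg-db", "IpPermissions": [_rule(5432, "10.0.0.0/16")]},
]

print(open_rules(sample_groups))  # [('sg-ssh', '22-22')]
```

This sketch only looks at IPv4 ranges. A fuller audit would apply the same test to IPv6 rules open to ::/0.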

"Audit our AWS security posture: (1) List all S3 buckets and flag any with public access. (2) Check EC2 security groups for rules that allow 0.0.0.0/0 on any port other than 80 and 443. (3) List IAM users that have not rotated their access keys in 90 days. (4) Check for RDS instances that are publicly accessible."

Cost Optimization Audit

"Analyze our AWS infrastructure for cost optimization: (1) Find EC2 instances with average CPU utilization below 10% over the last 30 days - these are candidates for rightsizing. (2) List EBS volumes that are not attached to any instance. (3) Find S3 buckets with no lifecycle policy that have more than 100GB of data. (4) Check for idle Elastic Load Balancers with zero connections in the last 7 days."

Kubernetes Audit

Use Kubernetes MCP to audit your cluster configuration.

"Audit the production Kubernetes cluster: (1) Find pods running without resource limits set. (2) List deployments with only 1 replica (no high availability). (3) Check for pods using the 'latest' image tag. (4) Find services of type LoadBalancer that might be publicly exposed. (5) List namespaces with no network policies defined."

Your AI generates the equivalent of:

kubectl get pods -A -o json | jq '.items[] | select(any(.spec.containers[]; .resources.limits == null))'
kubectl get deployments -A -o json | jq '.items[] | select(.spec.replicas == 1)'
kubectl get pods -A -o json | jq '.items[] | select(any(.spec.containers[]; .image | endswith(":latest")))'
kubectl get services -A -o json | jq '.items[] | select(.spec.type == "LoadBalancer")'
kubectl get networkpolicies -A
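
The first of these checks can also be run offline against saved `kubectl get pods -A -o json` output. The Python sketch below assumes that JSON has already been loaded into a dictionary. The function name `pods_without_limits` and the sample pods are hypothetical.

```python
def pods_without_limits(pod_list: dict) -> list[str]:
    """Return namespace/name for pods where any container has no resource limits."""
    flagged = []
    for pod in pod_list.get("items", []):
        containers = pod["spec"]["containers"]
        # A missing or empty "limits" object counts as "no limits set".
        if any(not c.get("resources", {}).get("limits") for c in containers):
            meta = pod["metadata"]
            flagged.append(f"{meta['namespace']}/{meta['name']}")
    return flagged


# Hypothetical, heavily trimmed pod list for illustration only.
pods = {
    "items": [
        {
            "metadata": {"namespace": "production", "name": "api-gateway-1"},
            "spec": {"containers": [
                {"name": "app", "resources": {"limits": {"memory": "256Mi"}}},
            ]},
        },
        {
            "metadata": {"namespace": "production", "name": "payment-service-1"},
            "spec": {"containers": [{"name": "app", "resources": {}}]},
        },
    ]
}

print(pods_without_limits(pods))  # ['production/payment-service-1']
```

Each pod is reported once even when several of its containers lack limits.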

Monitoring Setup with MCP

Setting up monitoring for a new service involves defining what to measure, configuring dashboards, and setting alert thresholds. MCP servers help you design monitoring setups by analyzing your existing infrastructure and recommending metrics based on best practices.

Designing a Monitoring Strategy

Start by having your AI inspect the service you want to monitor using Kubernetes MCP and Docker MCP.

"Inspect the new user-auth service deployment in the staging namespace. What container ports are exposed? What health check endpoints are configured? Based on this service's architecture, recommend a monitoring setup following the RED method (Rate, Errors, Duration) and the USE method (Utilization, Saturation, Errors)."

Alert Threshold Recommendations

Use Grafana MCP to analyze existing services and recommend alert thresholds for the new service.
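
A common starting point is to derive thresholds from a baseline average. The sketch below assumes a simple multiplier rule (3x the baseline for warning, 5x for critical). The function name and the latency samples are hypothetical, and multipliers like these are a starting point to tune, not a standard.

```python
from statistics import mean


def recommend_thresholds(samples: list[float],
                         warning_factor: float = 3.0,
                         critical_factor: float = 5.0) -> dict[str, float]:
    """Derive warning and critical thresholds from the average of the samples."""
    baseline = mean(samples)
    return {
        "baseline": baseline,
        "warning": warning_factor * baseline,
        "critical": critical_factor * baseline,
    }


# Hypothetical daily p99 latency values in milliseconds.
latency_ms = [110.0, 120.0, 130.0]

print(recommend_thresholds(latency_ms))
# {'baseline': 120.0, 'warning': 360.0, 'critical': 600.0}
```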

"Query the p99 latency and error rate for our existing authentication services over the last 30 days. Based on the baseline, recommend alert thresholds for the new user-auth service. Use 3x the average as the warning threshold and 5x as the critical threshold."

Complete DevOps Server Stack Configuration

Here is a configuration that connects six DevOps MCP servers for a comprehensive infrastructure management setup. Replace the placeholder tokens, API keys, and URLs with your own values:

{
  "mcpServers": {
    "docker": {
      "command": "npx",
      "args": ["-y", "@anthropic/docker-mcp"]
    },
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "@anthropic/kubernetes-mcp"],
      "env": {
        "KUBECONFIG": "/home/devops/.kube/config"
      }
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@anthropic/github-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_your-token-here"
      }
    },
    "grafana": {
      "command": "npx",
      "args": ["-y", "@anthropic/grafana-mcp"],
      "env": {
        "GRAFANA_URL": "https://grafana.yourcompany.com",
        "GRAFANA_API_KEY": "your-grafana-api-key"
      }
    },
    "datadog": {
      "command": "npx",
      "args": ["-y", "@anthropic/datadog-mcp"],
      "env": {
        "DD_API_KEY": "your-datadog-api-key",
        "DD_APP_KEY": "your-datadog-app-key"
      }
    },
    "aws": {
      "command": "npx",
      "args": ["-y", "@aws-labs/mcp"],
      "env": {
        "AWS_PROFILE": "production-readonly",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

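Note the use of a read-only AWS profile. For production infrastructure, default to read-only credentials for your MCP servers. This prevents accidental modifications while still giving you full visibility into your infrastructure. Actions that change state, such as the rollback in the incident response workflow, then need separately scoped credentials or a manual step.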

Real Commands AI Generates

One of the most valuable aspects of DevOps MCP servers is that your AI generates concrete, runnable commands rather than general advice. Here are examples of what your AI might generate when you ask common DevOps questions:

Kubernetes Troubleshooting

When you ask "Why is the payment service returning 503 errors?", your AI generates and executes:

kubectl get pods -n production -l app=payment-service -o wide
kubectl logs payment-service-7d8f6b5c4-x9k2m -n production --tail=100
kubectl describe pod payment-service-7d8f6b5c4-x9k2m -n production
kubectl get events -n production --field-selector involvedObject.name=payment-service-7d8f6b5c4-x9k2m
kubectl get hpa -n production payment-service

Terraform Infrastructure Planning

When you ask "Plan a new Redis cluster for our staging environment", your AI generates Terraform code like:

resource "aws_elasticache_replication_group" "staging_redis" {
  replication_group_id = "staging-redis"
  description          = "Redis cluster for staging environment"
  node_type            = "cache.t3.medium"
  num_cache_clusters   = 2
  port                 = 6379
  subnet_group_name    = aws_elasticache_subnet_group.staging.name
  security_group_ids   = [aws_security_group.redis_staging.id]

  automatic_failover_enabled = true
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
}

Docker Debugging

When you ask "The dbt container keeps crashing, what is wrong?", your AI generates:

docker ps -a --filter name=dbt
docker logs dbt-runner --tail 200
docker inspect dbt-runner --format '{{.State.ExitCode}}'
docker inspect dbt-runner --format '{{.State.OOMKilled}}'
docker stats dbt-runner --no-stream
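
The exit code and OOMKilled flag from those inspect commands already narrow down the cause. The Python sketch below encodes the usual reading: exit codes above 128 conventionally mean the process was ended by signal number (code - 128), so 137 corresponds to SIGKILL. The function name and the wording of the explanations are hypothetical.

```python
# Signal numbers that commonly show up in container exit codes on Linux.
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def explain_exit(exit_code: int, oom_killed: bool) -> str:
    """Give a first-guess explanation for a container's exit state."""
    if oom_killed:
        return "killed for exceeding its memory limit (OOMKilled)"
    if exit_code == 0:
        return "exited cleanly; check the restart policy or entrypoint"
    if exit_code > 128:
        signum = exit_code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"terminated by {name} (exit code {exit_code})"
    return f"application error (exit code {exit_code})"


print(explain_exit(137, True))
print(explain_exit(143, False))
print(explain_exit(1, False))
```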

End-to-End DevOps Workflow

Here is a realistic deployment workflow using multiple MCP servers:

  1. Pre-deploy check: Use GitHub MCP to verify all CI checks pass on the release branch. Confirm no open blockers or failing tests.
  2. Infrastructure review: Use Terraform MCP to review the infrastructure plan for the new release. Verify no unexpected resource changes.
  3. Deploy: Trigger deployment via GitHub Actions and monitor with Kubernetes MCP. Watch the rolling update progress in real time.
  4. Verify: Check Grafana dashboards for latency spikes and error rate changes. Compare metrics against the pre-deployment baseline.
  5. Investigate: If issues arise, use Datadog MCP to search logs and traces for root cause. Correlate with the specific code changes in the deployment.
  6. Rollback: If needed, use Kubernetes MCP to check rollout history and roll back the deployment. Verify the rollback completes successfully.

Getting Started

MCP servers for DevOps work best in code-oriented editors:

  • Cursor - The top choice for DevOps engineers who live in their editor. MCP servers integrate directly with your terminal workflow.
  • VS Code - Excellent with its terminal integration for running infrastructure commands. The MCP extension works alongside your existing DevOps extensions.
  • Claude Desktop - Good for reviewing and planning, less ideal for hands-on operations. Best for managers and team leads who need infrastructure visibility without running commands.

Start with Docker MCP and GitHub MCP - they cover the most common daily tasks with minimal configuration overhead. Add Kubernetes MCP when you manage clusters, and Grafana or Datadog for monitoring visibility. For data pipeline-specific DevOps workflows, see our data engineering guide.
