How do I install PySpark MCP Server?

Install the server package from PyPI with pip install pyspark-mcp (the PySpark MCP GitHub repository has full setup instructions), then add the server configuration to your AI client.

What category is PySpark MCP Server?

PySpark is listed under the Data Science & ML category. Browse more servers in this category on MCPgee.

PySpark

Name: PySpark MCP Server
Author: SemyonSinchenko

v1.0.0 • Data Science & ML • stable

MCP server for PySpark, inspired by LakeSail

Tags: pyspark, mcp, ai-integration


What is PySpark?

PySpark MCP Server is a Model Context Protocol (MCP) server that lets AI assistants such as Claude, Cursor, and VS Code work with Apache Spark through PySpark. It is inspired by LakeSail.


This server falls under the Data Science & ML category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

Provides 14 MCP tools for SQL query analysis, logical plan inspection, catalog and schema discovery, and result size estimation.

Use Cases

Execute PySpark data processing jobs through the MCP interface. Analyze big data using distributed computing frameworks.

Maintainer: SemyonSinchenko

License: Apache-2.0

Language: Python

Version: v1.0.0

Updated: Apr 8, 2026

Status: healthy

Maintenance: active

Works with: Claude, OpenAI, Windows, macOS, Linux


Installation

Manual Installation

pip install pyspark-mcp

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response time: < 200 ms

Throughput: Medium

Resource Usage

Memory usage: Low

CPU usage: Low

How to Set Up and Use PySpark

The PySpark MCP server bridges Apache Spark's distributed computing capabilities with AI assistants via the Model Context Protocol. It provides 14 tools for SQL query analysis, logical plan inspection, catalog and schema discovery, and result size estimation — all without requiring the AI to manage a Spark session directly. Data engineers and analysts can interrogate large datasets and explore warehouse schemas through natural language while the server handles Spark connectivity.

Prerequisites

Python 3.9 or higher with pip
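Java with JAVA_HOME configured, in a version your Spark release supports (Spark 3.5 runs on Java 8, 11, or 17; Spark 4.x requires Java 17 or 21); the pyspark package bundles Spark, so a separate Apache Spark install is optional for local mode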
pip install pyspark-mcp to install the server package
An MCP client such as Claude Desktop or Claude Code
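Before installing, a short script can confirm the environment matches this list. This is a minimal sketch using only the Python standard library; the helper name check_prerequisites is hypothetical, it assumes the PyPI distribution is named pyspark-mcp, and it only checks that JAVA_HOME is set and a java binary is on PATH, not which Java version is installed.

```python
import importlib.metadata
import os
import shutil
import sys


def check_prerequisites(dist_name: str = "pyspark-mcp") -> dict:
    """Report which of the listed prerequisites appear to be satisfied."""
    try:
        # Assumes the pip distribution name is "pyspark-mcp".
        installed = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        installed = None
    return {
        "python_ok": sys.version_info >= (3, 9),
        "java_home": os.environ.get("JAVA_HOME"),  # None if unset
        "java_on_path": shutil.which("java") is not None,
        "server_version": installed,  # None until pip install pyspark-mcp has run
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```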

Install the pyspark-mcp package

Install the server from PyPI. This pulls in the MCP SDK, PySpark, and all required dependencies.

pip install pyspark-mcp

Start the MCP server

Launch the server in HTTP mode, pointing it at your Spark master. The local[*] master runs Spark on this machine using all available CPU cores.

pyspark-mcp --master "local[*]" --host 127.0.0.1 --port 8090
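When the server is started from a Python helper script rather than a shell, passing the arguments as a list sidesteps quoting local[*]. A minimal sketch; build_server_command and start_server are hypothetical helpers, and they assume pyspark-mcp accepts exactly the --master, --host, --port, and --conf flags shown on this page.

```python
import subprocess
from typing import Dict, List, Optional


def build_server_command(
    master: str = "local[*]",
    host: str = "127.0.0.1",
    port: int = 8090,
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the pyspark-mcp command line as an argument list (no shell involved)."""
    cmd = ["pyspark-mcp", "--master", master, "--host", host, "--port", str(port)]
    # Each Spark property becomes its own "--conf key=value" pair.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    return cmd


def start_server(**kwargs) -> subprocess.Popen:
    """Start the server in the background; the caller must terminate it."""
    return subprocess.Popen(build_server_command(**kwargs))
```

For example, proc = start_server(conf={"spark.driver.memory": "4g"}) launches the local server, and proc.terminate() stops it.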

Register the server with Claude Code

Add the running HTTP server to your MCP client with Claude Code's --transport http option.

claude mcp add --transport http pyspark-mcp http://127.0.0.1:8090/mcp

Optional: connect to a real Spark cluster

Point the server at an existing Spark master or YARN cluster and tune driver memory via --conf flags.

pyspark-mcp --master "spark://spark-master:7077" --conf spark.driver.memory=4g
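Spark typically rejects a malformed master URL only when the session is created, so a launcher script may want to check the value up front. The sketch below covers the common forms from the Spark documentation (local, local[K], local[*], local[K,F], spark://HOST:PORT including comma-separated standby masters, yarn, and k8s://HOST:PORT); is_valid_master is a hypothetical helper and a simplified check, not Spark's own parser.

```python
import re

# Master URL forms listed in the Spark documentation (simplified).
_MASTER_PATTERNS = (
    r"local",                                   # one worker thread
    r"local\[(\*|\d+)(,\d+)?\]",                # local[K], local[*], local[K,F]
    r"spark://[^\s:/,]+:\d+(,[^\s:/,]+:\d+)*",  # standalone, optionally several masters
    r"yarn",                                    # cluster found via Hadoop config
    r"k8s://\S+",                               # Kubernetes API server
)


def is_valid_master(url: str) -> bool:
    """Return True if url matches one of the common Spark master URL forms."""
    return any(re.fullmatch(pattern, url) for pattern in _MASTER_PATTERNS)
```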

Verify connectivity

Ask your AI assistant for the current PySpark version or to list the available databases to confirm the server is responding.
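To check connectivity without an AI client, you can send the MCP initialize handshake to the same endpoint directly. This sketch assumes the server implements MCP's Streamable HTTP transport at http://127.0.0.1:8090/mcp (the URL registered with claude mcp add) and accepts protocol version 2025-03-26; the helper names and client name are hypothetical.

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:8090/mcp"  # endpoint registered with claude mcp add


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize JSON-RPC request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "connectivity-check", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


def send_initialize(url: str = MCP_URL, timeout: float = 10.0) -> tuple:
    """Send the handshake and return (HTTP status, Content-Type, start of body)."""
    with urllib.request.urlopen(build_initialize_request(url), timeout=timeout) as resp:
        body = resp.read(2000).decode("utf-8", "replace")
        return resp.status, resp.headers.get("Content-Type"), body
```

A 200 status with a body containing serverInfo suggests the server is up and speaking MCP.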

PySpark Examples

Client configuration

For HTTP transport, register the running server with the claude mcp add command. For clients that launch the server themselves from a JSON config (stdio embedding), use the entry below.

{
  "mcpServers": {
    "pyspark": {
      "command": "pyspark-mcp",
      "args": ["--master", "local[*]", "--host", "127.0.0.1", "--port", "8090"]
    }
  }
}
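Editing the client's JSON by hand can accidentally drop servers that are already configured. A minimal sketch that merges the pyspark entry into an existing config file; merge_server, add_server_to_file, and default_entry are hypothetical helpers, and the entry simply mirrors the JSON above.

```python
import json
from pathlib import Path
from typing import Optional


def default_entry() -> dict:
    """The same pyspark entry as the JSON config above."""
    return {
        "command": "pyspark-mcp",
        "args": ["--master", "local[*]", "--host", "127.0.0.1", "--port", "8090"],
    }


def merge_server(config_text: str, name: str = "pyspark", entry: Optional[dict] = None) -> str:
    """Return config JSON with one server added or replaced; other servers are kept."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry if entry is not None else default_entry()
    return json.dumps(config, indent=2) + "\n"


def add_server_to_file(path: Path, **kwargs) -> None:
    """Read the config file (if it exists), merge the entry, and write it back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(merge_server(text, **kwargs), encoding="utf-8")
```

On macOS, Claude Desktop usually keeps its config at ~/Library/Application Support/Claude/claude_desktop_config.json; restart the client after editing so it reloads the file.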

Prompts to try

Prompts that use the catalog inspection and query analysis tools.

- "List all databases available in the current Spark catalog"
- "Show me the schema and column descriptions for the sales.orders table"
- "Analyze this SQL query and tell me its logical plan: SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
- "Estimate how many rows this query will return before I run it"
- "What version of PySpark is this server running?"

Troubleshooting PySpark

Server fails to start with Java-related error

Ensure JAVA_HOME points to a Java installation your Spark version supports (Java 8, 11, or 17 for Spark 3.5; Java 17 or 21 for Spark 4.x). Run 'java -version' to confirm Java is on your PATH.
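To see which Java version Spark will find, parse the output of java -version and compare it with the versions your Spark release supports. A sketch with hypothetical helper names; note that java -version prints to stderr, and Java 8 and earlier report their version as 1.x.

```python
import re
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the "1.x" scheme, e.g. "1.8.0_381" means Java 8.
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Run `java -version` and return the major version, or None if java is missing."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr)  # the version banner goes to stderr
```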

claude mcp add returns connection refused

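Make sure pyspark-mcp is running and listening on the port before registering it. Check with 'curl http://127.0.0.1:8090/mcp': any HTTP response, even an error status, shows the port is open, while "Connection refused" means nothing is listening there.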
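The same check from Python, for scripts that wait for the server before registering it. This sketch only tests whether the port accepts TCP connections; it does not confirm that the process on that port speaks MCP. is_listening is a hypothetical helper.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8090, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and timeouts (both are OSError subclasses).
        return False
```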

Queries against external tables fail with data source errors

Include the required Spark connector JARs (e.g., for Delta Lake or Iceberg) when starting the server, by passing additional --packages or --jars flags or by setting spark.jars.packages through --conf.
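If the server forwards --conf to Spark, as the cluster example earlier on this page suggests, connector packages can also be supplied through Spark's spark.jars.packages property, a comma-separated list of Maven coordinates (groupId:artifactId:version). A minimal sketch with a hypothetical helper; the Delta Lake coordinate in the usage note is illustrative and must match your Spark and Scala versions.

```python
import re
from typing import Iterable, List

# Maven coordinates in groupId:artifactId:version form, as spark.jars.packages expects.
_COORDINATE = re.compile(r"[\w.\-]+:[\w.\-]+:[\w.\-]+")


def packages_conf(coordinates: Iterable[str]) -> List[str]:
    """Return a ["--conf", "spark.jars.packages=..."] pair for the given coordinates."""
    coords = list(coordinates)
    bad = [c for c in coords if not _COORDINATE.fullmatch(c)]
    if not coords or bad:
        raise ValueError(f"expected groupId:artifactId:version coordinates, got {coords}")
    return ["--conf", "spark.jars.packages=" + ",".join(coords)]
```

For example, ["pyspark-mcp", "--master", "local[*]"] + packages_conf(["io.delta:delta-spark_2.12:3.2.0"]) adds Delta Lake. Delta also expects spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension and spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog, which can be passed as further --conf pairs.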

Frequently Asked Questions about PySpark

What is PySpark?

PySpark MCP Server is a Model Context Protocol (MCP) server for PySpark, inspired by LakeSail. It connects AI assistants to Apache Spark through a standardized interface.

How do I install PySpark?

Install it from PyPI with pip install pyspark-mcp, start it with pyspark-mcp --master "local[*]", and add the server config to your AI client. The PySpark MCP GitHub repository has full instructions.

Which AI clients work with PySpark?

PySpark should work with MCP-compatible AI clients such as Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline, provided the client supports the transport you run the server with.

Is PySpark free to use?

Yes, PySpark is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.


PySpark Alternatives — Similar Data Science & ML Servers

Looking for alternatives to PySpark? Here are other popular Data Science & ML servers you can use with Claude, Cursor, and VS Code.

Ultrarag

★ 5.6k

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

RocketRide

★ 3.1k

MCP server that exposes RocketRide AI pipelines as t…

Aix Db

★ 2.1k

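Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to deliver end-to-end conversion from natural language to data insights.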

NeMo Data Designer

★ 1.9k

🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.

PaperBanana

★ 1.7k

Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.

MiniMax

★ 1.5k

Bridges MiniMax AI capabilities to the Model Context Protocol, enabling AI agents to perform image understanding, text-to-image generation, and speech synthesis. It provides a standardized interface for accessing MiniMax's core tools via JSON-RPC.

Browse More Data Science & ML MCP Servers

Explore all Data Science & ML servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.


Set Up PySpark in Your Editor

Choose your AI client for step-by-step setup instructions.

- Claude Desktop: macOS & Windows app
- Claude Code: CLI & terminal
- Cursor: AI-first code editor
- VS Code: GitHub Copilot MCP
- Windsurf: Codeium AI editor
- Cline: VS Code extension

Quick Config Preview

{
  "mcpServers": {
    "pyspark": {
      "command": "pyspark-mcp",
      "args": ["--master", "local[*]", "--host", "127.0.0.1", "--port", "8090"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json

