Luma
Multi-model visual understanding MCP server supporting GLM-4.6V, DeepSeek-OCR (free), and Qwen3-VL-Flash. Provides visual processing capabilities for AI coding models that do not support image understanding.
What is Luma?
Luma is a Model Context Protocol (MCP) server that gives AI assistants such as Claude, Cursor, and VS Code multi-model visual understanding through GLM-4.6V, DeepSeek-OCR (free), and Qwen3-VL-Flash. It provides visual processing capabilities for AI coding models that do not support image understanding.
This server falls under the Data Science & ML category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Multi-model visual understanding MCP server with GLM-4.6V, DeepSeek-OCR (free), and Qwen3-VL-Flash
Installation
NPM
npx -y luma-mcp
Manual Installation
npx -y luma-mcp
Configuration
Configuration Details
claude_desktop_config.json
How to Set Up and Use Luma
Luma MCP is a multi-model visual understanding server that adds image analysis capabilities to AI coding assistants that lack native vision support. It routes image understanding requests to your choice of visual model — including GLM-4.6V via Zhipu AI, DeepSeek-OCR for free OCR tasks via SiliconFlow, Qwen3-VL-Flash via DashScope, and others — through a single unified `image_understand` tool that accepts local file paths, HTTPS URLs, or base64 data URIs.
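The three accepted input forms can be told apart by their prefix, and a local image can be turned into a base64 data URI with the standard library. The following is a minimal sketch, not Luma's own implementation: `classify_image_source` and `to_data_uri` are hypothetical helper names, and the prefix checks are an assumption about how such inputs are usually distinguished.

```python
import base64
import mimetypes
from pathlib import Path


def classify_image_source(source: str) -> str:
    """Label an image reference as a data URI, an HTTPS URL, or a file path.

    Hypothetical helper: the prefix rules are an assumption, not Luma's code.
    """
    if source.startswith("data:"):
        return "data_uri"
    if source.startswith("https://"):
        return "url"
    return "file_path"


def to_data_uri(image_path: str) -> str:
    """Encode a local image file as a base64 data URI (hypothetical helper)."""
    path = Path(image_path)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

A data URI built this way carries the image bytes inline, which may help when the server process cannot read the file from its own working directory.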
Prerequisites
- Node.js 18 or higher installed
- An API key for at least one supported vision provider: Zhipu AI (ZHIPU_API_KEY), SiliconFlow (SILICONFLOW_API_KEY for free DeepSeek-OCR), Alibaba DashScope (DASHSCOPE_API_KEY for Qwen), Volcengine, or Hunyuan
- An MCP-compatible client such as Claude Desktop, Cursor, or Claude Code
Choose your vision model provider
Decide which visual model provider to use. SiliconFlow with DeepSeek-OCR is free and good for text extraction. Zhipu GLM-4.6V and Qwen3-VL-Flash are strong general-purpose vision models.
Obtain an API key from your chosen provider
Sign up and get an API key: Zhipu AI at https://open.bigmodel.cn, SiliconFlow at https://cloud.siliconflow.cn (free tier available), Alibaba DashScope at https://dashscope.aliyuncs.com.
Test the server with npx
Verify the package runs correctly by launching it with your chosen provider's environment variables.
MODEL_PROVIDER=siliconflow SILICONFLOW_API_KEY=your_key npx -y luma-mcp
Configure Claude Desktop or your MCP client
Add the server to your client's config file with the appropriate environment variables for your chosen provider.
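If the client config already lists other servers, the Luma entry has to be merged in rather than pasted over them. Here is a minimal sketch of that merge in Python, assuming the client keeps its servers under a top-level `mcpServers` key; `add_luma_server` is a hypothetical helper, and reading or writing the actual config file is left out.

```python
import json


def add_luma_server(config: dict, provider: str, key_name: str, key_value: str) -> dict:
    """Return a copy of an MCP client config with a luma-mcp entry added.

    Hypothetical helper; assumes servers live under the "mcpServers" key.
    """
    # Deep-copy through JSON so the caller's dict is not mutated.
    updated = json.loads(json.dumps(config))
    servers = updated.setdefault("mcpServers", {})
    servers["luma-mcp"] = {
        "command": "npx",
        "args": ["-y", "luma-mcp"],
        "env": {"MODEL_PROVIDER": provider, key_name: key_value},
    }
    return updated


if __name__ == "__main__":
    # Illustrative existing config with one unrelated server entry.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    merged = add_luma_server(
        existing, "siliconflow", "SILICONFLOW_API_KEY", "your_siliconflow_api_key"
    )
    print(json.dumps(merged, indent=2))
```

Existing server entries are preserved, and only the `luma-mcp` key is added or overwritten.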
Restart your MCP client
Close and reopen your client to load the new MCP server. The image_understand tool will now be available.
Luma Examples
Client configuration
Example using SiliconFlow with DeepSeek-OCR (free). Swap MODEL_PROVIDER and the API key env var for other providers.
{
  "mcpServers": {
    "luma-mcp": {
      "command": "npx",
      "args": ["-y", "luma-mcp"],
      "env": {
        "MODEL_PROVIDER": "siliconflow",
        "SILICONFLOW_API_KEY": "your_siliconflow_api_key"
      }
    }
  }
}
Prompts to try
Use these prompts in Claude to analyze images via Luma MCP.
- "Use image_understand on ./screenshot.png to describe the UI layout and main components"
- "Extract all text from this image: ./invoice.jpg"
- "Analyze the error in this screenshot and suggest a fix: ./error_screen.png"
- "What accessibility issues do you see in this UI screenshot? https://example.com/design.png"
- "Parse the table data from this image: ./data_table.png"
Troubleshooting Luma
image_understand returns an error about unsupported model or invalid API key
Verify that MODEL_PROVIDER matches exactly one of the supported values: zhipu, siliconflow, qwen, volcengine, hunyuan, or custom. Then confirm the corresponding API key variable is set correctly (e.g., SILICONFLOW_API_KEY for siliconflow).
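This check can be run before launching the server. The sketch below is a hypothetical pre-flight helper, not part of Luma. It covers only the key names given in the prerequisites, and pairing the `qwen` provider value with `DASHSCOPE_API_KEY` is an assumption drawn from them. Key names for volcengine, hunyuan, and custom are not stated here, so those providers are accepted without a key check.

```python
SUPPORTED_PROVIDERS = {"zhipu", "siliconflow", "qwen", "volcengine", "hunyuan", "custom"}

# Key names taken from the prerequisites; the qwen pairing is an assumption.
KEY_FOR_PROVIDER = {
    "zhipu": "ZHIPU_API_KEY",
    "siliconflow": "SILICONFLOW_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
}


def check_provider_env(env: dict) -> list:
    """Return a list of problems found in the provider settings (hypothetical helper)."""
    problems = []
    provider = env.get("MODEL_PROVIDER")
    # The comparison is exact, so "SiliconFlow" is rejected just like a typo.
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"MODEL_PROVIDER={provider!r} is not a supported value")
        return problems
    key_name = KEY_FOR_PROVIDER.get(provider)
    if key_name is not None and not env.get(key_name):
        problems.append(f"{key_name} is not set for provider {provider}")
    return problems
```

Passing `dict(os.environ)` after `import os` checks the current shell environment, and an empty list means no problem was detected for the providers this sketch knows about.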
Local file path images fail to load
Use absolute file paths rather than relative ones. The MCP server process may have a different working directory than your project. For example, use /Users/yourname/project/screenshot.png instead of ./screenshot.png.
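Converting a relative path before putting it in a prompt avoids the working-directory mismatch. A minimal sketch, assuming you know the project directory the relative path was written against; `to_absolute` is a hypothetical helper name.

```python
from pathlib import Path


def to_absolute(image_path: str, project_dir: str) -> str:
    """Resolve image_path against project_dir unless it is already absolute."""
    path = Path(image_path).expanduser()
    if not path.is_absolute():
        # Anchor the relative path at the project directory, not the process cwd.
        path = Path(project_dir) / path
    # resolve() normalizes "./" and "../" segments; the file need not exist.
    return str(path.resolve())
```

For example, `to_absolute("./screenshot.png", "/Users/yourname/project")` yields a path ending in `project/screenshot.png` that no longer depends on where the server process was started.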
Response is truncated for images with a lot of text
Increase the MAX_TOKENS environment variable from its default of 8192. For dense documents, set MAX_TOKENS=16384 or higher, depending on what the chosen model supports.
Frequently Asked Questions about Luma
What is Luma?
Luma is a Model Context Protocol (MCP) server for multi-model visual understanding, with support for GLM-4.6V, DeepSeek-OCR (free), and Qwen3-VL-Flash. It provides visual processing capabilities for AI coding models that do not support image understanding, and it connects AI assistants to external tools and data sources through a standardized interface.
How do I install Luma?
Run Luma with npx using the command: npx -y luma-mcp. npx downloads the package on first use, so no separate install step is needed. Then add the server configuration to your AI client's JSON config file (e.g., claude_desktop_config.json or .cursor/mcp.json).
Which AI clients work with Luma?
Luma works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Luma free to use?
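Yes, Luma is open source and available under the MIT license. You can use it freely in both personal and commercial projects. You still need an API key from a supported vision provider; DeepSeek-OCR via SiliconFlow is the free option.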
Luma Alternatives — Similar Data Science & ML Servers
Looking for alternatives to Luma? Here are other popular Data Science & ML servers you can use with Claude, Cursor, and VS Code.
Ultrarag
★ 5.6k
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
RocketRide
★ 3.1k
MCP server that exposes RocketRide AI pipelines as t...
Aix Db
★ 2.1k
Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to deliver end-to-end conversion from natural language to data insights.
NeMo Data Designer
★ 1.9k
NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
PaperBanana
★ 1.7k
Open source implementation and extension of Google Research's PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.
MiniMax
★ 1.5k
Bridges MiniMax AI capabilities to the Model Context Protocol, enabling AI agents to perform image understanding, text-to-image generation, and speech synthesis. It provides a standardized interface for accessing MiniMax's core tools via JSON-RPC.