Lemonade
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk
What is Lemonade?
Lemonade is listed on MCPgee as a Model Context Protocol (MCP) server for AI assistants like Claude, Cursor, and VS Code. It helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs.
This server falls under the Cloud Services category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- Helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs
Installation
Manual Installation
npx lemonade
Configuration
Configuration Details
claude_desktop_config.json
How to Set Up and Use Lemonade
Lemonade is a local AI server that runs optimized LLMs, image generation models, and speech models directly on your own GPU or NPU. It provides OpenAI- and Anthropic-compatible API endpoints at http://localhost:13305, so any tool that calls a cloud AI API can use your local hardware instead. It supports AMD Ryzen AI NPUs (XDNA2), AMD Radeon GPUs, NVIDIA GPUs (Turing and newer), Apple Silicon, and x86_64/ARM64 CPUs, with a library of 100+ models spanning chat, coding, image generation (SDXL-Turbo), and speech (Whisper, Kokoro). Developers who want free, private, offline AI inference without changing their existing tooling can use Lemonade as a drop-in local replacement for cloud API calls.
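Because the endpoints follow the OpenAI chat completions format, a client mostly needs a different base URL. The following is a minimal Python sketch using only the standard library. It assumes the server is reachable at http://localhost:13305 and uses the /api/v1/chat/completions path from the curl test in this guide. The helper names build_chat_request, extract_reply, and send_chat are illustrative and not part of Lemonade.

```python
import json
import urllib.request

# Assumed default address and path, taken from the curl test in this guide.
BASE_URL = "http://localhost:13305/api/v1"


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response body."""
    return response_body["choices"][0]["message"]["content"]


def send_chat(model, prompt, timeout=120):
    """Send the request to the local server and return the reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running and a model pulled, calling send_chat("Qwen3.5-35B-A3B-GGUF", "Hello") should return the model's reply as a string. The response layout used by extract_reply is the standard OpenAI one, which an OpenAI-compatible endpoint is expected to mirror.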
Prerequisites
- Windows 11, macOS, or a supported Linux distro (Ubuntu 24.04+, Fedora 43+, Arch, Debian Trixie+, or Snap-supported distro)
- A supported GPU or NPU: AMD Radeon, AMD Ryzen AI (XDNA2), NVIDIA Turing or newer, or Apple Silicon
- At least 8 GB RAM (16+ GB recommended for larger models)
- An MCP-compatible client such as Claude Desktop
- Claude Code CLI installed if using the `lemonade launch claude` integration
Download and install Lemonade Server
Visit the Lemonade download page and pick the installer for your platform. On Windows install the .msi package; on macOS install the .pkg; on Linux use the distro-specific package or Snap.
# Visit for platform-specific installers:
https://lemonade-server.ai/install_options.html
Start the Lemonade daemon
After installation, start the background server daemon. It will listen on port 13305 and expose OpenAI-compatible endpoints.
lemond
Browse and pull a model
Use the lemonade CLI to list available models and download one. Models are stored locally and served from your hardware.
lemonade list
lemonade pull Qwen3.5-35B-A3B-GGUF
Test the local API endpoint
Confirm the server is running by sending a chat completion request to the OpenAI-compatible endpoint. Replace the model name with one you have pulled.
curl http://localhost:13305/api/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen3.5-35B-A3B-GGUF", "messages": [{"role": "user", "content": "Hello"}]}'
Launch Claude Code against your local model
Lemonade provides a `launch claude` command that automatically configures Claude Code CLI to use your local Lemonade server instead of Anthropic's cloud.
lemonade launch claude --model Qwen3.5-35B-A3B-GGUF
Add Lemonade as an MCP server in Claude Desktop
Point Claude Desktop at the local Lemonade server by configuring it as an MCP server in claude_desktop_config.json.
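If claude_desktop_config.json already lists other servers, the Lemonade entry has to be merged into the existing mcpServers object rather than pasted over the whole file. The following Python sketch shows one way to do that. The function names are hypothetical, and the path to the config file is left to the caller because it differs by platform.

```python
import json
from pathlib import Path

# The server entry used in this guide's Claude Desktop configuration.
LEMONADE_ENTRY = {"command": "npx", "args": ["lemonade"]}


def add_mcp_server(config, name="lemonade", entry=LEMONADE_ENTRY):
    """Return a copy of `config` with the entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers[name] = dict(entry)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path):
    """Read the config at `path` (if present), add the entry, write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.write_text(json.dumps(add_mcp_server(config), indent=2))
```

Other keys and other servers in the file are preserved. Adding the JSON entry by hand works just as well when the file is empty or new.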
{
  "mcpServers": {
    "lemonade": {
      "command": "npx",
      "args": ["lemonade"]
    }
  }
}
Lemonade Examples
Client configuration
Claude Desktop MCP configuration for Lemonade. The server bridges MCP tool calls to the local Lemonade inference engine.
{
  "mcpServers": {
    "lemonade": {
      "command": "npx",
      "args": ["lemonade"]
    }
  }
}
Prompts to try
Prompts to use with Lemonade's local AI capabilities.
- "List the models I currently have downloaded in Lemonade"
- "Generate an image of a sunset over mountains using SDXL-Turbo"
- "Transcribe this audio file using Whisper-Large-v3-Turbo"
- "Chat with Qwen3.5 about my Python code and suggest optimizations"
- "What hardware backends does Lemonade detect on this machine?"
Troubleshooting Lemonade
lemond daemon does not start or port 13305 is already in use
Check if another process is occupying port 13305 with `lsof -i :13305` (macOS/Linux) or `netstat -ano | findstr 13305` (Windows). Kill the conflicting process or configure Lemonade to use a different port via its configuration file.
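As a cross-platform complement to lsof and netstat, the port check can be sketched in Python with the standard socket module. The function name port_in_use is illustrative. It only reports whether something is accepting connections on the port, not which process owns it.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(13305) before starting the daemon shows whether the port is already taken. Calling it afterwards is a quick way to confirm that a server is listening.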
GPU is not detected and models run slowly on CPU
Run `lemonade backends` to see which hardware backends are active. Install the GPU driver for your hardware (ROCm for AMD, CUDA for NVIDIA, or the Ryzen AI NPU driver package). Reinstall Lemonade after driver installation.
Model pull fails or is very slow
Models are downloaded from the Lemonade model registry. Ensure you have a stable internet connection and sufficient disk space (models range from 2 GB to 70 GB+). Check available space with `df -h` and retry the pull command.
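The disk space check can also be scripted. This is a small Python sketch using shutil.disk_usage from the standard library. The function name has_room_for is illustrative, and the default path "." is a placeholder, since the directory where Lemonade stores models is not specified here.

```python
import shutil


def has_room_for(model_size_gb, path="."):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= model_size_gb * 1024**3
```

For example, has_room_for(70, path) checks for space for a model at the upper end of the 2 GB to 70 GB+ range, where path is the directory that will hold the download.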
Frequently Asked Questions about Lemonade
What is Lemonade?
Lemonade is listed here as a Model Context Protocol (MCP) server. It helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs, and it connects AI assistants to external tools and data sources through a standardized interface.
How do I install Lemonade?
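Download the installer for your platform from the Lemonade download page (https://lemonade-server.ai/install_options.html), start the server, and add the server config to your AI client. Further installation instructions are available on the Lemonade GitHub repository.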
Which AI clients work with Lemonade?
Lemonade works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Lemonade free to use?
Yes, Lemonade is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
Lemonade Alternatives — Similar Cloud Services Servers
Looking for alternatives to Lemonade? Here are other popular cloud services servers you can use with Claude, Cursor, and VS Code.
Open WebUI
★ 138.2k
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Anything LLM
★ 60.4k
The all-in-one AI productivity accelerator. On device and privacy first with no annoying setup or configuration.
LocalAI
★ 46.4k
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
Nacos
★ 33.0k
An easy-to-use dynamic service discovery, configuration and service management platform for building AI cloud native applications.
Xiaozhi ESP32
★ 26.7k
Backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.
Gateway
★ 11.8k
A blazing fast AI Gateway with integrated guardrails. Route to 1,600+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.