Lemonade

v1.0.0, Cloud Services, stable

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

ai, amd, genai, gpu, llama
Stars: 4,084
Downloads: 0
Weekly: 0

What is Lemonade?

Lemonade is listed here as a Model Context Protocol (MCP) server. It lets AI assistants such as Claude, Cursor, and VS Code discover and run local AI apps by serving optimized LLMs from the user's own GPUs and NPUs.


This server falls under the Cloud Services category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs

Use Cases

Discover and run local LLMs on GPUs and NPUs
Serve optimized LLMs from personal hardware

Maintainer: lemonade-sdk
License: Apache-2.0
Language: C++
Version: v1.0.0
Updated: May 22, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx lemonade

Configuration

Configuration Details

Config File: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use Lemonade

Lemonade is a local AI server that runs optimized LLMs, image generation models, and speech models directly on your own GPU or NPU. It provides OpenAI- and Anthropic-compatible API endpoints at http://localhost:13305, so any tool that calls a cloud AI API can use your local hardware instead. It supports AMD Ryzen AI NPUs (XDNA2), AMD Radeon GPUs, NVIDIA GPUs (Turing or newer), Apple Silicon, and x86_64/ARM64 CPUs, with a library of 100+ models spanning chat, coding, image generation (SDXL-Turbo), and speech (Whisper, Kokoro). Developers who want free, private, offline AI inference without changing their existing tooling can use Lemonade as a drop-in local replacement for cloud API calls.

Prerequisites

  • Windows 11, macOS, or a supported Linux distro (Ubuntu 24.04+, Fedora 43+, Arch, Debian Trixie+, or Snap-supported distro)
  • A supported GPU or NPU: AMD Radeon, AMD Ryzen AI (XDNA2), NVIDIA Turing or newer, or Apple Silicon
  • At least 8 GB RAM (16+ GB recommended for larger models)
  • An MCP-compatible client such as Claude Desktop
  • Claude Code CLI installed if using the `lemonade launch claude` integration

1. Download and install Lemonade Server

Visit the Lemonade download page and pick the installer for your platform. On Windows install the .msi package; on macOS install the .pkg; on Linux use the distro-specific package or Snap.

# Visit for platform-specific installers:
https://lemonade-server.ai/install_options.html

2. Start the Lemonade daemon

After installation, start the background server daemon. It will listen on port 13305 and expose OpenAI-compatible endpoints.

lemond

3. Browse and pull a model

Use the lemonade CLI to list available models and download one. Models are stored locally and served from your hardware.

lemonade list
lemonade pull Qwen3.5-35B-A3B-GGUF

4. Test the local API endpoint

Confirm the server is running by sending a chat completion request to the OpenAI-compatible endpoint. Replace the model name with one you have pulled.

curl http://localhost:13305/api/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen3.5-35B-A3B-GGUF", "messages": [{"role": "user", "content": "Hello"}]}'
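
The same request can be issued from a script. The following is a minimal Python sketch using only the standard library. The base URL, path, and model name are copied from the curl command above. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page does not confirm for Lemonade.

```python
import json
import urllib.request

# Base URL taken from the curl command above; change it if your server differs.
BASE_URL = "http://localhost:13305/api/v1"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply text.

    Assumption: the response body follows the OpenAI chat-completions
    layout, i.e. choices[0].message.content holds the assistant reply.
    """
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the daemon running and the model pulled, calling `chat("Qwen3.5-35B-A3B-GGUF", "Hello")` should return the model's reply as a string. Building the request separately from sending it makes the payload easy to inspect without a live server.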

5. Launch Claude Code against your local model

Lemonade provides a `launch claude` command that automatically configures Claude Code CLI to use your local Lemonade server instead of Anthropic's cloud.

lemonade launch claude --model Qwen3.5-35B-A3B-GGUF

6. Add Lemonade as an MCP server in Claude Desktop

Point Claude Desktop at the local Lemonade server by configuring it as an MCP server in claude_desktop_config.json.

{
  "mcpServers": {
    "lemonade": {
      "command": "npx",
      "args": ["lemonade"]
    }
  }
}
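
If claude_desktop_config.json already lists other MCP servers, pasting the block above over the file would drop them. The sketch below shows one way to merge the entry instead. The function name `add_lemonade_server` is hypothetical, and the caller must supply the config file's location because this page does not state it. The `npx` command and `lemonade` argument are copied from the snippet above.

```python
import json
from pathlib import Path


def add_lemonade_server(config_path: Path) -> dict:
    """Add or overwrite the "lemonade" entry, keeping any other configured servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)

    # Create "mcpServers" if it is missing, then set only the lemonade key.
    servers = config.setdefault("mcpServers", {})
    servers["lemonade"] = {"command": "npx", "args": ["lemonade"]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart the client after editing the file so that it reloads its server list.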

Lemonade Examples

Client configuration

Claude Desktop MCP configuration for Lemonade. The server bridges MCP tool calls to the local Lemonade inference engine.

{
  "mcpServers": {
    "lemonade": {
      "command": "npx",
      "args": ["lemonade"]
    }
  }
}

Prompts to try

Prompts to use with Lemonade's local AI capabilities.

- "List the models I currently have downloaded in Lemonade"
- "Generate an image of a sunset over mountains using SDXL-Turbo"
- "Transcribe this audio file using Whisper-Large-v3-Turbo"
- "Chat with Qwen3.5 about my Python code and suggest optimizations"
- "What hardware backends does Lemonade detect on this machine?"

Troubleshooting Lemonade

lemond daemon does not start or port 13305 is already in use

Check if another process is occupying port 13305 with `lsof -i :13305` (macOS/Linux) or `netstat -ano | findstr 13305` (Windows). Kill the conflicting process or configure Lemonade to use a different port via its configuration file.
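
As a platform-independent first check, a short script can tell whether anything is accepting connections on the port at all. This is a sketch only: `port_in_use` is a hypothetical helper, and it reports that the port is occupied without identifying the process, so `lsof` or `netstat` is still needed to find the owner.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable: nothing is listening.
        return False
```

Calling `port_in_use(13305)` before starting the daemon shows whether the port is already taken. Calling it afterwards shows whether the daemon came up.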

GPU is not detected and models run slowly on CPU

Run `lemonade backends` to see which hardware backends are active. Install the GPU driver for your hardware (ROCm for AMD, CUDA for NVIDIA, or the Ryzen AI NPU driver package). Reinstall Lemonade after driver installation.

Model pull fails or is very slow

Models are downloaded from the Lemonade model registry. Ensure you have a stable internet connection and sufficient disk space (models range from 2 GB to 70 GB+). Check available space with `df -h` and retry the pull command.
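
The disk-space check can also be scripted before a large pull. The sketch below uses the standard library and works on Windows as well as macOS and Linux. The helper name `has_room_for_model` and the 2 GB default headroom are illustrative assumptions. This page does not say where Lemonade stores models, so pass a path on the drive you expect it to use.

```python
import shutil


def has_room_for_model(path: str, model_size_gb: float, headroom_gb: float = 2.0) -> bool:
    """Return True if the filesystem containing `path` has enough free space.

    `headroom_gb` is an arbitrary safety margin, not a Lemonade requirement.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= model_size_gb + headroom_gb
```

For example, `has_room_for_model(".", 70)` checks whether the current drive could hold a model at the upper end of the size range mentioned above.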

Frequently Asked Questions about Lemonade

What is Lemonade?

Lemonade is listed as a Model Context Protocol (MCP) server that helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install Lemonade?

Download the installer for your platform from https://lemonade-server.ai/install_options.html (.msi on Windows, .pkg on macOS, a distro-specific package or Snap on Linux), start the lemond daemon, and add the server config to your AI client.

Which AI clients work with Lemonade?

Lemonade works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Lemonade free to use?

Yes, Lemonade is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.


