How do I install Semantic Router MCP Server?

Follow the setup instructions on the Semantic Router GitHub repository, then add the server configuration to your AI client.

What category is Semantic Router MCP Server?

Semantic Router is categorized under Cloud Services. Browse more servers in this category on MCPgee.

Semantic Router

Name: Semantic Router MCP Server
Author: vllm-project

v1.0.0 • Cloud Services • stable

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Tags: ai-gateway, bert-classification, fine-tuning, golang, huggingface-candle

Stars: 4,209


What is Semantic Router?

Semantic Router is a Model Context Protocol (MCP) server that lets AI assistants such as Claude, Cursor, and VS Code work with a system-level intelligent router for Mixture-of-Models deployments in the cloud, in data centers, and at the edge.


This server falls under the Cloud Services category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

System-level intelligent router for Mixture-of-Models at cloud, data center, and edge

Use Cases

Intelligent routing for mixture-of-models inference

System-level router for cloud and edge AI

Maintainer: vllm-project
License: Apache-2.0
Language: Go
Version: v1.0.0
Updated: May 21, 2026
Status: healthy
Maintenance: active

Works with: Claude, OpenAI, Windows, macOS, Linux


Installation

Manual Installation

npx semantic-router

Configuration

Config file: claude_desktop_config.json

Performance

Response metrics
Response time: < 200 ms
Throughput: medium

Resource usage
Memory usage: low
CPU usage: low

How to Set Up and Use Semantic Router

Semantic Router (from the vLLM project) is a system-level intelligent routing layer for Mixture-of-Models deployments. It directs each inference request to the most appropriate model across cloud, data center, and edge, based on semantic analysis that uses BERT classification, fine-tuned HuggingFace Transformers models, and Rust-backed Candle inference. It provides LLM safety features including jailbreak detection, sensitive data leak prevention, and hallucination identification, and it optimizes token economics by routing low-complexity queries to cheaper models and high-complexity ones to frontier models. Platform engineers and AI infrastructure teams use it to reduce inference costs and improve safety in multi-model production deployments.
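To make the token-economics argument concrete, here is a small worked calculation of how much spend is saved when a share of traffic goes to the economy tier instead of the frontier model. It is a sketch under simplifying assumptions: every request is assumed to use the same number of tokens regardless of tier, the prices are hypothetical placeholders rather than real provider pricing, and `savings_rate` is an illustrative helper, not part of Semantic Router.

```python
def savings_rate(economy_share: float, frontier_price: float, economy_price: float) -> float:
    """Fraction of spend saved versus routing every request to the frontier model.

    Assumes equal token counts per request across tiers (a simplification).
    Prices are per 1M tokens and purely hypothetical.
    """
    if not 0.0 <= economy_share <= 1.0:
        raise ValueError("economy_share must be between 0 and 1")
    # Blended price: weighted average of the two tiers by traffic share.
    blended = economy_share * economy_price + (1 - economy_share) * frontier_price
    return 1 - blended / frontier_price


# Hypothetical scenario: 60% of traffic is simple enough for the economy tier.
print(f"{savings_rate(0.6, frontier_price=10.0, economy_price=0.6):.1%}")
```

With these made-up numbers the blended price is 0.6 × 0.6 + 0.4 × 10.0 = 4.36 per 1M tokens, so the saving is 1 − 4.36 / 10.0 = 56.4%. Real savings depend on actual prices and on how token counts differ between simple and complex queries.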

Prerequisites

Go 1.21+ or a compatible runtime for the router binary
Kubernetes (for production deployment) or Docker for local use
Access to HuggingFace models or a local Candle-compatible BERT model checkpoint
At least one LLM backend (vLLM, OpenAI-compatible endpoint, or local model server)
An MCP-compatible client for interacting with the router's MCP interface
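A quick preflight script can confirm that the command-line tools from this list are on your PATH before you start. This is a minimal sketch: the tool list is an assumption drawn from the prerequisites above (plus the `semantic-router` binary name used on this page), it only checks that the commands exist, not their versions (for Go 1.21+, run `go version`), and it cannot check model access or LLM backends.

```python
import shutil
from typing import Callable, Optional

# Assumed tool names; drop kubectl for local-only setups, or docker if unused.
DEFAULT_TOOLS = ["go", "docker", "kubectl", "semantic-router"]


def missing_tools(tools: list[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be tested without touching PATH.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(DEFAULT_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools found on PATH")
```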

Install the Semantic Router

Use the official installer script to download and install the semantic-router binary. This is the recommended quickstart method for local evaluation.

curl -fsSL https://vllm-semantic-router.com/install.sh | bash

Configure your model backends

Define the available models and their routing rules in the router configuration. Each model entry specifies its endpoint URL, cost tier, capability level, and the semantic categories it should handle.

# Example router config (router.yaml):
models:
  - name: gpt-4o
    endpoint: https://api.openai.com/v1
    tier: frontier
    categories: [complex-reasoning, code]
  - name: gpt-4o-mini
    endpoint: https://api.openai.com/v1
    tier: economy
    categories: [simple-qa, summarization]
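The category-to-model mapping in the example router.yaml above can be pictured as a lookup table. The sketch below mirrors that example in memory and picks a model for a classified category. It is an illustration of the idea, not Semantic Router's actual routing code: `ModelEntry`, `build_category_index`, and `select_model` are hypothetical names, and falling back to a default model for unmapped categories is an assumed design choice.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    """One model from the router config (hypothetical in-memory mirror of router.yaml)."""
    name: str
    endpoint: str
    tier: str
    categories: list[str] = field(default_factory=list)


MODELS = [
    ModelEntry("gpt-4o", "https://api.openai.com/v1", "frontier",
               ["complex-reasoning", "code"]),
    ModelEntry("gpt-4o-mini", "https://api.openai.com/v1", "economy",
               ["simple-qa", "summarization"]),
]


def build_category_index(models: list[ModelEntry]) -> dict[str, ModelEntry]:
    """Map each category to the single model that handles it; reject duplicates."""
    index: dict[str, ModelEntry] = {}
    for model in models:
        for category in model.categories:
            if category in index:
                raise ValueError(
                    f"category {category!r} claimed by both "
                    f"{index[category].name} and {model.name}")
            index[category] = model
    return index


def select_model(category: str, index: dict[str, ModelEntry],
                 default: ModelEntry) -> ModelEntry:
    """Return the model for `category`, or `default` for unmapped categories."""
    return index.get(category, default)


index = build_category_index(MODELS)
print(select_model("code", index, default=MODELS[1]).name)
print(select_model("translation", index, default=MODELS[1]).name)
```

Rejecting a category that is claimed by two models catches an ambiguous config early. Choosing the economy model as the default keeps unmapped traffic cheap, at the risk of under-serving hard queries.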

Configure BERT classification model

Semantic Router uses a BERT model for intent classification to determine which backend should handle each request. Specify the HuggingFace model ID or a local path to a fine-tuned checkpoint.

# In router.yaml:
classifier:
  model: bert-base-uncased
  # Or point to a fine-tuned checkpoint:
  # model: /path/to/fine-tuned-bert
  backend: candle  # Uses HuggingFace Candle (Rust) for fast inference
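This page does not document how Semantic Router turns classifier output into a routing category. The sketch below only illustrates the common general pattern: apply softmax to the classifier's logits, take the top category, and fall back when confidence is below a threshold. The labels, the 0.5 threshold, and the `simple-qa` fallback are hypothetical values chosen for illustration.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: list[float], labels: list[str],
             threshold: float = 0.5, fallback: str = "simple-qa") -> tuple[str, float]:
    """Pick the highest-probability label, or `fallback` if confidence is too low."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best] if probs[best] >= threshold else fallback
    return label, probs[best]


# Hypothetical labels matching the categories in the example router.yaml.
LABELS = ["complex-reasoning", "code", "simple-qa", "summarization"]
print(classify([0.1, 3.0, 0.2, 0.1], LABELS))
```

Logging the returned probability alongside the label is a simple way to see why a request landed in a given category when tuning the threshold.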

Start the router

Launch the Semantic Router with your configuration file. It exposes an OpenAI-compatible API endpoint that your applications send requests to, and routes them transparently.

semantic-router serve --config router.yaml --port 8080
# Test it:
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello"}]}'
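Applications can call the same endpoint from code instead of curl. The sketch below uses only Python's standard library and assumes the router returns the standard OpenAI chat-completions response shape (`choices[0].message.content`). `build_chat_request` and `send` are illustrative helpers, not part of Semantic Router.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build a POST to the router's OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the router running on port 8080, `print(send(build_chat_request("http://localhost:8080", "Hello")))` sends the same request as the curl test above.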

Configure as an MCP server

Expose the Semantic Router's management and routing capabilities through MCP so AI agents can query routing decisions and configure rules dynamically.

{
  "mcpServers": {
    "semantic-router": {
      "command": "semantic-router",
      "args": ["mcp-server", "--config", "/path/to/router.yaml"],
      "env": {}
    }
  }
}
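Claude Desktop reads every server from one `mcpServers` object, so pasting this block over an existing config can remove servers you already use. The sketch below merges the entry into an existing config dictionary instead. It assumes you load and save the JSON file yourself; `add_mcp_server` is a hypothetical helper, and `/path/to/router.yaml` remains a placeholder.

```python
import copy
import json
from typing import Optional


def add_mcp_server(config: dict, name: str, command: str, args: list[str],
                   env: Optional[dict[str, str]] = None) -> dict:
    """Return a copy of `config` with an mcpServers entry added or replaced."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args), "env": dict(env or {})}
    return updated


existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
merged = add_mcp_server(existing, "semantic-router", "semantic-router",
                        ["mcp-server", "--config", "/path/to/router.yaml"])
print(json.dumps(merged, indent=2))
```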

Semantic Router Examples

Client configuration

To manage routing rules from Claude Desktop, add the same mcpServers entry shown under "Configure as an MCP server" to claude_desktop_config.json.

Prompts to try

Example prompts for interacting with Semantic Router management via an MCP-enabled AI assistant.

- "Show me the current routing rules and which models are handling what categories"
- "What percentage of requests in the last hour were routed to the economy tier?"
- "Add a routing rule that sends all code-generation requests to the frontier model"
- "Check if any jailbreak attempts were detected in today's traffic"
- "What is the current token savings rate compared to routing everything to the frontier model?"

Troubleshooting Semantic Router

BERT classification model fails to load or reports Candle errors

HuggingFace Candle requires a compatible CPU or GPU. Verify your system supports the required instruction sets (AVX2 on x86_64). If loading a custom fine-tuned checkpoint, ensure it is in the safetensors format supported by Candle. Try switching to a smaller BERT variant like bert-tiny for CPU-only environments.
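On Linux x86_64, one quick way to check for AVX2 is to look for the `avx2` flag in `/proc/cpuinfo`. This is a minimal sketch under that assumption. It returns False on ARM systems (which list `Features` rather than `flags`) and says nothing about GPU support.

```python
import platform
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line in /proc/cpuinfo text lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            return "avx2" in flags.split()
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if platform.system() == "Linux" and cpuinfo.exists():
        print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("Not Linux: check CPU flags with your OS tools")
```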

Router sends all requests to the same model despite routing rules

Check the classifier section of your router.yaml to ensure the BERT model and category definitions match your routing rules. Enable debug logging with '--log-level debug' to see the classification scores for each request and verify categories are being detected correctly.

Kubernetes deployment fails with OOMKilled on the router pod

The BERT classification model can require 1-2GB of memory. Set resource limits in your Kubernetes deployment manifest with at least 2Gi of memory. Consider using a distilled model such as distilbert-base-uncased, which has about 40% fewer parameters than bert-base-uncased (roughly 66M versus 110M) with minimal accuracy loss.
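A back-of-the-envelope estimate helps when you size the pod. The sketch below computes only the memory taken by fp32 model weights, using the published parameter counts for the two models named above. Tokenizer state, activations, and runtime overhead come on top of this, which is why the limit needs generous headroom. `weight_memory_gib` is an illustrative helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory for model weights alone, in GiB (fp32 = 4 bytes/param)."""
    return num_params * bytes_per_param / 2**30


# Approximate published parameter counts.
for name, params in [("bert-base-uncased", 110e6), ("distilbert-base-uncased", 66e6)]:
    print(f"{name}: ~{weight_memory_gib(params):.2f} GiB of fp32 weights")
```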

Frequently Asked Questions about Semantic Router

What is Semantic Router?

Semantic Router is a Model Context Protocol (MCP) server built around a system-level intelligent router for Mixture-of-Models deployments in the cloud, in data centers, and at the edge. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install Semantic Router?

Follow the installation instructions on the Semantic Router GitHub repository: either run the installer script described above or clone the repo and install dependencies, then add the server config to your AI client.

Which AI clients work with Semantic Router?

Semantic Router works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Semantic Router free to use?

Yes, Semantic Router is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.


Semantic Router Alternatives — Similar Cloud Services Servers

Looking for alternatives to Semantic Router? Here are other popular cloud services servers you can use with Claude, Cursor, and VS Code.

Open WebUI

★ 138.2k

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

Anything LLM

★ 60.4k

The all-in-one AI productivity accelerator. On device and privacy first with no annoying setup or configuration.

LocalAI

★ 46.4k

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Nacos

★ 33.0k

An easy-to-use dynamic service discovery, configuration, and service management platform for building AI cloud-native applications.

Xiaozhi ESP32

★ 26.7k

Backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.

Gateway

★ 11.8k

A blazing fast AI Gateway with integrated guardrails. Route to 1,600+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.

Browse More Cloud Services MCP Servers

Explore all cloud services servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.


Set Up Semantic Router in Your Editor

Choose your AI client for step-by-step setup instructions.

- Claude Desktop: macOS & Windows app
- Claude Code: CLI & terminal
- Cursor: AI-first code editor
- VS Code: GitHub Copilot MCP
- Windsurf: Codeium AI editor
- Cline: VS Code extension

Quick Config Preview

{
  "mcpServers": {
    "semantic-router": {
      "command": "npx",
      "args": ["-y", "semantic-router"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json

