Semantic Router
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
What is Semantic Router?
Semantic Router is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to system level intelligent router for mixture-of-models at cloud, data center and edge
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
This server falls under the Cloud Services category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Features
- System Level Intelligent Router for Mixture-of-Models at Clo
Use Cases
Maintainer
Works with
Installation
Manual Installation
npx semantic-routerConfiguration
Configuration Details
claude_desktop_config.json
Performance
Response Metrics
Resource Usage
How to Set Up and Use Semantic Router
Semantic Router (from the vLLM project) is a system-level intelligent routing layer for Mixture-of-Models deployments that directs inference requests to the most appropriate model — across cloud, data center, and edge — based on semantic analysis using BERT classification, fine-tuned HuggingFace Transformers, and Rust-backed Candle inference. It provides LLM safety features including jailbreak detection, sensitive data leak prevention, and hallucination identification, while also optimizing token economics by routing low-complexity queries to cheaper models and high-complexity ones to frontier models. Platform engineers and AI infrastructure teams use it to reduce inference costs and improve safety across multi-model production deployments.
Prerequisites
- Go 1.21+ or a compatible runtime for the router binary
- Kubernetes (for production deployment) or Docker for local use
- Access to HuggingFace models or a local Candle-compatible BERT model checkpoint
- At least one LLM backend (vLLM, OpenAI-compatible endpoint, or local model server)
- An MCP-compatible client for interacting with the router's MCP interface
Install the Semantic Router
Use the official installer script to download and install the semantic-router binary. This is the recommended quickstart method for local evaluation.
curl -fsSL https://vllm-semantic-router.com/install.sh | bashConfigure your model backends
Define the available models and their routing rules in the router configuration. Each model entry specifies its endpoint URL, cost tier, capability level, and the semantic categories it should handle.
# Example router config (router.yaml):
models:
- name: gpt-4o
endpoint: https://api.openai.com/v1
tier: frontier
categories: [complex-reasoning, code]
- name: gpt-4o-mini
endpoint: https://api.openai.com/v1
tier: economy
categories: [simple-qa, summarization]Configure BERT classification model
Semantic Router uses a BERT model for intent classification to determine which backend should handle each request. Specify the HuggingFace model ID or a local path to a fine-tuned checkpoint.
# In router.yaml:
classifier:
model: bert-base-uncased
# Or point to a fine-tuned checkpoint:
# model: /path/to/fine-tuned-bert
backend: candle # Uses HuggingFace Candle (Rust) for fast inferenceStart the router
Launch the Semantic Router with your configuration file. It exposes an OpenAI-compatible API endpoint that your applications send requests to, and routes them transparently.
semantic-router serve --config router.yaml --port 8080
# Test it:
curl http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "Hello"}]}'Configure as an MCP server
Expose the Semantic Router's management and routing capabilities through MCP so AI agents can query routing decisions and configure rules dynamically.
{
"mcpServers": {
"semantic-router": {
"command": "semantic-router",
"args": ["mcp-server", "--config", "/path/to/router.yaml"],
"env": {}
}
}
}Semantic Router Examples
Client configuration
MCP client configuration for Claude Desktop to connect to the Semantic Router's MCP interface for managing routing rules.
{
"mcpServers": {
"semantic-router": {
"command": "semantic-router",
"args": ["mcp-server", "--config", "/path/to/router.yaml"],
"env": {}
}
}
}Prompts to try
Example prompts for interacting with Semantic Router management via an MCP-enabled AI assistant.
- "Show me the current routing rules and which models are handling what categories"
- "What percentage of requests in the last hour were routed to the economy tier?"
- "Add a routing rule that sends all code-generation requests to the frontier model"
- "Check if any jailbreak attempts were detected in today's traffic"
- "What is the current token savings rate compared to routing everything to the frontier model?"Troubleshooting Semantic Router
BERT classification model fails to load or reports Candle errors
HuggingFace Candle requires a compatible CPU or GPU. Verify your system supports the required instruction sets (AVX2 on x86_64). If loading a custom fine-tuned checkpoint, ensure it is in the safetensors format supported by Candle. Try switching to a smaller BERT variant like bert-tiny for CPU-only environments.
Router sends all requests to the same model despite routing rules
Check the classifier section of your router.yaml to ensure the BERT model and category definitions match your routing rules. Enable debug logging with '--log-level debug' to see the classification scores for each request and verify categories are being detected correctly.
Kubernetes deployment fails with OOMKilled on the router pod
The BERT classification model can require 1-2GB of memory. Set resource limits in your Kubernetes deployment manifest with at least 2Gi of memory. Consider using a distilled BERT model (distilbert-base-uncased) which uses approximately half the memory with minimal accuracy loss.
Frequently Asked Questions about Semantic Router
What is Semantic Router?
Semantic Router is a Model Context Protocol (MCP) server that system level intelligent router for mixture-of-models at cloud, data center and edge It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Semantic Router?
Follow the installation instructions on the Semantic Router GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
Which AI clients work with Semantic Router?
Semantic Router works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Semantic Router free to use?
Yes, Semantic Router is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
Semantic Router Alternatives — Similar Cloud Services Servers
Looking for alternatives to Semantic Router? Here are other popular cloud services servers you can use with Claude, Cursor, and VS Code.
Open WebUI
★ 138.2kUser-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Anything LLM
★ 60.4kThe all-in-one AI productivity accelerator. On device and privacy first with no annoying setup or configuration.
LocalAI
★ 46.4kLocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
Nacos
★ 33.0kan easy-to-use dynamic service discovery, configuration and service management platform for building AI cloud native applications.
Xiaozhi ESP32
★ 26.7k本项目为xiaozhi-esp32提供后端服务,帮助您快速搭建ESP32设备控制服务器。Backend service for xiaozhi-esp32, helps you quickly build an ESP32 device control server.
Gateway
★ 11.8kA blazing fast AI Gateway with integrated guardrails. Route to 1,600+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
Browse More Cloud Services MCP Servers
Explore all cloud services servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.
Set Up Semantic Router in Your Editor
Choose your AI client for step-by-step setup instructions.
Quick Config Preview
Add this to your claude_desktop_config.json or .cursor/mcp.json
Ready to use Semantic Router?
Browse our complete directory of 33,000+ MCP servers, read setup guides for your editor, and start building with the Model Context Protocol.