Semantic Router

v1.0.0 | Cloud Services | stable

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Tags: ai-gateway, bert-classification, fine-tuning, golang, huggingface-candle
Stars: 4,209

What is Semantic Router?

Semantic Router is a system-level intelligent router for Mixture-of-Models deployments at cloud, data center, and edge. It is listed here as a Model Context Protocol (MCP) server, so MCP clients such as Claude, Cursor, and VS Code can connect to it.

This server falls under the Cloud Services category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • System-level intelligent router for Mixture-of-Models at cloud, data center, and edge

Use Cases

Intelligent routing for mixture-of-models inference
System-level router for cloud and edge AI

Maintainer: vllm-project
License: Apache-2.0
Language: Go
Version: v1.0.0
Updated: May 21, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx semantic-router

Configuration

Configuration Details

Config File

claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200 ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use Semantic Router

Semantic Router (from the vLLM project) is a system-level intelligent routing layer for Mixture-of-Models deployments. It directs each inference request to the most appropriate model across cloud, data center, and edge, based on semantic analysis of the request using BERT classification with fine-tuned HuggingFace Transformers models and Rust-backed Candle inference. It also provides LLM safety features, including jailbreak detection, sensitive data leak prevention, and hallucination detection, and it lowers token costs by routing low-complexity queries to cheaper models and high-complexity ones to frontier models. Platform engineers and AI infrastructure teams use it to reduce inference costs and improve safety across multi-model production deployments.

Prerequisites

  • Go 1.21+ or a compatible runtime for the router binary
  • Kubernetes (for production deployment) or Docker for local use
  • Access to HuggingFace models or a local Candle-compatible BERT model checkpoint
  • At least one LLM backend (vLLM, OpenAI-compatible endpoint, or local model server)
  • An MCP-compatible client for interacting with the router's MCP interface

1. Install the Semantic Router

Use the official installer script to download and install the semantic-router binary. This is the recommended quickstart method for local evaluation.

curl -fsSL https://vllm-semantic-router.com/install.sh | bash

2. Configure your model backends

Define the available models and their routing rules in the router configuration. Each model entry specifies its endpoint URL, cost tier, capability level, and the semantic categories it should handle.

# Example router config (router.yaml):
models:
  - name: gpt-4o
    endpoint: https://api.openai.com/v1
    tier: frontier
    categories: [complex-reasoning, code]
  - name: gpt-4o-mini
    endpoint: https://api.openai.com/v1
    tier: economy
    categories: [simple-qa, summarization]
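
To make the category-to-model mapping concrete, the sketch below mirrors the two example entries as a Python structure and resolves a category to a model name. The field names (name, endpoint, tier, categories) are taken from the example config above and have not been checked against the project's real schema; the helpers `model_for_category` and `unrouted_categories` are hypothetical and only illustrate the lookup, not the router's actual implementation.

```python
from typing import List, Optional

# Mirror of the example router.yaml entries (schema assumed from that example).
MODELS = [
    {"name": "gpt-4o", "endpoint": "https://api.openai.com/v1",
     "tier": "frontier", "categories": ["complex-reasoning", "code"]},
    {"name": "gpt-4o-mini", "endpoint": "https://api.openai.com/v1",
     "tier": "economy", "categories": ["simple-qa", "summarization"]},
]


def model_for_category(category: str, models=MODELS) -> Optional[str]:
    """Return the name of the first model that lists `category`, or None."""
    for model in models:
        if category in model["categories"]:
            return model["name"]
    return None


def unrouted_categories(expected: List[str], models=MODELS) -> List[str]:
    """Return the categories in `expected` that no model entry handles."""
    covered = {c for m in models for c in m["categories"]}
    return sorted(set(expected) - covered)
```

A check like `unrouted_categories([...])` can be useful before deployment: any category the classifier can emit but no model lists would otherwise have no destination.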

3. Configure BERT classification model

Semantic Router uses a BERT model for intent classification to determine which backend should handle each request. Specify the HuggingFace model ID or a local path to a fine-tuned checkpoint.

# In router.yaml:
classifier:
  model: bert-base-uncased
  # Or point to a fine-tuned checkpoint:
  # model: /path/to/fine-tuned-bert
  backend: candle  # Uses HuggingFace Candle (Rust) for fast inference
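
As a rough illustration of what happens after classification, the sketch below takes per-category scores (as a classifier might produce them) and picks the category to route on. The confidence threshold, the fallback category, and the function name `pick_category` are all assumptions made for this example; the source does not describe how the router handles low-confidence predictions.

```python
from typing import Dict


def pick_category(scores: Dict[str, float],
                  threshold: float = 0.5,
                  fallback: str = "simple-qa") -> str:
    """Return the highest-scoring category, or `fallback` when unsure.

    `scores` maps a category name to the classifier's probability for it.
    If there are no scores, or the best score is below `threshold`,
    the hypothetical `fallback` category is returned instead.
    """
    if not scores:
        return fallback
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else fallback
```

Whether an uncertain request should fall back to a cheap model or a frontier model is a design choice: the cheap fallback saves cost, the frontier fallback protects answer quality.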

4. Start the router

Launch the Semantic Router with your configuration file. It exposes an OpenAI-compatible API endpoint that your applications send requests to, and routes them transparently.

semantic-router serve --config router.yaml --port 8080
# Test it:
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello"}]}'
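
The same test request can be sent from application code. The sketch below builds the equivalent of the curl call using only the Python standard library; the base URL, port, and the "auto" model name are copied from the example above, while the helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8080",
                       model: str = "auto") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the router running, `send(build_chat_request("Hello"))` should return a chat completion object; in the OpenAI response format the reply text is at `choices[0]["message"]["content"]`.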

5. Configure as an MCP server

Expose the Semantic Router's management and routing capabilities through MCP so AI agents can query routing decisions and configure rules dynamically.

{
  "mcpServers": {
    "semantic-router": {
      "command": "semantic-router",
      "args": ["mcp-server", "--config", "/path/to/router.yaml"],
      "env": {}
    }
  }
}
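
If the client config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that on the parsed JSON; the function name `add_mcp_server` is hypothetical, and the command and arguments are simply those from the snippet above.

```python
import json
from typing import Dict, List, Optional


def add_mcp_server(config: dict, name: str, command: str,
                   args: List[str],
                   env: Optional[Dict[str, str]] = None) -> dict:
    """Return a copy of `config` with one MCP server entry added or replaced.

    Existing servers and unrelated top-level keys are preserved, and the
    input dictionary is not modified.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {
        "command": command,
        "args": list(args),
        "env": dict(env or {}),
    }
    updated["mcpServers"] = servers
    return updated


def render(config: dict) -> str:
    """Serialize the config as indented JSON, ready to write back to disk."""
    return json.dumps(config, indent=2)
```

Reading the existing file with `json.load`, passing it through `add_mcp_server`, and writing `render(...)` back keeps any servers that were already configured.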

Semantic Router Examples

Client configuration

Add the following to claude_desktop_config.json so that Claude Desktop can connect to the Semantic Router's MCP interface and manage routing rules.

{
  "mcpServers": {
    "semantic-router": {
      "command": "semantic-router",
      "args": ["mcp-server", "--config", "/path/to/router.yaml"],
      "env": {}
    }
  }
}

Prompts to try

Example prompts for interacting with Semantic Router management via an MCP-enabled AI assistant.

- "Show me the current routing rules and which models are handling what categories"
- "What percentage of requests in the last hour were routed to the economy tier?"
- "Add a routing rule that sends all code-generation requests to the frontier model"
- "Check if any jailbreak attempts were detected in today's traffic"
- "What is the current token savings rate compared to routing everything to the frontier model?"

Troubleshooting Semantic Router

BERT classification model fails to load or reports Candle errors

HuggingFace Candle requires a compatible CPU or GPU. Verify your system supports the required instruction sets (AVX2 on x86_64). If loading a custom fine-tuned checkpoint, ensure it is in the safetensors format supported by Candle. Try switching to a smaller BERT variant like bert-tiny for CPU-only environments.

Router sends all requests to the same model despite routing rules

Check the classifier section of your router.yaml to ensure the BERT model and category definitions match your routing rules. Enable debug logging with '--log-level debug' to see the classification scores for each request and verify categories are being detected correctly.

Kubernetes deployment fails with OOMKilled on the router pod

The BERT classification model can require 1-2 GB of memory. Set resource limits in your Kubernetes deployment manifest with at least 2Gi of memory. Consider using a distilled BERT model (distilbert-base-uncased), which has about 40% fewer parameters than bert-base-uncased and therefore a smaller weight footprint, with minimal accuracy loss.
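
For a back-of-the-envelope view of where that memory goes, the sketch below estimates the size of the model weights alone. The parameter counts are the commonly cited approximate figures (about 110 million for bert-base-uncased and about 66 million for distilbert-base-uncased), and 32-bit weights are assumed; actual pod memory will be higher because of activations, the tokenizer, and the runtime itself.

```python
BYTES_PER_PARAM_FP32 = 4  # assumption: weights stored as 32-bit floats

# Approximate, commonly cited parameter counts (not measured here).
APPROX_PARAMS = {
    "bert-base-uncased": 110_000_000,
    "distilbert-base-uncased": 66_000_000,
}


def weight_memory_mib(num_params: int,
                      bytes_per_param: int = BYTES_PER_PARAM_FP32) -> float:
    """Estimate the memory taken by model weights, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


def relative_size(model: str, baseline: str, params=APPROX_PARAMS) -> float:
    """Return the parameter count of `model` as a fraction of `baseline`."""
    return params[model] / params[baseline]
```

Under these assumptions the weights come to roughly 420 MiB for bert-base-uncased and roughly 250 MiB for distilbert-base-uncased, so the distilled model's weights are about 60% of the size of the base model's.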

Frequently Asked Questions about Semantic Router

What is Semantic Router?

Semantic Router is a system-level intelligent router for Mixture-of-Models at cloud, data center, and edge, listed here as a Model Context Protocol (MCP) server. MCP servers connect AI assistants to external tools and data sources through a standardized interface.

How do I install Semantic Router?

Follow the installation instructions on the Semantic Router GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with Semantic Router?

Semantic Router works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Semantic Router free to use?

Yes, Semantic Router is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.


Quick Config Preview

{
  "mcpServers": {
    "semantic-router": {
      "command": "npx",
      "args": ["-y", "semantic-router"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json
