SMG LLM Gateway

v1.0.0 • APIs • stable

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, W

Tags: anthropic, anthropic-api, chat, claude, gemini
Stars: 274
Downloads: 0
Weekly: 0
0/5

What is SMG LLM Gateway?

SMG LLM Gateway is an engine-agnostic LLM gateway written in Rust, listed here as a Model Context Protocol (MCP) server for AI assistants such as Claude, Cursor, and VS Code. It provides OpenAI and Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini, and other backends, along with a gRPC pipeline, KV cache-aware routing, chat history, and tokenization caching.

This server falls under the APIs category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • Engine-agnostic LLM gateway in Rust with full OpenAI & Anthropic API compatibility

Use Cases

Route requests across multiple LLM providers
Cache tokenization and chat history
Support embeddings and inference

Maintainer: lightseekorg
License: Apache-2.0
Language: Rust
Version: v1.0.0
Updated: May 21, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx smg

Configuration

Configuration Details

Config File: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use SMG LLM Gateway

SMG (Shepherd Model Gateway) is an engine-agnostic LLM gateway written in Rust that provides full OpenAI and Anthropic API compatibility across multiple backends including SGLang, vLLM, TRT-LLM, OpenAI, and Gemini. It features industry-first gRPC pipelines, eight routing policies (including KV cache-aware routing), pluggable chat history storage, tokenization caching, MCP tool execution support, and WASM plugins for custom extensions. Teams use it to unify LLM access across providers, reduce latency through smart routing, and add observability to AI inference workloads.

Prerequisites

  • Docker installed (for the easiest deployment path), or Rust toolchain for building from source
  • Access to at least one LLM backend (OpenAI API key, a running vLLM instance, SGLang server, or Gemini credentials)
  • Python 3.8+ if using the pip install method
  • An MCP-compatible client to interact with SMG's MCP tool execution endpoint
  • PostgreSQL, Oracle, or Redis (optional) if using persistent chat history storage

1. Pull and run the SMG Docker image

The fastest way to run SMG is via Docker. Pull the latest image and start it with your desired worker URL and routing policy.

docker pull lightseekorg/smg:latest
docker run -p 30000:30000 lightseekorg/smg:latest \
  --worker-urls http://your-llm-backend:8000 \
  --policy round_robin

2. (Alternative) Install via pip

If you prefer a Python install, install SMG via pip and start it from the command line.

pip install smg
smg --worker-urls http://your-llm-backend:8000 --policy cache_aware

3. (Alternative) Build from source with Cargo

For the best performance, build and install the Rust binary directly using Cargo.

cargo install smg

4. Configure routing and backends

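SMG supports 8 routing policies, selected with --policy. Pass multiple worker URLs to --worker-urls to load-balance across backends. For multi-node deployments, use --enable-mesh together with --mesh-peer-urls; the example below also sets --mesh-advertise-host.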

smg \
  --worker-urls http://backend1:8000 http://backend2:8000 \
  --policy power_of_two \
  --enable-mesh \
  --mesh-advertise-host gateway1.internal
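
To make the power_of_two policy name more concrete, here is a minimal sketch of the general "power of two choices" load-balancing technique: sample two workers at random and send the request to the less loaded one. This is an illustration of the idea only, not SMG's implementation; the function name and the in-flight request counts are hypothetical.

```python
import random


def pick_power_of_two(loads, rng=random):
    """Sample two distinct workers and return the less loaded one.

    `loads` maps a worker URL to a load figure such as in-flight
    requests (a hypothetical metric used only for this sketch).
    """
    workers = list(loads)
    if not workers:
        raise ValueError("no workers configured")
    if len(workers) == 1:
        return workers[0]
    first, second = rng.sample(workers, 2)  # two distinct candidates
    return first if loads[first] <= loads[second] else second


# With only two workers, both are always sampled, so the less loaded wins.
loads = {"http://backend1:8000": 7, "http://backend2:8000": 2}
print(pick_power_of_two(loads))  # prints http://backend2:8000
```

Comparing only two random candidates keeps the decision cheap while still steering traffic away from the busiest workers.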

5. Verify the gateway is running

Send a test request to the OpenAI-compatible chat completions endpoint to confirm SMG is routing correctly.

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
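
The same check can be scripted. The sketch below builds the request from the curl command above using only the Python standard library; the helper names build_chat_request and send are hypothetical, and the port and model name are taken from the example rather than from any SMG default.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request("http://localhost:30000", "llama3", "Hello!")
print(req.full_url)  # prints http://localhost:30000/v1/chat/completions
```

Once the gateway is running, calling send(req) performs the request and returns the parsed response.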

6. Add SMG as an MCP server

Configure your MCP client to launch SMG so it can reach the MCP tool execution endpoint. The configuration below starts the smg command with the worker URL and routing policy passed as arguments.

{
  "mcpServers": {
    "smg": {
      "command": "smg",
      "args": ["--worker-urls", "http://localhost:8000", "--policy", "round_robin"],
      "env": {}
    }
  }
}

SMG LLM Gateway Examples

Client configuration

MCP client configuration for using SMG as a gateway to a local LLM backend.

{
  "mcpServers": {
    "smg": {
      "command": "smg",
      "args": [
        "--worker-urls",
        "http://localhost:8000",
        "--policy",
        "cache_aware"
      ],
      "env": {}
    }
  }
}
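
If the client config file already lists other servers, the smg entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, assuming the config has already been loaded into a Python dict; the helper names and the "other" server entry are hypothetical.

```python
import json


def smg_server_entry(worker_url, policy):
    """Build the smg entry shown in the configuration above."""
    return {
        "command": "smg",
        "args": ["--worker-urls", worker_url, "--policy", policy],
        "env": {},
    }


def add_smg_server(config, worker_url="http://localhost:8000", policy="cache_aware"):
    """Return a copy of `config` with the smg entry added; other servers are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["smg"] = smg_server_entry(worker_url, policy)
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_smg_server(existing), indent=2))
```

The function copies the top-level dict and the server map, so the original config object is left unchanged.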

Prompts to try

Once SMG is running, use these prompts to exercise its routing and gateway capabilities.

- "Route this chat completion request to the least-loaded backend using power_of_two policy"
- "Generate embeddings for this text using the SMG embeddings endpoint"
- "Switch to consistent_hashing routing and explain how it distributes requests"
- "Show the current routing policy and active worker backend health status"

Troubleshooting SMG LLM Gateway

SMG starts but returns 502 or connection refused when calling /v1/chat/completions

Check that the --worker-urls point to a reachable LLM backend. Use `curl http://your-backend:8000/health` to verify the backend is running before starting SMG.
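
With several workers, the reachability check can be scripted. This sketch assumes each backend exposes the /health path used in the curl command above, which depends on the inference engine; the function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def backend_is_reachable(health_url, timeout=2.0):
    """Return True if the URL answers with a 2xx status, False on any failure."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP error codes, malformed URLs.
        return False


def unreachable_workers(worker_urls, timeout=2.0):
    """Return the worker URLs whose /health endpoint did not respond successfully."""
    return [
        url
        for url in worker_urls
        if not backend_is_reachable(f"{url.rstrip('/')}/health", timeout)
    ]
```

Calling unreachable_workers with the same list passed to --worker-urls returns the backends to investigate before starting SMG.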

cache_aware routing policy not improving latency

KV cache-aware routing requires backends that report cache statistics. Ensure you are using a compatible inference engine (SGLang or vLLM with cache reporting enabled). Fall back to round_robin for backends that do not support this.
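
The intuition behind cache-aware routing is that requests sharing a long prompt prefix benefit from landing on the same worker, where the KV cache for that prefix may already exist. The sketch below illustrates that idea in a simplified form: it approximates cache state by remembering prompts on the router side and falls back to rotating through workers when no prefix matches. It is not SMG's algorithm; the class name, the min_match threshold, and the character-level matching are all assumptions made for illustration.

```python
import os


def shared_prefix_len(a, b):
    """Length of the common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))


class PrefixAwareRouter:
    """Toy router: prefer the worker that has seen the longest matching prefix."""

    def __init__(self, workers, min_match=8):
        self.history = {worker: [] for worker in workers}  # prompts seen per worker
        self.min_match = min_match  # hypothetical threshold, in characters
        self._next = 0  # rotation counter for the fallback path

    def route(self, prompt):
        best_worker, best_len = None, 0
        for worker, prompts in self.history.items():
            for seen in prompts:
                n = shared_prefix_len(prompt, seen)
                if n > best_len:
                    best_worker, best_len = worker, n
        if best_worker is None or best_len < self.min_match:
            # No useful prefix match: rotate through workers instead.
            workers = list(self.history)
            best_worker = workers[self._next % len(workers)]
            self._next += 1
        self.history[best_worker].append(prompt)
        return best_worker


router = PrefixAwareRouter(["http://backend1:8000", "http://backend2:8000"])
system = "You are a helpful assistant. "
first = router.route(system + "Summarize this report.")
second = router.route("Translate to French: hello")
third = router.route(system + "List three risks.")
print(first, second, third)
```

In this run the third prompt shares the system prefix with the first, so both are routed to the same worker, while the unrelated second prompt goes to the other one.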

WASM plugins fail to load

WASM plugins must be compiled targeting wasm32-wasi. Verify the plugin file path is correct and accessible. Check SMG logs with RUST_LOG=debug for detailed plugin loading errors.
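
Before digging through logs, a quick local check can rule out a wrong path or a file that is not a WebAssembly binary at all. The sketch below relies only on the fact that WebAssembly binaries begin with the four bytes \0asm; it cannot tell whether the module targets wasm32-wasi, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path

WASM_MAGIC = b"\x00asm"  # first four bytes of every WebAssembly binary


def check_wasm_plugin(path):
    """Return None if the file looks like a WebAssembly binary, else a problem description."""
    plugin = Path(path)
    if not plugin.is_file():
        return f"{plugin}: file not found or not a regular file"
    try:
        with plugin.open("rb") as handle:
            header = handle.read(4)
    except OSError as exc:
        return f"{plugin}: cannot read file ({exc})"
    if header != WASM_MAGIC:
        return f"{plugin}: missing WebAssembly magic bytes"
    return None


# Demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "plugin.wasm"
    good.write_bytes(WASM_MAGIC + b"\x01\x00\x00\x00")
    bad = Path(tmp) / "notes.txt"
    bad.write_bytes(b"hello")
    print(check_wasm_plugin(good))
    print(check_wasm_plugin(bad))
```

A None result only means the path and file type are plausible; loading errors beyond that still need the RUST_LOG=debug output.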

Frequently Asked Questions about SMG LLM Gateway

What is SMG LLM Gateway?

SMG LLM Gateway is an engine-agnostic LLM gateway written in Rust, listed here as a Model Context Protocol (MCP) server. It provides OpenAI and Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini, and other backends, with a gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, the Responses API, and embeddings. As an MCP server, it connects AI assistants to external tools and data sources through a standardized interface.

How do I install SMG LLM Gateway?

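Install SMG with Docker (docker pull lightseekorg/smg:latest), pip (pip install smg), or Cargo (cargo install smg), as described in the setup steps in this guide. Then add the server config to your AI client. The SMG LLM Gateway GitHub repository has the full installation instructions.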

Which AI clients work with SMG LLM Gateway?

SMG LLM Gateway works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is SMG LLM Gateway free to use?

Yes, SMG LLM Gateway is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.

Browse More APIs MCP Servers

Explore all APIs servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.

Quick Config Preview

{ "mcpServers": { "smg": { "command": "npx", "args": ["-y", "smg"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json
