Multimodal Agents Course
An MCP Multimodal AI Agent with eyes and ears!
What is Multimodal Agents Course?
Multimodal Agents Course is a Model Context Protocol (MCP) server that gives AI assistants like Claude, Cursor, and VS Code access to a multimodal AI agent with eyes and ears, meaning it can work with images and audio.
This server falls under the Coding Agents category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Installation
Manual Installation
npx multimodal-agents-course
How to Set Up and Use Multimodal Agents Course
Multimodal Agents Course (Kubrick) is an educational, MCP-powered project that teaches developers to build AI agents with eyes and ears. It combines vision language models, speech processing, embeddings, and tool calling into a complete multimodal pipeline. The architecture consists of a FastMCP-based MCP server, a FastAPI agent backend, and a React UI, all orchestrated via Docker Compose, with integrations for Groq (fast LLM inference), Pixeltable (multimodal data processing), Opik (LLM observability), and OpenAI-compatible APIs. Students and developers work through it as a hands-on course project and come away with a working reference implementation of a production-style multimodal agent.
Prerequisites
- Python 3.11+ and the uv package manager (docs.astral.sh/uv)
- Docker and Docker Compose for running the three-service stack
- OpenAI API key or compatible endpoint for model access
- Groq API key (free tier at console.groq.com) for fast LLM inference
- Opik account for LLM observability (optional but recommended for course labs)
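The checklist above can be verified with a short script before cloning anything. This is a minimal sketch, not part of the course repository: the function name check_prerequisites is hypothetical, and it assumes the tools are installed under the command names uv and docker. It does not check for the Docker Compose plugin or for API keys.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)
# Command names assumed to be on PATH; adjust if your install differs.
REQUIRED_TOOLS = ("uv", "docker")


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.11+ required, found {found}")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if issues:
        for issue in issues:
            print(f"missing: {issue}")
    else:
        print("Python, uv, and docker look fine")
```

The version and lookup function are parameters only so the check can be exercised without changing the local machine.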
Clone the repository
Download the Kubrick multimodal agents course repository.
git clone https://github.com/the-ai-merge/multimodal-agents-course.git
cd multimodal-agents-course
Install uv package manager
The project uses uv instead of pip or poetry for dependency management.
curl -LsSf https://astral.sh/uv/install.sh | sh
Configure the MCP server
Navigate to the kubrick-mcp directory and follow its README to set up environment variables including your API keys.
cd kubrick-mcp
# Copy and edit the env file:
cp .env.example .env
# Set OPENAI_API_KEY, GROQ_API_KEY, and OPIK_API_KEY in .env
Configure the agent API
Navigate to the kubrick-api directory and configure it similarly with the required API credentials.
cd ../kubrick-api
cp .env.example .env
# Set API keys in .env
Launch the full stack
Return to the repository root and start all three services (MCP server, agent API, and React UI) with a single Make command.
cd ..
make start-kubrick
Access the UI and start the course
The three services run at: MCP Server at http://localhost:9090/, Agent API at http://localhost:8080/, and the React UI at http://localhost:3000/. Open the UI to interact with the multimodal agent and follow the course modules.
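To confirm that all three services came up, each URL listed above can be probed over HTTP. The following is a sketch using only the Python standard library; the probe helper is hypothetical, and it assumes that any HTTP answer on the root path, including a 404, means the service is running. Proxies are bypassed because the targets are local.

```python
import http.client
import urllib.error
import urllib.request

# URLs as listed in the setup guide.
SERVICES = {
    "MCP server": "http://localhost:9090/",
    "Agent API": "http://localhost:8080/",
    "React UI": "http://localhost:3000/",
}

# An opener with an empty ProxyHandler ignores any configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return the HTTP status code, or None if nothing answered."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status, so it is still up.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = probe(url)
        state = f"up (HTTP {status})" if status is not None else "not reachable"
        print(f"{name:<10} {url:<24} {state}")
```

Containers can take a while to become ready after make start-kubrick, so a "not reachable" result immediately after launch is not necessarily a failure.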
Multimodal Agents Course Examples
Client configuration
Claude Desktop JSON configuration for connecting to the Kubrick MCP server running locally.
{
  "mcpServers": {
    "multimodal-agents-course": {
      "command": "npx",
      "args": ["multimodal-agents-course"]
    }
  }
}
Prompts to try
Example prompts that exercise the multimodal capabilities of the Kubrick agent.
- "Describe what is in this image" (attach an image file)
- "Transcribe this audio recording and summarize the key points" (attach audio)
- "Search for recent papers about multimodal AI agents and summarize the top 3"
- "Embed this document and find the most semantically similar passage to my query"
- "Analyze the sentiment of these customer reviews using the LLM pipeline"
Troubleshooting Multimodal Agents Course
Docker Compose fails to start with port conflicts
Check if ports 8080, 9090, or 3000 are already in use with 'lsof -i :8080' (macOS/Linux) or 'netstat -ano | findstr :8080' (Windows). Stop conflicting services or edit the docker-compose.yml to use different ports.
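As a cross-platform alternative to lsof and netstat, the same check can be sketched in Python. The helper port_in_use is hypothetical; it only tests whether something accepts TCP connections on 127.0.0.1, so a process bound to a different interface would not be detected.

```python
import socket

# Ports used by the stack, as listed above.
KUBRICK_PORTS = (8080, 9090, 3000)


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in KUBRICK_PORTS:
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

Run it before starting the stack: any port reported as in use at that point belongs to some other process.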
Groq API calls fail with authentication errors
Ensure GROQ_API_KEY is set in both the kubrick-mcp and kubrick-api .env files. Get a free key from console.groq.com. The key should start with 'gsk_'.
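The two checks described above (key present in both .env files, key starting with gsk_) can be automated. This is a sketch with a deliberately simplified .env parser: it handles plain KEY=value lines, comments, and surrounding quotes, but not export prefixes or multi-line values. The helper names are hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def groq_key_problem(env):
    """Return a description of the problem, or None if the key looks plausible."""
    key = env.get("GROQ_API_KEY", "")
    if not key:
        return "GROQ_API_KEY is missing or empty"
    if not key.startswith("gsk_"):
        return "GROQ_API_KEY does not start with 'gsk_'"
    return None


if __name__ == "__main__":
    for service in ("kubrick-mcp", "kubrick-api"):
        env_file = Path(service) / ".env"
        if not env_file.exists():
            print(f"{env_file}: file not found")
            continue
        problem = groq_key_problem(parse_env(env_file.read_text()))
        print(f"{env_file}: {problem or 'GROQ_API_KEY looks plausible'}")
```

A key that passes this check can still be rejected by Groq if it has been revoked; the script only catches missing or malformed values.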
uv sync fails or dependencies cannot be resolved
Make sure you have Python 3.11+ installed and that uv can find it. Run 'uv python install 3.11' to install the correct version via uv itself, then retry 'uv sync' in each service directory.
Frequently Asked Questions about Multimodal Agents Course
What is Multimodal Agents Course?
Multimodal Agents Course is a Model Context Protocol (MCP) server that provides a multimodal AI agent with eyes and ears. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Multimodal Agents Course?
Follow the installation instructions on the Multimodal Agents Course GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.
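The last step, adding the server config to a client, amounts to merging one entry into the client's mcpServers object without disturbing existing entries. A minimal sketch, assuming the entry shown in the client configuration on this page; the add_server helper is hypothetical, and the location of the config file varies by client and operating system, so it is left to the reader.

```python
import copy
import json

# Server entry as shown in the client configuration on this page.
KUBRICK_ENTRY = {"command": "npx", "args": ["multimodal-agents-course"]}


def add_server(config, name="multimodal-agents-course", entry=KUBRICK_ENTRY):
    """Return a copy of config with the server entry added under mcpServers."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers[name] = copy.deepcopy(entry)
    return merged


if __name__ == "__main__":
    # Example: an existing config that already lists another (made-up) server.
    existing = {"mcpServers": {"some-other-server": {"command": "uvx", "args": ["example"]}}}
    print(json.dumps(add_server(existing), indent=2))
```

Working on a copy keeps the original settings intact until the merged result has been reviewed and written back to the file.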
Which AI clients work with Multimodal Agents Course?
Multimodal Agents Course works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Multimodal Agents Course free to use?
Yes, Multimodal Agents Course is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
Multimodal Agents Course Alternatives — Similar Coding Agents Servers
Looking for alternatives to Multimodal Agents Course? Here are other popular coding agents servers you can use with Claude, Cursor, and VS Code.
Dify (★ 142.2k)
Production-ready platform for agentic workflow development.
Ruflo (★ 54.0k)
🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integrat
Goose (★ 45.7k)
An open source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM.
Antigravity Awesome Skills (★ 38.3k)
Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. Includes installer CLI, bundles, workflows, and official/community skill collections.
AgentScope (★ 25.5k)
Build and run agents you can see, understand and trust.
Serena (★ 24.5k)
A coding agent toolkit that provides IDE-like semantic code retrieval and editing tools, enabling LLMs to efficiently navigate and modify codebases using symbol-level operations instead of basic file reading and string replacements.