Multimodal Agents Course

v1.0.0Coding Agentsstable

An MCP Multimodal AI Agent with eyes and ears!

agentembeddingsgroqmcpmcp-client
Share:
558
Stars
0
Downloads
0
Weekly
0/5

What is Multimodal Agents Course?

Multimodal Agents Course is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to mcp multimodal ai agent with eyes and ears!

An MCP Multimodal AI Agent with eyes and ears!

This server falls under the Coding Agents category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • An MCP Multimodal AI Agent with eyes and ears!

Use Cases

Multimodal AI agent with vision
Embeddings and Groq integration
OpenAI multimodal support
the-ai-merge

Maintainer

LicenseApache-2.0
Languagepython
Versionv1.0.0
UpdatedMay 18, 2026
Statushealthy
Maintenanceactive

Works with

ClaudeOpenAIwindowsmacoslinux

Installation

Manual Installation

npx multimodal-agents-course

Configuration

Configuration Details

Config File

claude_desktop_config.json

Performance

Response Metrics

Response Time< 200ms
ThroughputMedium

Resource Usage

Memory UsageLow
CPU UsageLow

How to Set Up and Use Multimodal Agents Course

Multimodal Agents Course (Kubrick) is an educational MCP-powered project that teaches developers to build AI agents with eyes and ears — combining vision language models, speech processing, embeddings, and tool-calling into a complete multimodal pipeline. The architecture consists of a FastMCP-based MCP server, a FastAPI agent backend, and a React UI, all orchestrated via Docker Compose, with integrations for Groq (fast LLM inference), Pixeltable (multimodal data processing), Opik (LLM observability), and OpenAI-compatible APIs. Students and developers use it as a hands-on course project to learn how to build production-style multimodal AI agents while also getting a working reference implementation.

Prerequisites

  • Python 3.11+ and the uv package manager (docs.astral.sh/uv)
  • Docker and Docker Compose for running the three-service stack
  • OpenAI API key or compatible endpoint for model access
  • Groq API key (free tier at console.groq.com) for fast LLM inference
  • Opik account for LLM observability (optional but recommended for course labs)
1

Clone the repository

Download the Kubrick multimodal agents course repository.

git clone https://github.com/the-ai-merge/multimodal-agents-course.git
cd multimodal-agents-course
2

Install uv package manager

The project uses uv instead of pip or poetry for dependency management.

curl -LsSf https://astral.sh/uv/install.sh | sh
3

Configure the MCP server

Navigate to the kubrick-mcp directory and follow its README to set up environment variables including your API keys.

cd kubrick-mcp
# Copy and edit the env file:
cp .env.example .env
# Set OPENAI_API_KEY, GROQ_API_KEY, and OPIK_API_KEY in .env
4

Configure the agent API

Navigate to the kubrick-api directory and configure it similarly with the required API credentials.

cd ../kubrick-api
cp .env.example .env
# Set API keys in .env
5

Launch the full stack

Return to the repository root and start all three services (MCP server, agent API, and React UI) with a single Make command.

cd ..
make start-kubrick
6

Access the UI and start the course

The three services run at: MCP Server at http://localhost:9090/, Agent API at http://localhost:8080/, and the React UI at http://localhost:3000/. Open the UI to interact with the multimodal agent and follow the course modules.

Multimodal Agents Course Examples

Client configuration

Claude Desktop JSON configuration for connecting to the Kubrick MCP server running locally.

{
  "mcpServers": {
    "multimodal-agents-course": {
      "command": "npx",
      "args": ["multimodal-agents-course"]
    }
  }
}

Prompts to try

Example prompts that exercise the multimodal capabilities of the Kubrick agent.

- "Describe what is in this image" (attach an image file)
- "Transcribe this audio recording and summarize the key points" (attach audio)
- "Search for recent papers about multimodal AI agents and summarize the top 3"
- "Embed this document and find the most semantically similar passage to my query"
- "Analyze the sentiment of these customer reviews using the LLM pipeline"

Troubleshooting Multimodal Agents Course

Docker Compose fails to start with port conflicts

Check if ports 8080, 9090, or 3000 are already in use with 'lsof -i :8080' (macOS/Linux) or 'netstat -ano | findstr :8080' (Windows). Stop conflicting services or edit the docker-compose.yml to use different ports.

Groq API calls fail with authentication errors

Ensure GROQ_API_KEY is set in both the kubrick-mcp and kubrick-api .env files. Get a free key from console.groq.com. The key should start with 'gsk_'.

uv sync fails or dependencies cannot be resolved

Make sure you have Python 3.11+ installed and that uv can find it. Run 'uv python install 3.11' to install the correct version via uv itself, then retry 'uv sync' in each service directory.

Frequently Asked Questions about Multimodal Agents Course

What is Multimodal Agents Course?

Multimodal Agents Course is a Model Context Protocol (MCP) server that mcp multimodal ai agent with eyes and ears! It connects AI assistants to external tools and data sources through a standardized interface.

How do I install Multimodal Agents Course?

Follow the installation instructions on the Multimodal Agents Course GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with Multimodal Agents Course?

Multimodal Agents Course works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Multimodal Agents Course free to use?

Yes, Multimodal Agents Course is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.

Browse More Coding Agents MCP Servers

Explore all coding agents servers available in the MCPgee directory. Each server includes setup guides for Claude, Cursor, and VS Code.

Quick Config Preview

{ "mcpServers": { "multimodal-agents-course": { "command": "npx", "args": ["-y", "multimodal-agents-course"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json

Read the full setup guide →

Ready to use Multimodal Agents Course?

Browse our complete directory of 33,000+ MCP servers, read setup guides for your editor, and start building with the Model Context Protocol.

33,000+ ServersFree & Open SourceStep-by-Step Guides