Claude Video Vision
Give Claude the ability to watch and understand videos — Claude Code plugin with frame extraction and multimodal audio analysis
What is Claude Video Vision?
Claude Video Vision is a Model Context Protocol (MCP) server, distributed as a Claude Code plugin, that gives AI assistants such as Claude the ability to watch and understand videos through frame extraction and multimodal audio analysis.
This server falls under the Developer Tools category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
Installation
Manual Installation
npx claude-video-vision
Configuration
Configuration Details
claude_desktop_config.json
How to Set Up and Use Claude Video Vision
Claude Video Vision is a Claude Code plugin that gives Claude the ability to watch and understand video files and YouTube URLs. It extracts frames via ffmpeg and processes audio through a choice of backends — the free Gemini API, fully local Whisper (whisper.cpp or openai-whisper), or the OpenAI API. Claude receives the frames as images and the audio as a timestamped transcription, making it a perception layer that enables natural video analysis conversations without any preprocessing by the user.
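The frame and audio extraction described above can be approximated with two ffmpeg invocations. The sketch below only prints the commands rather than running them. The sampling rate, the 512-pixel width, the file names, and the helper function names are illustrative assumptions, not the plugin's actual settings or code.

```shell
#!/bin/sh
# Sketch only: prints ffmpeg commands instead of executing them.
# frame_cmd and audio_cmd are hypothetical helpers, not part of the plugin.

# Print a command that samples frames at a fixed rate and scales them
# to the given width (height -2 keeps the aspect ratio with an even value).
frame_cmd() {
  input=$1; fps=$2; width=$3; outdir=$4
  printf 'ffmpeg -i "%s" -vf "fps=%s,scale=%s:-2" -q:v 3 "%s/frame_%%04d.jpg"\n' \
    "$input" "$fps" "$width" "$outdir"
}

# Print a command that drops the video stream and writes mono 16 kHz
# 16-bit PCM WAV, the input format whisper.cpp expects.
audio_cmd() {
  input=$1; output=$2
  printf 'ffmpeg -i "%s" -vn -ac 1 -ar 16000 -c:a pcm_s16le "%s"\n' "$input" "$output"
}

frame_cmd demo.mp4 1 512 frames
audio_cmd demo.mp4 audio.wav
```

Printing the commands first makes it easy to inspect them before running anything against a large video.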
Prerequisites
- Claude Code (the Anthropic CLI) installed and running
- Node.js 20 or higher (for the MCP server)
- ffmpeg installed on your system (brew install ffmpeg on macOS)
- At least one audio backend: a free Gemini API key (ai.google.dev), local whisper.cpp (brew install whisper-cpp on macOS), or an OpenAI API key
- yt-dlp installed if you intend to analyze YouTube URLs (brew install yt-dlp on macOS)
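Before installing the plugin, it can help to confirm which of these tools are already on your PATH. The following is a minimal sketch: the helper names are hypothetical, and it checks only node, ffmpeg, and yt-dlp, because the binary name for a local Whisper install varies by package.

```shell
#!/bin/sh
# Sketch: report which prerequisite tools are on PATH.
# have_cmd, node_major, and check_prereqs are hypothetical helpers.

# Succeed if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Extract the major version from a string such as "v20.11.1".
node_major() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/\..*$//'
}

check_prereqs() {
  for tool in node ffmpeg yt-dlp; do
    if have_cmd "$tool"; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
    fi
  done
  # Node.js must be version 20 or higher for the MCP server.
  if have_cmd node; then
    major=$(node_major "$(node --version)")
    if [ "$major" -lt 20 ]; then
      echo "node $major found, but 20 or higher is required"
    fi
  fi
}

check_prereqs
```

yt-dlp is only needed for YouTube URLs, so a "missing" line for it is not a problem if you work with local files.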
Install the plugin from the Claude Code marketplace
Inside Claude Code, run the plugin marketplace add command followed by the plugin install command. Run each command separately and wait for confirmation before proceeding.
/plugin marketplace add https://github.com/jordanrendric/claude-video-vision
/plugin install claude-video-vision
Run the interactive setup wizard
Inside Claude Code, run the setup wizard command. It walks you through backend selection (Gemini, local Whisper, or OpenAI), configures audio processing, sets frame extraction options, and verifies that ffmpeg is available.
/setup-video-vision
Set the required environment variable for your chosen backend
Depending on which backend you selected in the wizard, set the corresponding environment variable. For Gemini (free tier, 1500 requests/day): set GEMINI_API_KEY. For OpenAI: set OPENAI_API_KEY. For local Whisper, no key is needed.
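A small check like the one below can confirm that the variable for your backend is visible in the current shell. It is only a sketch: the backend names "gemini" and "openai" are assumptions ("local" is the value shown in the sample configuration), and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: map a backend name to the environment variable it needs
# and report whether that variable is set.
# Backend names "gemini" and "openai" are assumed, not confirmed by the plugin.

required_key() {
  case "$1" in
    gemini) echo GEMINI_API_KEY ;;
    openai) echo OPENAI_API_KEY ;;
    local)  echo "" ;;          # local Whisper needs no key
    *)      return 1 ;;
  esac
}

check_backend_key() {
  name=$(required_key "$1") || { echo "unknown backend: $1"; return 1; }
  if [ -z "$name" ]; then
    echo "$1: no API key needed"
    return 0
  fi
  # Indirect lookup of the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -n "$value" ]; then
    echo "$1: $name is set"
  else
    echo "$1: $name is missing"
    return 1
  fi
}

check_backend_key local
```

Run check_backend_key with your backend name after exporting the key, in the same shell session that will start Claude Code.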
# Gemini backend (free tier)
export GEMINI_API_KEY=your_gemini_api_key
# OpenAI backend
export OPENAI_API_KEY=your_openai_api_key
Install whisper-cpp if using the local backend
For fully offline audio processing, install whisper.cpp. The plugin will automatically download the appropriate Whisper model to ~/.claude-video-vision/models/ on first use.
# macOS
brew install whisper-cpp
Analyze a video file or YouTube URL
Use the /watch-video slash command with an optional question. Claude will extract frames and transcribe audio, then answer your question based on both.
/watch-video path/to/video.mp4 "What is being demonstrated in this video?"
/watch-video https://www.youtube.com/watch?v=... "Summarize the key points"
Claude Video Vision Examples
Client configuration
Claude Code plugin configuration stored in ~/.claude-video-vision/config.json. The plugin auto-configures this file during the setup wizard, but you can edit it manually.
{
  "backend": "local",
  "whisper_engine": "cpp",
  "whisper_model": "auto",
  "frame_mode": "images",
  "frame_format": "jpeg",
  "frame_resolution": 512,
  "default_fps": "auto",
  "max_frames": 100,
  "session_max_age_days": 7
}
Prompts to try
Example prompts using the slash command and conversational modes. Claude adapts frame extraction parameters automatically based on your question.
- "/watch-video lecture.mp4 'summarize this 1 hour lecture'"
- "/watch-video bug-report.mov 'what error appears on screen at 0:45?'"
- "/watch-video demo.mp4 'what programming language and framework are shown?'"
- "Analyze this YouTube tutorial: https://www.youtube.com/watch?v=... and list all commands typed"
- "Take a look at the first 10 seconds of ~/Downloads/intro.mp4"
Troubleshooting Claude Video Vision
ffmpeg not found error when analyzing a video
Install ffmpeg using your system package manager: 'brew install ffmpeg' on macOS, 'apt install ffmpeg' on Ubuntu/Debian. Run '/setup-video-vision' again after installation to verify detection.
YouTube download fails or yt-dlp not found
YouTube URL support requires yt-dlp. Install it with 'brew install yt-dlp' on macOS or 'pip install yt-dlp'. Make sure yt-dlp is in your PATH.
Whisper model download is slow or stalls on first use
On first use, the plugin downloads the selected Whisper model to ~/.claude-video-vision/models/. The 'auto' model selection picks the best model for your available RAM. If the download stalls, set whisper_model to 'tiny' or 'base' in the config for a much smaller download.
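If you prefer to change the model from the command line rather than in an editor, a sketch along these lines could work. It assumes the config file is shaped like the sample shown earlier, with "whisper_model" holding a plain string value. The function name is hypothetical, and a JSON-aware tool would be more robust than sed.

```shell
#!/bin/sh
# Sketch: rewrite the "whisper_model" value in a config file.
# set_whisper_model is a hypothetical helper, not part of the plugin.
# Writes to a temporary file first so the original is replaced only on success.

set_whisper_model() {
  config=$1; model=$2
  tmp="$config.tmp"
  sed "s/\"whisper_model\": *\"[^\"]*\"/\"whisper_model\": \"$model\"/" "$config" > "$tmp" &&
    mv "$tmp" "$config"
}

# Usage (path taken from the configuration section):
# set_whisper_model "$HOME/.claude-video-vision/config.json" base
```

Keeping a copy of the original config before editing is a sensible precaution, since the setup wizard can also regenerate it.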
Frequently Asked Questions about Claude Video Vision
What is Claude Video Vision?
Claude Video Vision is a Model Context Protocol (MCP) server, distributed as a Claude Code plugin, that gives Claude the ability to watch and understand videos through frame extraction and multimodal audio analysis. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install Claude Video Vision?
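Inside Claude Code, run /plugin marketplace add https://github.com/jordanrendric/claude-video-vision, then /plugin install claude-video-vision, and finally /setup-video-vision to choose an audio backend. The Claude Video Vision GitHub repository has the full installation instructions.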
Which AI clients work with Claude Video Vision?
Claude Video Vision works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is Claude Video Vision free to use?
Yes, Claude Video Vision is open source and available under the MIT license. You can use it freely in both personal and commercial projects.
Claude Video Vision Alternatives — Similar Developer Tools Servers
Looking for alternatives to Claude Video Vision? Here are other popular developer tools servers you can use with Claude, Cursor, and VS Code.
Ecc
★ 188.2k The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Javaguide
★ 155.8k Java interview and general backend interview guide, covering computer science fundamentals, databases, distributed systems, high concurrency, system design, and AI application development
Gemini CLI
★ 104.5k A secure MCP server that wraps the Google Gemini CLI, allowing clients to query Gemini models using local OAuth sessions without requiring an API key. It provides tools for model interaction and diagnostics with built-in protection against command in
Awesome MCP Servers
★ 87.3k Curated list of Model Context Protocol (MCP) servers - tools that extend Claude Desktop, Cursor, Windsurf, and other MCP clients with custom capabilities.
MCP Servers
★ 86.0k Model Context Protocol Servers
CC Switch
★ 77.5k A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io