Claude Video Vision

v1.0.0 · Developer Tools · stable

Give Claude the ability to watch and understand videos — Claude Code plugin with frame extraction and multimodal audio analysis

Tags: claude-code, claude-code-plugin, ffmpeg, gemini, mcp
Stars: 664
Downloads: 0
Weekly: 0

What is Claude Video Vision?

Claude Video Vision is a Claude Code plugin, backed by a Model Context Protocol (MCP) server, that gives Claude the ability to watch and understand videos through frame extraction and multimodal audio analysis.

This server falls under the Developer Tools category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

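  • Frame extraction from video files via ffmpeg
  • Audio transcription through the Gemini API, local Whisper, or the OpenAI API
  • Support for YouTube URLs via yt-dlp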

Use Cases

Extract and analyze frames from videos for Claude to understand visual content. Process video files and perform multimodal audio analysis. Integrate video understanding into Claude Code workflows.

Maintainer: jordanrendric
License: MIT
Language: TypeScript
Version: v1.0.0
Updated: May 22, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx claude-video-vision

Configuration

Config file: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use Claude Video Vision

Claude Video Vision is a Claude Code plugin that gives Claude the ability to watch and understand video files and YouTube URLs. It extracts frames via ffmpeg and processes audio through a choice of backends — the free Gemini API, fully local Whisper (whisper.cpp or openai-whisper), or the OpenAI API. Claude receives the frames as images and the audio as a timestamped transcription, making it a perception layer that enables natural video analysis conversations without any preprocessing by the user.
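To make the frame extraction step more concrete, here is a minimal Python sketch that builds an ffmpeg command for sampling frames as JPEG images. It is an illustration only, not the plugin's actual command: the function name, the output file pattern, and the default values are assumptions, although the ffmpeg options themselves (-vf with the fps and scale filters, -frames:v, -q:v) are standard.

```python
import shlex


def build_ffmpeg_frame_command(video_path, out_dir, fps=1.0, width=512, max_frames=100):
    """Return an ffmpeg argument list that samples frames as JPEG images.

    Hypothetical helper for illustration; the plugin may invoke ffmpeg differently.
    """
    # fps=... sets how many frames per second are kept.
    # scale=WIDTH:-2 resizes to the given width and keeps the aspect ratio,
    # rounding the height to an even number.
    video_filter = f"fps={fps},scale={width}:-2"
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", video_path,
        "-vf", video_filter,
        "-frames:v", str(max_frames),  # stop after this many output frames
        "-q:v", "2",                   # JPEG quality (lower is better)
        f"{out_dir}/frame_%04d.jpg",   # assumed output naming pattern
    ]


cmd = build_ffmpeg_frame_command("path/to/video.mp4", "frames")
print(shlex.join(cmd))
```

The command is returned as a list so it can be passed to subprocess.run without shell quoting issues; the output directory must already exist before ffmpeg runs.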

Prerequisites

  • Claude Code (the Anthropic CLI) installed and running
  • Node.js 20 or higher (for the MCP server)
  • ffmpeg installed on your system (brew install ffmpeg on macOS)
  • At least one audio backend: a free Gemini API key (ai.google.dev), local whisper.cpp (brew install whisper-cpp on macOS), or an OpenAI API key
  • yt-dlp installed if you intend to analyze YouTube URLs (brew install yt-dlp on macOS)
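
Before installing the plugin, it can help to confirm that the command-line prerequisites listed above are on your PATH. The following Python sketch is one way to do that; the check_tools helper is hypothetical and not part of the plugin, and the executable names (node, ffmpeg, yt-dlp) are the usual ones for these tools.

```python
import shutil


def check_tools(tools=("node", "ffmpeg", "yt-dlp")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in check_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

This only checks that the executables can be found; it does not verify versions, so you would still need to confirm that Node.js is version 20 or higher.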

1. Install the plugin from the Claude Code marketplace

Inside Claude Code, run the plugin marketplace add command followed by the plugin install command. Run each command separately and wait for confirmation before proceeding.

/plugin marketplace add https://github.com/jordanrendric/claude-video-vision
/plugin install claude-video-vision

2. Run the interactive setup wizard

Inside Claude Code, run the setup wizard command. It walks you through backend selection (Gemini, local Whisper, or OpenAI), configures audio processing, sets frame extraction options, and verifies that ffmpeg is available.

/setup-video-vision

3. Set the required environment variable for your chosen backend

Depending on which backend you selected in the wizard, set the corresponding environment variable. For Gemini (free tier, 1500 requests/day): set GEMINI_API_KEY. For OpenAI: set OPENAI_API_KEY. For local Whisper, no key is needed.

# Gemini backend (free tier)
export GEMINI_API_KEY=your_gemini_api_key

# OpenAI backend
export OPENAI_API_KEY=your_openai_api_key
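
As a quick sanity check, the sketch below reports which environment variable, if any, is still missing for a given backend. The variable names come from the step above; the backend labels "gemini" and "openai" and the missing_key helper are assumptions made for illustration ("local" is the value shown in the plugin's example config).

```python
import os

# Backend label -> required environment variable (None means no key is needed).
# The "gemini" and "openai" labels are assumed for this example.
BACKEND_ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "local": None,
}


def missing_key(backend, env=None):
    """Return the name of the unset variable for this backend, or None if nothing is missing."""
    env = os.environ if env is None else env
    var = BACKEND_ENV_VARS[backend]
    if var is None:
        return None
    return None if env.get(var) else var


for backend in BACKEND_ENV_VARS:
    missing = missing_key(backend)
    print(f"{backend}: {'ready' if missing is None else 'set ' + missing}")
```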

4. Install whisper-cpp if using the local backend

For fully offline audio processing, install whisper.cpp. The plugin will automatically download the appropriate Whisper model to ~/.claude-video-vision/models/ on first use.

# macOS
brew install whisper-cpp

5. Analyze a video file or YouTube URL

Use the /watch-video slash command with an optional question. Claude will extract frames and transcribe audio, then answer your question based on both.

/watch-video path/to/video.mp4 "What is being demonstrated in this video?"
/watch-video https://www.youtube.com/watch?v=... "Summarize the key points"

Claude Video Vision Examples

Client configuration

The plugin stores its configuration in ~/.claude-video-vision/config.json. The setup wizard writes this file automatically, but you can also edit it manually.

{
  "backend": "local",
  "whisper_engine": "cpp",
  "whisper_model": "auto",
  "frame_mode": "images",
  "frame_format": "jpeg",
  "frame_resolution": 512,
  "default_fps": "auto",
  "max_frames": 100,
  "session_max_age_days": 7
}
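
If you edit the file by hand, a small script can confirm that it still parses and show the effective values. The sketch below assumes the keys shown in the example above are the complete set and uses them as defaults; the load_config helper is hypothetical, and the plugin's own loading and validation logic may differ.

```python
import json
from pathlib import Path

# Defaults copied from the example config above (assumed to be the full key set).
DEFAULTS = {
    "backend": "local",
    "whisper_engine": "cpp",
    "whisper_model": "auto",
    "frame_mode": "images",
    "frame_format": "jpeg",
    "frame_resolution": 512,
    "default_fps": "auto",
    "max_frames": 100,
    "session_max_age_days": 7,
}


def load_config(path=None):
    """Load config.json if it exists and overlay it on the defaults."""
    if path is None:
        path = Path.home() / ".claude-video-vision" / "config.json"
    path = Path(path)
    config = dict(DEFAULTS)
    if path.exists():
        # json.loads raises json.JSONDecodeError if the file is malformed.
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```

Calling load_config() with no arguments reads the default location; passing a path lets you check a draft file before copying it into place.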

Prompts to try

The following example prompts use both the slash command and plain conversation. Claude adapts frame extraction parameters automatically based on your question.

- "/watch-video lecture.mp4 'summarize this 1 hour lecture'"
- "/watch-video bug-report.mov 'what error appears on screen at 0:45?'"
- "/watch-video demo.mp4 'what programming language and framework are shown?'"
- "Analyze this YouTube tutorial: https://www.youtube.com/watch?v=... and list all commands typed"
- "Take a look at the first 10 seconds of ~/Downloads/intro.mp4"
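
How the sampling rate is chosen is not documented here, but one plausible way to think about the interaction between default_fps and max_frames is to cap the frame rate so that a long video never exceeds the frame budget. The sketch below illustrates that idea only; pick_fps and its preferred rate of 1 frame per second are assumptions, not the plugin's actual heuristic.

```python
def pick_fps(duration_seconds, max_frames=100, preferred_fps=1.0):
    """Return a sampling rate that yields at most max_frames frames.

    Hypothetical heuristic: use preferred_fps for short clips and
    lower the rate for long videos so the frame budget is respected.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return min(preferred_fps, max_frames / duration_seconds)


for label, seconds in [("10 second clip", 10), ("1 hour lecture", 3600)]:
    fps = pick_fps(seconds)
    print(f"{label}: {fps:.4f} fps, about {int(fps * seconds)} frames")
```

Under this assumption, a short clip is sampled at the preferred rate while an hour-long lecture is sampled far more sparsely, which matches the intuition that a summary needs coverage rather than density.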

Troubleshooting Claude Video Vision

ffmpeg not found error when analyzing a video

Install ffmpeg using your system package manager: 'brew install ffmpeg' on macOS, 'apt install ffmpeg' on Ubuntu/Debian. Run '/setup-video-vision' again after installation to verify detection.

YouTube download fails or yt-dlp not found

YouTube URL support requires yt-dlp. Install it with 'brew install yt-dlp' on macOS or 'pip install yt-dlp'. Make sure yt-dlp is in your PATH.

Whisper model download is slow or stalls on first use

On first use, the plugin downloads the selected Whisper model to ~/.claude-video-vision/models/. The 'auto' model selection picks the best model for your available RAM. If the download stalls, set whisper_model to 'tiny' or 'base' in the config for a much smaller download.
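
The exact rule behind the 'auto' selection is not described here. As a rough illustration of RAM-based model selection, the sketch below maps available memory to a Whisper model size. The model names are standard Whisper sizes, but the thresholds and the pick_whisper_model helper are assumptions chosen for the example.

```python
# (model name, minimum RAM in GB), ordered from largest to smallest.
# Thresholds are illustrative assumptions, not the plugin's real values.
MODEL_MIN_RAM_GB = [
    ("large", 16),
    ("medium", 8),
    ("small", 4),
    ("base", 2),
    ("tiny", 0),
]


def pick_whisper_model(ram_gb):
    """Return the largest model whose assumed RAM requirement fits."""
    for name, min_ram in MODEL_MIN_RAM_GB:
        if ram_gb >= min_ram:
            return name
    return "tiny"  # fallback for nonsensical (negative) input


for ram in (1, 3, 8, 32):
    print(f"{ram} GB RAM: {pick_whisper_model(ram)}")
```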

Frequently Asked Questions about Claude Video Vision

What is Claude Video Vision?

Claude Video Vision is a Claude Code plugin, backed by a Model Context Protocol (MCP) server, that gives Claude the ability to watch and understand videos through frame extraction and multimodal audio analysis. MCP connects AI assistants to external tools and data sources through a standardized interface.

How do I install Claude Video Vision?

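Inside Claude Code, run /plugin marketplace add https://github.com/jordanrendric/claude-video-vision, then /plugin install claude-video-vision, and finish with /setup-video-vision to choose an audio backend. The Claude Video Vision GitHub repository has the full installation instructions.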

Which AI clients work with Claude Video Vision?

Claude Video Vision works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is Claude Video Vision free to use?

Yes, Claude Video Vision is open source and available under the MIT license. You can use it freely in both personal and commercial projects.


Quick Config Preview

{
  "mcpServers": {
    "claude-video-vision": {
      "command": "npx",
      "args": ["-y", "claude-video-vision"]
    }
  }
}

Add this to your claude_desktop_config.json or .cursor/mcp.json
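
If you prefer not to edit the JSON by hand, the entry above can be merged into an existing config file with a short script. This is a sketch under the assumption that the file is plain JSON with a top-level mcpServers object; the add_server helper is hypothetical, and you would pass the path to your own claude_desktop_config.json or .cursor/mcp.json.

```python
import json
from pathlib import Path

# Server entry taken from the config preview above.
SERVER_ENTRY = {"command": "npx", "args": ["-y", "claude-video-vision"]}


def add_server(config_path, name="claude-video-vision", entry=SERVER_ENTRY):
    """Add or replace one mcpServers entry, preserving other settings."""
    path = Path(config_path)
    # Start from the existing file when present, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Existing servers and unrelated keys are kept as they are; only the claude-video-vision entry is added or overwritten. Back up the file first, since the script rewrites it in place.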
