NeMo Data Designer
NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
What is NeMo Data Designer?
NeMo Data Designer is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to generate high-quality synthetic data from scratch or from seed data.
This server falls under the Data Science & ML category on MCPgee, the world's largest MCP server directory with 33,000+ servers.
How to Set Up and Use NeMo Data Designer
NeMo Data Designer is NVIDIA's open-source toolkit for generating high-quality synthetic datasets from scratch or by augmenting existing seed data, using LLMs with dependency-aware field generation, built-in quality validators, and LLM-as-a-judge scoring. It supports multiple AI providers including NVIDIA Build API, OpenAI, and OpenRouter, and integrates directly with Claude Code as a skill for natural-language-driven dataset creation. ML engineers and data scientists use it to build training and evaluation datasets for LLM fine-tuning, benchmarking, and RAG pipeline testing without needing large manual labeling efforts.
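To make "dependency-aware field generation" concrete, here is a small library-independent sketch that uses Python's standard `graphlib` to order fields so that each one is generated only after the fields it depends on. The field names, `generation_order`, `generate_record`, and the stub generators are all hypothetical. The stubs stand in for LLM calls (and, for `judge_score`, an LLM-as-a-judge step). This sketch is not how Data Designer is implemented internally.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical field graph: each field maps to the fields it depends on,
# similar in spirit to the depends_on argument in the Python API example on this page.
FIELDS: dict[str, list[str]] = {
    "topic": [],
    "question": ["topic"],
    "answer": ["question"],
    "judge_score": ["question", "answer"],
}

FieldGenerator = Callable[[dict[str, object]], object]


def generation_order(fields: dict[str, list[str]]) -> list[str]:
    """Order fields so each one comes after all of its dependencies.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(fields).static_order())


def generate_record(
    fields: dict[str, list[str]], generators: dict[str, FieldGenerator]
) -> dict[str, object]:
    """Build one record field by field, passing each generator only its declared dependencies."""
    record: dict[str, object] = {}
    for name in generation_order(fields):
        context = {dep: record[dep] for dep in fields[name]}
        record[name] = generators[name](context)
    return record


# Stub generators standing in for LLM calls; "judge_score" mimics a judge that checks the answer.
stub_generators: dict[str, FieldGenerator] = {
    "topic": lambda ctx: "the French Revolution",
    "question": lambda ctx: f"When did {ctx['topic']} begin?",
    "answer": lambda ctx: "It began in 1789.",
    "judge_score": lambda ctx: 5 if "1789" in str(ctx["answer"]) else 1,
}

print(generate_record(FIELDS, stub_generators))
```

A topological sort is the natural fit here: it rejects circular dependencies up front and guarantees that every prompt can reference values that already exist.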
Prerequisites
- Python 3.10+ with pip (plus make if installing from source)
- An NVIDIA Build API key (nvidia.build account) for NVIDIA-hosted models, OR an OpenAI or OpenRouter API key
- Git for cloning the repository if installing from source
- Optional: Claude Code CLI for using the built-in /data-designer skill
- Optional: Node.js and npx for installing the Claude Code skill via 'npx skills add'
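The prerequisites above can be checked quickly with a short script. This is a minimal sketch that assumes the tools are invoked under the executable names `pip`, `make`, `git`, `node`, `npx`, and `claude` (the Claude Code CLI). On some systems pip is only available as `pip3`. `check_prerequisites` is a hypothetical helper, not part of Data Designer.

```python
import shutil
import sys

# Executables named in the prerequisites; which ones you need depends on whether
# you install via pip or from source and whether you use the optional Claude Code skill.
TOOLS = ["pip", "make", "git", "node", "npx", "claude"]


def check_prerequisites(min_python: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report whether the running Python is new enough and which tools are on PATH."""
    report = {"python_ok": sys.version_info >= min_python}
    for tool in TOOLS:
        # shutil.which returns the executable's full path, or None if it is not found.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:10} {'ok' if ok else 'missing'}")
```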
Install NeMo Data Designer
Install via pip for the quickest start, or clone the repository for development and run 'make install'. The pip package name is data-designer.
# Via pip
pip install data-designer
# From source
git clone https://github.com/NVIDIA-NeMo/DataDesigner.git
cd DataDesigner && make install
Set your AI provider API keys
Export the API key for your chosen provider. At least one of NVIDIA_API_KEY, OPENAI_API_KEY, or OPENROUTER_API_KEY is required. You can configure multiple providers and select between them at runtime.
export NVIDIA_API_KEY="your-nvidia-api-key"
export OPENAI_API_KEY="your-openai-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"
Check available providers and models
Use the data-designer CLI to list configured providers and available models, confirming your API keys are recognized correctly.
data-designer config providers
data-designer config models
data-designer config list
Install the Claude Code skill (optional)
Add the Data Designer skill to Claude Code to generate datasets using natural language directly in your coding sessions.
npx skills add NVIDIA-NeMo/DataDesigner
Configure as an MCP server
Add Data Designer to your MCP client configuration to expose data generation capabilities as tools accessible from Claude Desktop or other MCP clients.
{
  "mcpServers": {
    "datadesigner": {
      "command": "npx",
      "args": ["datadesigner"],
      "env": {
        "NVIDIA_API_KEY": "your-nvidia-api-key"
      }
    }
  }
}
Generate a synthetic dataset
Use the data-designer Python API or CLI to define your dataset schema and generate records. Use preview mode first to test your configuration before a full-scale run.
# Python API example
from data_designer import DataDesigner
dd = DataDesigner()
dd.add_field("question", description="A factual question about world history")
dd.add_field("answer", description="A concise accurate answer", depends_on=["question"])
result = dd.generate(num_records=100, preview=True)
NeMo Data Designer Examples
Client configuration
Claude Desktop configuration for the NeMo Data Designer MCP server with NVIDIA API key.
{
  "mcpServers": {
    "datadesigner": {
      "command": "npx",
      "args": ["datadesigner"],
      "env": {
        "NVIDIA_API_KEY": "nvapi-xxxx",
        "OPENAI_API_KEY": "sk-xxxx"
      }
    }
  }
}
Prompts to try
Example prompts for generating synthetic datasets through the Data Designer MCP server or Claude Code skill.
- "Generate 500 question-answer pairs about Python programming for a fine-tuning dataset"
- "Create a synthetic dataset of customer support tickets with resolutions, varying by sentiment and category"
- "Augment these 10 seed examples into 200 diverse training samples"
- "Score the quality of this dataset using LLM-as-a-judge and flag low-quality records"
Troubleshooting NeMo Data Designer
API key not recognized: 'No valid provider found' error
Run 'data-designer config providers' to see which providers are detected. Ensure the environment variable is exported (not just set) in the same shell session, or add it to your .env file in the project root.
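To confirm from the same shell that a key is actually exported, you can run a small Python check. A process started from the shell only inherits exported variables, so a key assigned without `export` will not appear here. This is a minimal sketch: `detected_provider_keys` is a hypothetical helper, and it does not read `.env` files.

```python
import os
from typing import Mapping, Optional

PROVIDER_KEYS = ("NVIDIA_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY")


def detected_provider_keys(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the provider key names that are set to a non-blank value.

    Defaults to this process's environment, which contains only the variables
    the parent shell exported.
    """
    env = os.environ if environ is None else environ
    return [key for key in PROVIDER_KEYS if env.get(key, "").strip()]


if __name__ == "__main__":
    found = detected_provider_keys()
    print("Detected:", ", ".join(found) if found else "none (check that the key is exported)")
```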
Generation hangs or times out on large datasets
Use preview mode (preview=True or the --preview flag) to test with a small batch first. For large runs, set DATA_DESIGNER_ASYNC_ENGINE=0 to fall back to the synchronous engine if the async engine is causing issues.
Telemetry errors appear in logs during generation
Set NEMO_TELEMETRY_ENABLED=false in your environment to disable telemetry data collection entirely if it is causing noise or errors in air-gapped environments.
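If you script your runs, you can apply both troubleshooting overrides (DATA_DESIGNER_ASYNC_ENGINE=0 and NEMO_TELEMETRY_ENABLED=false) to a single command without changing your shell. This is a sketch under two assumptions: the `data-designer` executable is on PATH, and it reads these variables from its process environment. `troubleshooting_env` and `run_data_designer` are hypothetical helper names.

```python
import os
import subprocess
from typing import Mapping, Optional

# Overrides taken from the troubleshooting items on this page.
TROUBLESHOOTING_OVERRIDES = {
    "DATA_DESIGNER_ASYNC_ENGINE": "0",  # fall back to the synchronous engine
    "NEMO_TELEMETRY_ENABLED": "false",  # turn off telemetry collection
}


def troubleshooting_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the given (or current) environment and apply the overrides."""
    env = dict(os.environ if base is None else base)
    env.update(TROUBLESHOOTING_OVERRIDES)
    return env


def run_data_designer(*args: str) -> subprocess.CompletedProcess:
    """Run a data-designer CLI command with the overrides applied.

    Raises FileNotFoundError if the executable is missing and
    CalledProcessError if the command exits with a non-zero status.
    """
    return subprocess.run(["data-designer", *args], env=troubleshooting_env(), check=True)
```

For example, `run_data_designer("config", "providers")` lists the detected providers with both overrides active. Your own shell environment is left unchanged.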
Frequently Asked Questions about NeMo Data Designer
What is NeMo Data Designer?
NeMo Data Designer is a Model Context Protocol (MCP) server for generating high-quality synthetic data from scratch or from seed data. It connects AI assistants to external tools and data sources through a standardized interface.
How do I install NeMo Data Designer?
Install the package with pip (pip install data-designer), or clone the NeMo Data Designer GitHub repository and run make install. Then add the server config to your AI client.
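The final step, adding the server config to your AI client, can be scripted so that any servers already in the file are preserved. This is a minimal sketch. It reuses the `npx datadesigner` entry from the configuration example on this page, which is not verified here, and it reads the key from the environment instead of hard-coding it. `with_server` and `update_config_file` are hypothetical helpers. The config file location depends on the client; for Claude Desktop on macOS it is typically `~/Library/Application Support/Claude/claude_desktop_config.json`.

```python
import json
import os
from pathlib import Path
from typing import Any

SERVER_NAME = "datadesigner"


def with_server(config: dict[str, Any], api_key: str, key_var: str = "NVIDIA_API_KEY") -> dict[str, Any]:
    """Return a copy of an MCP client config with the datadesigner entry added or replaced.

    Other servers and top-level settings are kept unchanged.
    """
    updated = json.loads(json.dumps(config))  # deep copy so the caller's dict is not mutated
    updated.setdefault("mcpServers", {})[SERVER_NAME] = {
        "command": "npx",
        "args": ["datadesigner"],
        "env": {key_var: api_key},
    }
    return updated


def update_config_file(config_path: str, key_var: str = "NVIDIA_API_KEY") -> None:
    """Read the config file (if it exists), merge in the entry, and write it back."""
    api_key = os.environ.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} is not set; export it before running this script")
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_server(config, api_key, key_var), indent=2) + "\n")
```

As with a hand-edited config, the key ends up in the file in plain text, so back up the file first and keep it private.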
Which AI clients work with NeMo Data Designer?
NeMo Data Designer works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.
Is NeMo Data Designer free to use?
Yes, NeMo Data Designer is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.
NeMo Data Designer Alternatives: Similar Data Science & ML Servers
Looking for alternatives to NeMo Data Designer? Here are other popular data science & ML servers you can use with Claude, Cursor, and VS Code.
Ultrarag
★ 5.6k - A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
RocketRide
★ 3.1k - MCP server that exposes RocketRide AI pipelines as t…
Aix Db
★ 2.1k - Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to achieve end-to-end conversion from natural language to data insights.
PaperBanana
★ 1.7k - Open source implementation and extension of Google Research's PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.
MiniMax
★ 1.5k - Bridges MiniMax AI capabilities to the Model Context Protocol, enabling AI agents to perform image understanding, text-to-image generation, and speech synthesis. It provides a standardized interface for accessing MiniMax's core tools via JSON-RPC.
NpcPy
★ 1.4k - The Python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more.