NeMo Data Designer

v1.0.0 • Data Science & ML • stable

🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.

agentic-ai, data-augmentation, data-generation, llm, mcp
Stars: 1,891
Downloads: 0
Weekly: 0
0/5

What is NeMo Data Designer?

NeMo Data Designer is a Model Context Protocol (MCP) server that allows AI assistants like Claude, Cursor, and VS Code to generate high-quality synthetic data from scratch or from seed data.


This server falls under the Data Science & ML category on MCPgee, the world's largest MCP server directory with 33,000+ servers.

Features

  • Generate high-quality synthetic data from scratch or from seed data

Use Cases

Generate high-quality synthetic data
Create data from scratch or augment existing seed data

Maintainer: NVIDIA-NeMo

License: Apache-2.0
Language: Python
Version: v1.0.0
Updated: May 22, 2026
Status: healthy
Maintenance: active

Works with

Claude, OpenAI, Windows, macOS, Linux

Installation

Manual Installation

npx datadesigner

Configuration

Configuration Details

Config File: claude_desktop_config.json

Performance

Response Metrics

Response Time: < 200ms
Throughput: Medium

Resource Usage

Memory Usage: Low
CPU Usage: Low

How to Set Up and Use NeMo Data Designer

NeMo Data Designer is NVIDIA's open-source toolkit for generating high-quality synthetic datasets from scratch or by augmenting existing seed data, using LLMs with dependency-aware field generation, built-in quality validators, and LLM-as-a-judge scoring. It supports multiple AI providers including NVIDIA Build API, OpenAI, and OpenRouter, and integrates directly with Claude Code as a skill for natural-language-driven dataset creation. ML engineers and data scientists use it to build training and evaluation datasets for LLM fine-tuning, benchmarking, and RAG pipeline testing without needing large manual labeling efforts.

Prerequisites

  • Python 3.10+ with pip or make available
  • An NVIDIA Build API key (nvidia.build account) for NVIDIA-hosted models, OR an OpenAI or OpenRouter API key
  • Git for cloning the repository if installing from source
  • Optional: Claude Code CLI for using the built-in /data-designer skill
  • Optional: Node.js and npx for installing the Claude Code skill via 'npx skills add'

Step 1: Install NeMo Data Designer

Install via pip for the quickest start, or clone the repository for development and run 'make install'. The pip package name is data-designer.

# Via pip
pip install data-designer

# From source
git clone https://github.com/NVIDIA-NeMo/DataDesigner.git
cd DataDesigner && make install

Step 2: Set your AI provider API keys

Export the API key for your chosen provider. At least one of NVIDIA_API_KEY, OPENAI_API_KEY, or OPENROUTER_API_KEY is required. You can configure multiple providers and select between them at runtime.

export NVIDIA_API_KEY="your-nvidia-api-key"
export OPENAI_API_KEY="your-openai-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"
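
Before moving on, it can help to confirm which of the three variables are actually visible to a Python process. The following is a minimal sketch using only the standard library; the provider labels and the detected_providers helper are illustrative and are not part of the data-designer package.

```python
import os

# Environment variable names are taken from the step above.
# The short provider labels on the left are illustrative only.
PROVIDER_ENV_VARS = {
    "nvidia": "NVIDIA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def detected_providers(environ=None):
    """Return the labels of providers whose key variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return [
        label
        for label, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = detected_providers()
    if found:
        print("Keys found for:", ", ".join(found))
    else:
        print(
            "No provider key is set; export at least one of:",
            ", ".join(PROVIDER_ENV_VARS.values()),
        )
```

An empty result usually means the variable was set in a different shell session or was never exported.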

Step 3: Check available providers and models

Use the data-designer CLI to list configured providers and available models, confirming your API keys are recognized correctly.

data-designer config providers
data-designer config models
data-designer config list

Step 4: Install the Claude Code skill (optional)

Add the Data Designer skill to Claude Code to generate datasets using natural language directly in your coding sessions.

npx skills add NVIDIA-NeMo/DataDesigner

Step 5: Configure as an MCP server

Add Data Designer to your MCP client configuration to expose data generation capabilities as tools accessible from Claude Desktop or other MCP clients.

{
  "mcpServers": {
    "datadesigner": {
      "command": "npx",
      "args": ["datadesigner"],
      "env": {
        "NVIDIA_API_KEY": "your-nvidia-api-key"
      }
    }
  }
}
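
If your client configuration file already lists other MCP servers, editing it by hand risks overwriting them. The sketch below merges one entry into the "mcpServers" object and leaves existing entries untouched. It is a hedged example: the add_mcp_server helper is hypothetical, and the location of the configuration file depends on your client and operating system, so the path is passed in rather than assumed.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args, env=None):
    """Insert or update one entry under "mcpServers", keeping other entries."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty configuration.
    config = json.loads(text) if text.strip() else {}

    entry = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)

    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call, using the values from the configuration shown above:
# add_mcp_server(
#     "claude_desktop_config.json",
#     "datadesigner",
#     "npx",
#     ["datadesigner"],
#     {"NVIDIA_API_KEY": "your-nvidia-api-key"},
# )
```

Restart the MCP client after changing the file so that it reloads the server list.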

Step 6: Generate a synthetic dataset

Use the data-designer Python API or CLI to define your dataset schema and generate records. Use preview mode first to test your configuration before a full-scale run.

# Python API example
from data_designer import DataDesigner

dd = DataDesigner()
dd.add_field("question", description="A factual question about world history")
dd.add_field("answer", description="A concise accurate answer", depends_on=["question"])
result = dd.generate(num_records=100, preview=True)
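
The depends_on argument in the example above implies an ordering: a field can only be generated once the fields it depends on exist. As a library-independent illustration of that idea (not Data Designer's actual implementation), the sketch below orders a hypothetical schema with the standard library's graphlib module. The difficulty field and the generation_order helper are assumptions added for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: field name -> names of the fields it depends on.
schema = {
    "question": [],
    "answer": ["question"],
    "difficulty": ["question", "answer"],
}


def generation_order(fields):
    """Return field names so that every field comes after its dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors and
    # raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(fields).static_order())


order = generation_order(schema)
print(order)  # ['question', 'answer', 'difficulty']
```

A circular dependency, such as two fields that each depend on the other, has no valid order, which is why a schema like that cannot be generated.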

NeMo Data Designer Examples

Client configuration

Claude Desktop configuration for the NeMo Data Designer MCP server with NVIDIA and OpenAI API keys.

{
  "mcpServers": {
    "datadesigner": {
      "command": "npx",
      "args": ["datadesigner"],
      "env": {
        "NVIDIA_API_KEY": "nvapi-xxxx",
        "OPENAI_API_KEY": "sk-xxxx"
      }
    }
  }
}

Prompts to try

Example prompts for generating synthetic datasets through the Data Designer MCP server or Claude Code skill.

- "Generate 500 question-answer pairs about Python programming for a fine-tuning dataset"
- "Create a synthetic dataset of customer support tickets with resolutions, varying by sentiment and category"
- "Augment these 10 seed examples into 200 diverse training samples"
- "Score the quality of this dataset using LLM-as-a-judge and flag low-quality records"

Troubleshooting NeMo Data Designer

API key not recognized: 'No valid provider found' error

Run 'data-designer config providers' to see which providers are detected. Ensure the environment variable is exported (not just set) in the same shell session, or add it to your .env file in the project root.

Generation hangs or times out on large datasets

Use the preview mode (preview=True or --preview flag) to test with a small batch first. For large runs, set DATA_DESIGNER_ASYNC_ENGINE=0 to fall back to the synchronous engine if the async engine is causing issues.

Telemetry errors appear in logs during generation

Set NEMO_TELEMETRY_ENABLED=false in your environment to disable telemetry data collection entirely if it is causing noise or errors in air-gapped environments.
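
The two environment overrides mentioned in this section can also be applied to a single run instead of your whole shell session. The following is a minimal sketch that builds a child-process environment with both variables set; the variable names and values come from the notes above, while the build_env and run_with_overrides helpers are hypothetical. Whether the tool reads these variables at start-up is assumed here, not verified.

```python
import os
import subprocess

# Names and values are taken from the troubleshooting notes above.
OVERRIDES = {
    "DATA_DESIGNER_ASYNC_ENGINE": "0",
    "NEMO_TELEMETRY_ENABLED": "false",
}


def build_env(base=None, overrides=OVERRIDES):
    """Return a copy of the environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def run_with_overrides(command):
    """Run a command (a list of strings) in a child process with the overrides set."""
    return subprocess.run(command, env=build_env(), check=False)


# Example call, using a command from the setup steps:
# run_with_overrides(["data-designer", "config", "list"])
```

Because the overrides only apply to the child process, your current shell environment stays unchanged.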

Frequently Asked Questions about NeMo Data Designer

What is NeMo Data Designer?

NeMo Data Designer is a Model Context Protocol (MCP) server that generates high-quality synthetic data from scratch or from seed data. It connects AI assistants to external tools and data sources through a standardized interface.

How do I install NeMo Data Designer?

Follow the installation instructions on the NeMo Data Designer GitHub repository. Clone the repo, install dependencies, and add the server config to your AI client.

Which AI clients work with NeMo Data Designer?

NeMo Data Designer works with all major MCP-compatible AI clients including Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, and Cline.

Is NeMo Data Designer free to use?

Yes, NeMo Data Designer is open source and available under the Apache-2.0 license. You can use it freely in both personal and commercial projects.

Quick Config Preview

{ "mcpServers": { "datadesigner": { "command": "npx", "args": ["-y", "datadesigner"] } } }

Add this to your claude_desktop_config.json or .cursor/mcp.json
