NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM, but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently holds first place on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text, providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
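The generate-then-score flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_dataset`, `fake_generate` and `fake_score` are hypothetical names, the two stand-in functions replace real calls to the Instruct and Reward models, and the score scale and thresholds are arbitrary choices, not values from NVIDIA.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five attributes the article says the reward model grades.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class Sample:
    prompt: str
    response: str
    scores: Dict[str, float]


def build_dataset(
    prompts: List[str],
    generate: Callable[[str], List[str]],
    score: Callable[[str, str], Dict[str, float]],
    min_scores: Dict[str, float],
) -> List[Sample]:
    """Keep every candidate whose scores clear all of the given thresholds."""
    kept = []
    for prompt in prompts:
        for response in generate(prompt):        # step (1): instruct model
            scores = score(prompt, response)     # step (2): reward model
            if all(scores[name] >= floor for name, floor in min_scores.items()):
                kept.append(Sample(prompt, response, scores))
    return kept


# Hypothetical stand-ins for the real models; replace with actual inference calls.
def fake_generate(prompt: str) -> List[str]:
    return [prompt + " -> short", prompt + " -> a longer, more detailed answer"]


def fake_score(prompt: str, response: str) -> Dict[str, float]:
    quality = min(4.0, len(response) / 10.0)     # toy heuristic, not a real reward
    return {name: quality for name in ATTRIBUTES}


dataset = build_dataset(
    ["Explain LoRA"],
    fake_generate,
    fake_score,
    min_scores={"helpfulness": 3.0, "correctness": 3.0},  # assumed thresholds
)
for sample in dataset:
    print(sample.response, sample.scores["helpfulness"])
```

Thresholds are applied only to the attributes named in `min_scores`, since attributes such as verbosity and complexity are not necessarily "higher is better" for every use case.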

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
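To make the idea of splitting a weight matrix concrete, here is a toy, single-process sketch of column-wise tensor parallelism. It is not how TensorRT-LLM is implemented; the function names are hypothetical and each "device" is simply a Python list, but it shows why the sharded computation gives the same result as the unsharded one.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split a weight matrix into `parts` column blocks, one per simulated device."""
    cols = len(w[0])
    assert cols % parts == 0, "columns must divide evenly across devices"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    shards = split_columns(w, parts)
    # Each partial product would run on its own GPU in a real deployment.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the column blocks back into full output rows.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, parts=2)
print(sharded == full)
```

Each shard holds only a fraction of the weights, which is what lets a model too large for one GPU be served across several.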

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
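As a rough illustration of what low-rank adaptation does, the sketch below keeps a frozen weight matrix W and adds a scaled product of two small matrices, W + (alpha / r) * B A. The function name, the tiny matrices and the `alpha` value are all assumptions for the example; this is not the NeMo API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(a)                # A has shape (r x in_features)
    delta = matmul(b, a)      # (out_features x r) @ (r x in_features)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 3x3 base weight (identity, for readability).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 0.0, 2.0]]                 # rank r = 1, shape (1 x 3)
B_zero = [[0.0], [0.0], [0.0]]        # B starts at zero, so the adapter is a no-op
B_trained = [[0.5], [0.0], [1.0]]     # made-up values standing in for trained weights

unchanged = lora_effective_weight(W, A, B_zero, alpha=2.0)
merged = lora_effective_weight(W, A, B_trained, alpha=2.0)

full_params = len(W) * len(W[0])                         # 9 weights in W
lora_params = len(A) * len(A[0]) + len(B_trained) * 1    # 3 + 3 = 6 trainable
print(merged, full_params, lora_params)
```

Only A and B are trained, so the number of trainable parameters grows with the rank r rather than with the full size of W; the saving is small in this 3x3 toy but large for real layer sizes.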

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...