NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM, but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently holds first place on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text, providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
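The generate-then-score loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not NVIDIA's implementation: the `generate` and `score` callables stand in for calls to an instruct model and a reward model, the stub functions, the score scale and the thresholds are all assumptions made for this example, and only the five attribute names come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five attributes the reward model is described as grading.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class ScoredSample:
    prompt: str
    response: str
    scores: Dict[str, float]


def build_dataset(
    prompts: List[str],
    generate: Callable[[str], List[str]],          # stand-in for an instruct model
    score: Callable[[str, str], Dict[str, float]],  # stand-in for a reward model
    min_helpfulness: float = 3.0,                   # assumed threshold
    min_correctness: float = 3.0,                   # assumed threshold
) -> List[ScoredSample]:
    """Generate candidate responses, score them, and keep the high-scoring ones."""
    kept = []
    for prompt in prompts:
        for response in generate(prompt):
            scores = score(prompt, response)
            if (scores["helpfulness"] >= min_helpfulness
                    and scores["correctness"] >= min_correctness):
                kept.append(ScoredSample(prompt, response, scores))
    return kept


# Hypothetical stubs so the sketch runs without any model.
def fake_generate(prompt: str) -> List[str]:
    return [f"Short answer to: {prompt}",
            f"A longer, more detailed answer to: {prompt}"]


def fake_score(prompt: str, response: str) -> Dict[str, float]:
    quality = 4.0 if "detailed" in response else 2.0
    return {name: quality for name in ATTRIBUTES}


dataset = build_dataset(["What is tensor parallelism?"], fake_generate, fake_score)
for sample in dataset:
    print(sample.response, sample.scores)
```

In a real pipeline the two stubs would be replaced by calls to deployed models, and the filtering rule would likely weigh all five attributes rather than two.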

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
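To make the idea of splitting a weight matrix concrete, here is a minimal pure-Python sketch of column-wise tensor parallelism. It is not TensorRT-LLM code: the "devices" are simulated with plain lists, the function names are hypothetical, and a real system would run each shard on its own GPU and combine results with a collective operation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into `parts` equal column blocks, one per simulated device."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Compute x @ w by multiplying against each column shard separately."""
    shards = split_columns(w, parts)
    # Each partial product would run on a different GPU in a real deployment.
    partials = [matmul(x, shard) for shard in shards]
    # Concatenate the partial outputs along the column axis (an all-gather).
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]

full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, parts=2)
print(full == sharded)
```

Each simulated device holds only half of the weight matrix, yet the concatenated result matches the single-device product, which is what lets a model too large for one GPU be served across several.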

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
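As a rough illustration of why low-rank adaptation is called parameter-efficient, the sketch below follows the standard LoRA formulation, in which a frozen weight matrix W is adjusted by a scaled product of two small trainable matrices, W + (alpha / r) * B A. It does not use the NeMo API; the function names, layer sizes and values are assumptions chosen for the example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A), leaving the frozen W unmodified."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Parameter savings for one hypothetical 4096 x 4096 layer at rank 8.
print(full_params(4096, 4096), lora_params(4096, 4096, rank=8))

# Tiny worked example: a rank-1 update to a 2 x 2 identity matrix.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]      # d_out x rank
a = [[0.5, 0.0]]        # rank x d_in
adapted = effective_weight(w, b, a, alpha=2.0, rank=1)
print(adapted)
```

For the hypothetical layer shown, the adapter trains 65,536 values instead of 16,777,216, a 256-fold reduction, while the base weights stay frozen.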

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...