
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM - but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently ranks first on the Hugging Face RewardBench leaderboard, created by AI2, which evaluates the capabilities, safety and pitfalls of reward models.
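As a rough illustration of this filtering step, the Python sketch below keeps only responses whose scores clear a threshold on selected attributes. The score values, the 0-4 scale, the threshold and the `keep_high_quality` helper are all assumptions made for illustration. They do not represent the Reward model's actual API or output format.

```python
from dataclasses import dataclass

# The five attributes named in the article.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class ScoredResponse:
    text: str
    scores: dict  # attribute name -> score (hypothetical 0-4 scale)


def keep_high_quality(responses,
                      required=("helpfulness", "correctness", "coherence"),
                      threshold=3.0):
    """Keep responses that meet the threshold on every required attribute."""
    return [r for r in responses
            if all(r.scores[a] >= threshold for a in required)]


# Made-up scores for illustration; a real pipeline would get them from a reward model.
candidates = [
    ScoredResponse("Answer A", dict(zip(ATTRIBUTES, (3.6, 3.8, 3.9, 1.5, 1.2)))),
    ScoredResponse("Answer B", dict(zip(ATTRIBUTES, (1.1, 2.0, 3.5, 1.0, 2.4)))),
]
kept = keep_high_quality(candidates)
print([r.text for r in kept])
```

Which attributes to require, and at what threshold, is a design choice that depends on the target dataset. Complexity and verbosity are left out of the filter here only as an example.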

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text - providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
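The two-step flow described here can be sketched in Python as a generate-then-score loop. This is a minimal sketch under stated assumptions: `generate_and_filter`, `toy_generate` and `toy_score` are hypothetical names, the toy functions stand in for the Instruct and Reward models, and keeping the single best candidate per prompt is one possible selection policy rather than NVIDIA's documented method.

```python
from typing import Callable, List, Tuple


def generate_and_filter(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    n_candidates: int = 4,
    min_score: float = 0.5,
) -> List[Tuple[str, str]]:
    """For each prompt, generate candidates, score them, keep the best if good enough."""
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt, n_candidates)           # step (1): instruct model
        if not candidates:
            continue
        scored = [(score(prompt, c), c) for c in candidates]  # step (2): reward model
        best_score, best = max(scored)
        if best_score >= min_score:
            dataset.append((prompt, best))
    return dataset


# Stand-in functions so the sketch runs without any model.
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} -> draft {i}" for i in range(n)]


def toy_score(prompt: str, response: str) -> float:
    # Pretend later drafts are better; a real reward model would judge content.
    return int(response.rsplit(" ", 1)[1]) / 4


pairs = generate_and_filter(["Explain tensor parallelism"], toy_generate, toy_score)
print(pairs)
```

Passing the generator and scorer as callables keeps the loop independent of how the models are served, so the toy functions could be swapped for real inference calls.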

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
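To make the idea of splitting a weight matrix concrete, here is a toy Python sketch of column-wise tensor parallelism using plain lists. It is not TensorRT-LLM code. The helper names are hypothetical, and each "shard" is simply computed in turn on the CPU, whereas a real system would place it on a separate GPU and gather the results over an interconnect.

```python
def matmul(x, w):
    """Multiply matrix x (rows x k) by matrix w (k x cols), both lists of lists."""
    return [[sum(xi * wj for xi, wj in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w, n_shards):
    """Split a weight matrix column-wise into up to n_shards pieces."""
    cols = len(w[0])
    step = (cols + n_shards - 1) // n_shards
    return [[row[i:i + step] for row in w] for i in range(0, cols, step)]


def tensor_parallel_matmul(x, w, n_shards):
    """Each shard computes x @ w_shard; per-row outputs are then concatenated."""
    partials = [matmul(x, shard) for shard in split_columns(w, n_shards)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
result = tensor_parallel_matmul(x, w, n_shards=2)
print(result)
print(result == matmul(x, w))
```

The sharded result matches the unsharded product because each output column depends only on the corresponding column of the weight matrix, so every shard can work independently until the final concatenation.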

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
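For readers unfamiliar with LoRA, the arithmetic behind it can be sketched in a few lines of Python. The snippet follows the commonly used formulation W' = W + (alpha / r) * B A, where W is the frozen base weight and A and B are small trainable matrices of rank r. It is a pure-Python illustration with made-up numbers, and `lora_effective_weight` is a hypothetical helper, not part of the NeMo framework.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)             # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(b, a)   # B: (out, r), A: (r, in) -> (out, in)
    return [[wij + scale * dij for wij, dij in zip(wrow, drow)]
            for wrow, drow in zip(w, delta)]


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 base weight (6 values)
a = [[1.0, 2.0, 3.0]]                   # trainable, rank r = 1, shape (1, 3)
b = [[0.5], [0.0]]                      # trainable, shape (2, 1)
merged = lora_effective_weight(w, a, b, alpha=2.0)
print(merged)
```

In this tiny case the adapter holds 5 trainable values against 6 in the base weight. The savings become significant only at realistic layer sizes, where r is much smaller than the matrix dimensions.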

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...