NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM, but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently holds first place on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text, providing feedback that guides iterative improvements and helps ensure the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
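The generate-then-score loop described above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's implementation: `build_dataset`, `toy_generate` and `toy_score` are hypothetical names, the toy functions stand in for calls to the Instruct and Reward models, and the score scale and thresholds are assumptions chosen only to make the example run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five attributes the article says the Reward model grades.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class ScoredSample:
    prompt: str
    response: str
    scores: Dict[str, float]


def build_dataset(
    prompts: List[str],
    generate: Callable[[str, int], str],
    score: Callable[[str, str], Dict[str, float]],
    min_scores: Dict[str, float],
    samples_per_prompt: int = 2,
) -> List[ScoredSample]:
    """Generate candidates, score each one, keep those meeting every threshold."""
    kept: List[ScoredSample] = []
    for prompt in prompts:
        for seed in range(samples_per_prompt):
            response = generate(prompt, seed)   # step (1): instruct model
            scores = score(prompt, response)    # step (2): reward model
            if all(scores[name] >= limit for name, limit in min_scores.items()):
                kept.append(ScoredSample(prompt, response, scores))
    return kept


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def toy_generate(prompt: str, seed: int) -> str:
    return f"Answer {seed} to: {prompt}" if seed % 2 == 0 else "n/a"


def toy_score(prompt: str, response: str) -> Dict[str, float]:
    quality = 4.0 if response != "n/a" else 0.5
    return {name: quality for name in ATTRIBUTES}


dataset = build_dataset(
    ["What is tensor parallelism?"],
    toy_generate,
    toy_score,
    min_scores={"helpfulness": 3.0, "correctness": 3.0},
)
print(len(dataset), "sample(s) kept")
```

In practice the two callables would wrap whatever inference endpoint serves the models, and the thresholds would be tuned against a held-out set rather than fixed up front.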

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
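To make the idea of splitting a weight matrix concrete, the following pure-Python sketch shards a matrix by columns, multiplies each shard separately and concatenates the partial results. It only illustrates the arithmetic behind column-wise tensor parallelism; it does not use TensorRT-LLM, the function names are hypothetical, and in a real deployment each shard would live on its own GPU with a collective communication step in place of the list concatenation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, used as the single-device reference."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into equally wide column shards, one per (simulated) device."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    result: Matrix = []
    for r in range(len(x)):
        row: List[float] = []
        for partial in partials:
            row.extend(partial[r])  # gather this row's slice from every shard
        result.append(row)
    return result


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
assert tensor_parallel_matmul(x, w, parts=2) == matmul(x, w)
```

Each shard needs only its own slice of the weights, which is why a model too large for one GPU's memory can still be served when the matrices are divided this way.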

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
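As a rough illustration of why LoRA counts as parameter-efficient, the sketch below compares the number of trainable values in a full weight matrix with the two low-rank factors LoRA trains instead, and shows how the factors are merged back into the frozen weight. The function names, the 4096 x 4096 layer size and the rank of 8 are illustrative assumptions, not figures from the Nemotron-4 340B models or the NeMo API.

```python
from typing import List

Matrix = List[List[float]]


def full_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def merged_weight(w: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A), the effective weight after adaptation."""
    rank = len(a)
    return [
        [
            w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(rank))
            for j in range(len(w[0]))
        ]
        for i in range(len(w))
    ]


# Hypothetical layer size and rank, chosen only for the arithmetic.
full = full_params(4096, 4096)
lora = lora_params(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")

# Tiny rank-1 update applied to a 2 x 2 identity weight.
w_frozen = [[1.0, 0.0], [0.0, 1.0]]
b_factor = [[1.0], [2.0]]
a_factor = [[0.5, 0.0]]
w_adapted = merged_weight(w_frozen, b_factor, a_factor)
```

Because only the two small factors are updated, the frozen base weights can be shared across many task-specific adapters.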

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...