NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM, but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently ranks first on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text, providing feedback that guides iterative improvements and helps ensure the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
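The two-step loop described above (generate, then grade and filter) might be sketched as follows. This is a minimal illustration, not NVIDIA's pipeline code: `generate_and_filter`, `fake_generate`, `fake_score`, the score scale and the thresholds are all hypothetical stand-ins, and a real setup would replace the two callables with calls to the instruct and reward models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five attributes the article says the reward model grades.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class ScoredSample:
    prompt: str
    response: str
    scores: Dict[str, float]


def generate_and_filter(
    prompts: List[str],
    generate: Callable[[str], str],
    score: Callable[[str, str], Dict[str, float]],
    min_scores: Dict[str, float],
) -> List[ScoredSample]:
    """Keep only responses whose scores meet every threshold in min_scores."""
    kept = []
    for prompt in prompts:
        response = generate(prompt)       # step (1): instruct model produces text
        scores = score(prompt, response)  # step (2): reward model grades it
        if all(scores[name] >= floor for name, floor in min_scores.items()):
            kept.append(ScoredSample(prompt, response, scores))
    return kept


# Hypothetical stand-ins so the sketch runs without any model or network access.
def fake_generate(prompt: str) -> str:
    return f"Answer to: {prompt}"


def fake_score(prompt: str, response: str) -> Dict[str, float]:
    # Toy rule (an assumption for illustration): longer prompts score higher
    # on helpfulness; every other attribute gets a fixed value.
    scores = {name: 3.0 for name in ATTRIBUTES}
    scores["helpfulness"] = float(min(len(prompt.split()), 4))
    return scores


samples = generate_and_filter(
    ["Explain tensor parallelism briefly", "Hi"],
    fake_generate,
    fake_score,
    min_scores={"helpfulness": 3.0, "correctness": 3.0},  # made-up thresholds
)
print([s.prompt for s in samples])
```

Passing the generator and scorer in as callables keeps the filtering logic independent of how the models are served, so the same loop could be pointed at a local deployment or a hosted endpoint.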

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
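To make the idea of splitting a weight matrix concrete, here is a toy sketch of column-wise tensor parallelism in plain Python. It is not how TensorRT-LLM is implemented; the helper names (`matmul`, `split_columns`, `tensor_parallel_matmul`) are hypothetical, and the "devices" are just list slices processed one after another.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (rows x inner) @ (inner x cols) matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split a weight matrix into equal-width column blocks, one per shard."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    shards = split_columns(w, num_shards)
    # Each partial product would run on its own GPU in a real system;
    # here they are simply computed in sequence.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the column blocks back into full output rows.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, num_shards=2)
print(full == sharded)
```

Each shard only needs its own slice of the weights, which is what lets a matrix too large for one device be spread across several; the cost is the gather step that reassembles the output.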

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
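As a rough illustration of why LoRA counts as parameter-efficient, the sketch below applies a low-rank update to a frozen weight matrix in plain Python. It follows the standard LoRA formulation, output = W x + (alpha / r) * B (A x), and is not NeMo's API; the function names and the tiny matrices are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: Vector, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Vector:
    """Frozen base layer plus a scaled low-rank correction.

    w: frozen weights, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


def trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Number of trainable values in the two LoRA matrices."""
    return r * (d_in + d_out)


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 identity, for easy checking
a = [[1.0, 1.0]]              # rank r = 1
x = [1.0, 2.0]

# With B initialised to zeros, the adapted layer matches the base layer.
b_zero = [[0.0], [0.0]]
unchanged = lora_forward(x, w, a, b_zero, alpha=2.0)

# After B has been updated, the low-rank term shifts the output.
b_trained = [[1.0], [0.0]]
adapted = lora_forward(x, w, a, b_trained, alpha=2.0)

print(unchanged, adapted)
print(trainable_params(4096, 4096, 8), "vs", 4096 * 4096)
```

Only `a` and `b` are trained, so for a 4096 x 4096 layer at rank 8 the adapter holds 65,536 values against roughly 16.8 million in the full matrix; the layer size and rank here are arbitrary example numbers.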

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...