
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM - but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently holds first place on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text - providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
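The generate-then-score loop described above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's implementation: `filter_synthetic`, `fake_generate`, `fake_score`, the threshold values and the score scale are all hypothetical stand-ins, and a real pipeline would replace the two stub callables with calls to the Instruct and Reward models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five attributes the article says the Reward model grades.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class Scored:
    prompt: str
    response: str
    scores: Dict[str, float]


def filter_synthetic(
    prompts: List[str],
    generate: Callable[[str], List[str]],
    score: Callable[[str, str], Dict[str, float]],
    min_scores: Dict[str, float],
) -> List[Scored]:
    """Keep only responses whose scores meet every threshold in min_scores."""
    kept: List[Scored] = []
    for prompt in prompts:
        for response in generate(prompt):      # step (1): generator model
            scores = score(prompt, response)   # step (2): reward model
            missing = [a for a in ATTRIBUTES if a not in scores]
            if missing:
                raise ValueError(f"scorer omitted attributes: {missing}")
            if all(scores[name] >= floor for name, floor in min_scores.items()):
                kept.append(Scored(prompt, response, scores))
    return kept


# Hypothetical stubs so the sketch runs without any model.
def fake_generate(prompt: str) -> List[str]:
    return [prompt + " -> short answer", prompt + " -> a longer, more detailed answer"]


def fake_score(prompt: str, response: str) -> Dict[str, float]:
    quality = 3.5 if "detailed" in response else 1.0  # assumed scale, illustration only
    return {name: quality for name in ATTRIBUTES}


kept = filter_synthetic(
    ["Explain LoRA"],
    fake_generate,
    fake_score,
    min_scores={"helpfulness": 3.0, "correctness": 3.0},  # assumed thresholds
)
```

Thresholding on helpfulness and correctness alone is only one possible filtering rule; which attributes to gate on, and at what level, depends on the target use case.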

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
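To make the idea of splitting a weight matrix concrete, here is a small pure-Python sketch of one common scheme, a column-wise split. It does not use the TensorRT-LLM API, and the article does not say how TensorRT-LLM partitions weights; the function names are illustrative, and each "shard" here is just a list that a real system would place on a separate GPU.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into `parts` equally wide column blocks (one per device)."""
    n_cols = len(w[0])
    if n_cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    step = n_cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    out: Matrix = []
    for r in range(len(x)):
        row: List[float] = []
        for partial in partials:  # gather step: stitch the column blocks together
            row.extend(partial[r])
        out.append(row)
    return out


x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
sharded = tensor_parallel_matmul(x, w, parts=2)
reference = matmul(x, w)
```

Each shard multiplies the same input by its own slice of the weights, so no device needs to hold the full matrix; the sharded result matches the unsharded product.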

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
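As a rough illustration of why LoRA counts as parameter-efficient: the original weight matrix W stays frozen and only two small matrices, A and B, are trained, with their product added to W's output. The sketch below is plain Python rather than the NeMo API; the layer sizes and rank are hypothetical and are not Nemotron-4 340B's actual dimensions, and `scale` stands in for the scaling factor usually applied to the update.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """y = W x + scale * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + scale * u for y, u in zip(base, update)]


def full_params(d_out: int, d_in: int) -> int:
    """Weights updated when fine-tuning the whole d_out x d_in matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Tiny worked example with rank 1.
w = [[1.0, 0.0],
     [0.0, 1.0]]   # frozen 2 x 2 weight
a = [[1.0, 1.0]]   # rank x d_in
b = [[0.5],
     [0.0]]        # d_out x rank
y = lora_forward(w, a, b, [2.0, 3.0])

# Hypothetical layer size, for the parameter-count comparison only.
full = full_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, rank=8)
```

For this hypothetical 4096 x 4096 layer at rank 8, the adapter trains 65,536 weights instead of 16,777,216, a 256-fold reduction.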

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...