NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM - but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently ranks first on the Hugging Face RewardBench leaderboard, which AI2 created to evaluate the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text - providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.
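The generate-then-score loop described above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's implementation: `generate` and `score` are hypothetical callables standing in for calls to the Instruct and Reward models, and `build_dataset`, `samples_per_prompt` and the `min_helpfulness` threshold are names and values assumed here for the example. Only the five attribute names come from the article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five attributes the article says the Reward model grades.
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class ScoredSample:
    prompt: str
    response: str
    scores: Dict[str, float]


def build_dataset(
    prompts: List[str],
    generate: Callable[[str], str],                 # hypothetical: wraps the Instruct model
    score: Callable[[str, str], Dict[str, float]],  # hypothetical: wraps the Reward model
    min_helpfulness: float = 3.0,                   # assumed threshold, tune per use case
    samples_per_prompt: int = 4,
) -> List[ScoredSample]:
    """Generate several candidates per prompt and keep the best one if it scores well enough."""
    kept: List[ScoredSample] = []
    for prompt in prompts:
        # Step 1: the instruct model produces candidate responses.
        candidates = [generate(prompt) for _ in range(samples_per_prompt)]
        # Step 2: the reward model grades each candidate on the five attributes.
        scored = [ScoredSample(prompt, r, score(prompt, r)) for r in candidates]
        # Keep only the highest-rated candidate, and only if it clears the threshold.
        best = max(scored, key=lambda s: s.scores["helpfulness"])
        if best.scores["helpfulness"] >= min_helpfulness:
            kept.append(best)
    return kept
```

Filtering on helpfulness alone is a simplification; a real pipeline might combine several attributes or feed the rejected samples back into another generation round.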

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
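To make the idea of splitting a weight matrix concrete, here is a small NumPy sketch of column-wise tensor parallelism. It is not TensorRT-LLM code and uses none of its API: the "devices" are simulated as list entries on one machine, and the function name `column_parallel_matmul` is made up for this example.

```python
import numpy as np


def column_parallel_matmul(x: np.ndarray, weight: np.ndarray, num_devices: int) -> np.ndarray:
    """Compute x @ weight by splitting weight's columns across simulated devices."""
    # Each shard holds a slice of the output columns; on real hardware
    # every shard would live on a different GPU.
    shards = np.array_split(weight, num_devices, axis=1)
    # Every device multiplies the same input by its own shard, in parallel.
    partial_outputs = [x @ shard for shard in shards]
    # Concatenating the partial results plays the role of an all-gather.
    return np.concatenate(partial_outputs, axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))    # batch of 2 inputs, hidden size 8
w = rng.standard_normal((8, 12))   # one weight matrix of a layer
y = column_parallel_matmul(x, w, num_devices=4)
```

The sharded result matches the single-device product `x @ w`, which is why the split changes where the work runs but not what the layer computes.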

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
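As a rough picture of what low-rank adaptation does, the NumPy sketch below adds a trainable low-rank update to a frozen weight matrix. It is an illustration of the general technique, not the NeMo framework's API; the class name `LoRALinear` and the `rank` and `alpha` defaults are assumptions chosen for the example.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_in, d_out = weight.shape
        self.weight = weight                                 # pretrained weights, kept frozen
        self.a = rng.standard_normal((d_in, rank)) * 0.01    # trainable, small random init
        self.b = np.zeros((rank, d_out))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Output of the frozen layer plus the scaled low-rank correction.
        return x @ self.weight + (x @ self.a @ self.b) * self.scale

    def trainable_parameters(self) -> int:
        return self.a.size + self.b.size


base = np.random.default_rng(1).standard_normal((64, 64))
layer = LoRALinear(base, rank=4)
```

Because one factor starts at zero, the adapted layer initially behaves exactly like the pretrained one. In this toy case only 512 values are trainable, compared with 4,096 in the full matrix.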

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...