Sony Pixel Power calrec Sony

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

14/06/2024

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM - but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from the NVIDIA NGC catalog and from Hugging Face, where developers can also use the Train on DGX Cloud service to easily fine-tune open AI models. Developers will soon be able to access the models at ai.nvidia.com, where they'll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It's currently first place on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text - providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements. Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model's behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model's outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via NVIDIA NGC and Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.
LINK: https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm...
See more stories from nvidia

Most recent headlines

09/11/2025

Dalet Unveils Agentic AI Media Workflows at IBC2025

Dalet today announced a transformative leap forward for media operations: Agentic Artificial Intelligence (AI) that unifies the Dalet ecosystem under one natura...

06/10/2025

France Tlvisions Wins Prestigious 2025 EBU Technology & Innovation Award in Groundbreaking Collaboration with Dalet

France T l visions, France's leading broadcaster, has received the 2025 EBU ...

18/09/2025

DirecTV Adds Three Sports Channels to Its FAST Streaming Offering

DirecTV continues to expand its women's sports offerings by adding Sports Fanatics' and Whoopi Goldberg's Free Ad-Supported Television (FAST) All Wo...

18/09/2025

Fall Football Kicks Off With Native HDR Production, NextGen TV Telecast

WASHINGTON NextGen TV viewers are beginning to see what high dynamic range (HDR) brings to their football enjoyment with the launch of the sport's fall seas...

18/09/2025

FCC Extends Deadline for EEO Audit Replies

WASHINGTON After issuing audit letters seeking Equal Employment Opportunity data from a randomly selected group of TV and radio stations in August, the Federal ...

18/09/2025

Warner Bros. Discovery Signs New Deal with Nielsen

NEW YORK Warner Bros. Discovery and Nielsen have signed a new, long-term, multi-year deal that covers measurement for all Warner Bros. Discovery platforms acros...

18/09/2025

CIMM Launches Startup Program and Innovation Showcase

NEW YORK As part of its ongoing efforts to develop better measurement solutions, the Coalition for Innovative Media Measurement (CIMM) announced the launch of t...

18/09/2025

UPDATED: ABC Takes 'Jimmy Kimmel Live! Off the Air Indefinitely

IRVING, Texas Following threats from the Federal Communications Commission and the announcement that the nations largest station group, Nexstar Media Group, wou...

17/09/2025

Tech Focus: Audio Training, Part 2 - Manufacturers Offer Extensive Online Learning

Tech Focus: Audio Training, Part 2 - Manufacturers Offer Extensive Online Learni...

17/09/2025

Tech Focus: Audio Training, Part 1 - A1 Shortage Remains a Major-League Challenge for Sports Broadcasting

Tech Focus: Audio Training, Part 1 - A1 Shortage Remains a Major-League Challeng...

17/09/2025

Dua Lipa's Service95 Book Club' Goes Live at the New York Public Library

It was the ultimate convergence of pop culture and literary prestige: Last night, Dua Lipa brought her Service95 Book Club podcast to the stage for a special li...

17/09/2025

The Gauge: Mexico August 2025

During August, streaming's share of TV viewing in Mexico showed an increase of 0.4% compared to the previous month, accounting for 25% of TV viewing. Discl...

17/09/2025

Jo Aun Joins FOR-A America as Senior Manager, Product Engineering

CYPRESS, Calif. FOR-A America has named Jo Aun as senior manager of product engineering, a new role responsible for guiding the planning, development and rollou...

17/09/2025

PlayBox Neo and CIS Group Power CazeTV with a seamless Pl...

PlayBox Neo, in partnership with CIS Group, a leading provider of media and broadcast technology solutions, has successfully deployed PlayBox Neo's Dual Cha...

17/09/2025

Energy Regulatory Agency Underscores Commitment with Ene...

In a relationship that mirrors societal advances in sustainability, Brightline Lighting and the Federal Energy Regulatory Commission (FERC) Headquarters have en...

17/09/2025

Clear-Com Powers Star-Studded Communications at Houston A...

Clear-Com is proud to support the world-class productions of Alley Theatre, one of the oldest and largest nonprofit resident theatres in the United States. With...

17/09/2025

Arch Platform Technologies Announces Strategic Collaborat...

Arch Platform Technologies (www.archpt.io), a pioneer in automated, scalable cloud infrastructure for high-performance workflows, today announced a Strategic Co...

17/09/2025

With over 39bn EUR in assets under management and record-...

Over 300 selected decision-makers from start-ups, corporates, and VC funds worldwide will gather for the third edition of the event, united by a single goal: to...

17/09/2025

Telestream Celebrates Award Win at IBC2025

Telestream, a global leader in media workflow technologies, is excited to announce that its flagship Vantage platform and its next-generation AI capabilities re...

17/09/2025

Mediagenix Celebrates Triple Best of Show Wins at IBC2025...

Mediagenix, a global leader in smart content solutions that profitably connect the right content to the right audience, proudly announces its three Best of Show...

17/09/2025

PlayBox Neo Appoints Transtel Universal as Top Reseller P...

In a move to further establish a firm foothold across South East Asia, PlayBox Neo, the well-respected name in broadcast playout and channel branding, has appoi...

17/09/2025

Wisycom Unveils Two New Solutions at IBC 2025

Wisycom, a global leader in advanced wireless audio solutions, announced two major wireless solutions at IBC 2025 (Stand 8.D30). This includes the Portable RF-o...

17/09/2025

Six Berklee Alumni Win Emmy Awards

Six Berklee Alumni Win Emmy Awards The recipients were recognized for their contributions to acclaimed programs Severance, The Studio, The Penguin, SNL50: The...

17/09/2025

Applications Open for Berklee in Santo Domingo

Applications Open for Berklee in Santo Domingo The weeklong contemporary music program will run January 5-10, 2026. By Colette Greenstein September 17, 2025 ...

17/09/2025

Ukrainian Students Find Creative Consonance' at Berklee Valencia

Ukrainian Students Find Creative Consonance' at Berklee Valencia Through ELIA's UAx Platform, six students from Kyiv joined Berklee Valencia for a week...

17/09/2025

Meet Kenna Hilburn, Avids New Incoming Chief Product Officer

Earlier this year Avid announced Kenna Hilburn as its new senior vice president of product. Recently Hilburn was promoted to Avids new Chief Product Officer, su...

17/09/2025

SES and K2 Space to Accelerate Development of Next-Generation MEO Network

Transatlantic collaboration combines experience and agility to drive innovation in network design and delivery Luxembourg, September 16, 2025 - SES, a leading ...

17/09/2025

Fox TV Stations Join Madhive's Local Live Sports Marketplace

NEW YORK Madhive has announced that the Fox Television Stations have joined its Live Sports Marketplace....

17/09/2025

Sony Electronics Partners with Newhouse School at Syracuse University

SYRACUSE, N.Y. Sony Electronics has announced that it is partnering with the Newhouse School at Syracuse University to provide state-of-the-art equipment, hands...

17/09/2025

Roku's First TV Smart Projector Now Available in the U.S.

SAN JOSE, Calif. Roku has announced that the first smart projector using its Roku TV operating system, the Aurzen Roku TV Smart Projector D1R Cube, is now avail...

17/09/2025

Meet the Streamlabs Streaming Assistant, Accelerated by NVIDIA RTX

Today's creators are equal parts entertainer, producer and gamer, juggling game commentary, scene changes, replay clips, chat moderation and technical troub...

17/09/2025

Portrait Artist of the Year returns to Sky Arts with a dazzling line-up of celebrity sitters on 1 October

Wednesday 17 September 2025 UK artists capture icons of stage and screen, inclu...

17/09/2025

FOR-A America Appoints Jo Aun to Lead U.S. Product Development

Jo Returns to FOR-A as Senior Manager of Product Management and Engineering...

17/09/2025

AIR's Big Comeback with DPA Microphones

For the Moon Safari anniversary tour, AIR opened the doors to their backstage. Just a few hours before the Paris concert, DPA met with two key figures of the te...

17/09/2025

The Late Late Toy Show hits the road in search of Ireland's brightest young stars

Auditions will be held in Dublin, Cork and Galway The County Parade returns f...

16/09/2025

SVG All-Stars: Leigh Michaud, Manager, Remote Operations, ESPN

SVG All-Stars: Leigh Michaud, Manager, Remote Operations, ESPNThe UConn grad rose from ESPN's mailroom to become one of its most valuable ops leadersBy Bran...

16/09/2025

Live From IBC 2025: Friday's Latest From Halls 1-4, Outdoor Exhibits in Amsterdam

Live From IBC 2025: Friday's Latest From Halls 1-4, Outdoor Exhibits in Amst...

16/09/2025

Live From IBC 2025: Saturday's Latest From Halls 5-7 in Amsterdam

Live From IBC 2025: Saturday's Latest From Halls 5-7 in Amsterdam By SVG Staff Friday, September 12, 2025 - 17:00 Print This Story The SVG Europe and ...

16/09/2025

Live From IBC 2025: Sunday's Latest From Halls 8-10 in Amsterdam

Live From IBC 2025: Sunday's Latest From Halls 8-10 in Amsterdam By SVG Staff Saturday, September 13, 2025 - 17:00 Print This Story The SVG Europe and...

16/09/2025

Live From IBC 2025: Monday's Latest From Halls 11-14 in Amsterdam

Live From IBC 2025: Monday's Latest From Halls 11-14 in Amsterdam By SVG Staff Sunday, September 14, 2025 - 17:00 Print This Story The SVG Europe and ...

16/09/2025

Amazon Prime Video Picks Up Four Hours of Early-Round Masters Coverage in 2026

Amazon Prime Video Picks Up Four Hours of Early-Round Masters Coverage in 2026 By Jason Dachman, Editorial Director, U.S. Tuesday, September 16, 2025 - 10:15...

16/09/2025

VERSANT Inks Deal for League One Volleyball as Women's Sports Rights Slate Grows

VERSANT Inks Deal for League One Volleyball as Women's Sports Rights Slate G...

16/09/2025

ESPN VP, Corporate Communications, Katina Arnold Named SVP, Disney Advertising Communications

ESPN VP, Corporate Communications, Katina Arnold Named SVP, Disney Advertising C...

16/09/2025

IBC 2025 in Review: SVG Europe's Full Collection of Video Interviews From the Show Floor

IBC 2025 in Review: SVG Europe's Full Collection of Video Interviews From th...

16/09/2025

Celebramos 10 aos de Viva Latino en Spotify y el xito global de la msica latina

Hace una d cada, la m sica latina representaba apenas el 8% de las reproducciones globales en Spotify. Hoy, constituye m s de una cuarta parte (27%) de toda la ...

16/09/2025

Celebrating 10 Years of Spotify's Viva Latino Playlist and the Global Rise of Latin Music

A decade ago, Latin music made up just 8% of global Spotify streams. Today, it a...

16/09/2025

Spotify Welcomes Graham Norton and Select VICE Studios Content

Spotify is expanding our video lineup with a new partnership with Zoo 55, part of ITV Studios. For the first time, acclaimed content from ITV Studios is landing...

16/09/2025

One Enterprise, One Mission: Aligning the Supply Chain to the Warfighter

At DSEI 2025, James Dunne of L3Harris Maritime UK chaired a panel on aligning the supply chain to the warfighter, where leaders discussed modernising support fo...

16/09/2025

RTW chooses Calrec as technology partner

Calrec has strengthened its collaboration with audio metering expert RTW by integrating RTW's new TMxCore metering platform across its full range of Argo IP...

16/09/2025

Football and Back-to-School Dynamics Spark First Gains Since April for Traditional TV

College Football Scores Top Telecast in August with 16M+ Viewers on FOX, Followe...