NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

27/03/2024

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI.

In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM - software that speeds and simplifies the complex job of inference on large language models - boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago.

The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI.

Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM - a set of inference microservices that includes inferencing engines like TensorRT-LLM - makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI

TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs - the latest, memory-enhanced Hopper GPUs - delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date.

The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks.

The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark.

The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Memory Boost for NVIDIA Hopper GPUs

NVIDIA is sampling H200 GPUs to customers today and shipping in the second quarter. They'll be available soon from nearly 20 leading system builders and cloud service providers.

H200 GPUs pack 141GB of HBM3e running at 4.8TB/s. That's 76% more memory flying 43% faster compared to H100 GPUs. These accelerators plug into the same boards and systems and use the same software as H100 GPUs.
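The two percentages above can be checked with a few lines of arithmetic. This is only a sketch: the H100 baseline of 80GB and 3.35TB/s is an assumption taken from the commonly quoted H100 SXM specification, not a figure stated in this article.

```python
# H200 figures quoted in the article.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBPS = 4.8

# ASSUMPTION: H100 (SXM) baseline; these values are not given in the article.
H100_MEMORY_GB = 80
H100_BANDWIDTH_TBPS = 3.35


def percent_increase(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


memory_gain = percent_increase(H200_MEMORY_GB, H100_MEMORY_GB)
bandwidth_gain = percent_increase(H200_BANDWIDTH_TBPS, H100_BANDWIDTH_TBPS)

print(f"Memory capacity:  +{memory_gain:.0f}%")
print(f"Memory bandwidth: +{bandwidth_gain:.0f}%")
```

Under that assumed baseline, the results round to the 76% and 43% quoted above.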

With HBM3e memory, a single H200 GPU can run an entire Llama 2 70B model with the highest throughput, simplifying and speeding inference.
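A rough, back-of-the-envelope estimate suggests why the extra capacity matters for a 70-billion-parameter model. The sketch below counts model weights only; the bytes-per-parameter values are assumptions about common numeric precisions, and the article does not say which precision the submission used. The KV cache and activations need additional memory on top of the weights.

```python
PARAMETERS = 70_000_000_000   # Llama 2 70B, nominal parameter count
H200_MEMORY_GB = 141          # from the article

# ASSUMPTION: storage cost per parameter for two common precisions.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_footprint_gb(parameters: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return parameters * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    weights_gb = weight_footprint_gb(PARAMETERS, nbytes)
    headroom_gb = H200_MEMORY_GB - weights_gb
    print(f"{precision}: weights {weights_gb:.0f} GB, "
          f"headroom for KV cache and activations {headroom_gb:.0f} GB")
```

At 16-bit precision the weights alone would nearly fill the GPU, while an 8-bit format would leave roughly half of the memory free for serving requests.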

GH200 Packs Even More Memory

Even more memory - up to 624GB of fast memory, including 144GB of HBM3e - is packed in NVIDIA GH200 Superchips, which combine on one module a Hopper architecture GPU and a power-efficient NVIDIA Grace CPU. NVIDIA accelerators are the first to use HBM3e memory technology.

With nearly 5 TB/second memory bandwidth, GH200 Superchips delivered standout performance, including on memory-intensive MLPerf tests such as recommender systems.

Sweeping Every MLPerf Test

On a per-accelerator basis, Hopper GPUs swept every test of AI inference in the latest round of the MLPerf industry benchmarks.

In addition, NVIDIA Jetson Orin remains at the forefront in MLPerf's edge category. In the last two inference rounds, Orin ran the most diverse set of models in the category, including GPT-J and Stable Diffusion XL.

The MLPerf benchmarks cover today's most popular AI workloads and scenarios, including generative AI, recommendation systems, natural language processing, speech and computer vision. NVIDIA was the only company to submit results on every workload in the latest round and every round since MLPerf's data center inference benchmarks began in October 2020.

Continued performance gains translate into lower costs for inference, a large and growing part of the daily work for the millions of NVIDIA GPUs deployed worldwide.

Advancing What's Possible

Pushing the boundaries of what's possible, NVIDIA demonstrated three innovative techniques in a special section of the benchmarks called the open division, created for testing advanced AI methods.

NVIDIA engineers used a technique called structured sparsity - a way of reducing calculations, first introduced with NVIDIA A100 Tensor Core GPUs - to deliver up to 33% speedups on inference with Llama 2.
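For readers unfamiliar with structured sparsity, the pattern supported by sparse Tensor Cores since the A100 generation is usually described as 2:4: in every group of four consecutive weights, two are zero. The sketch below illustrates that pattern on a plain Python list. Choosing which weights to keep by magnitude is an assumption made for illustration; the article does not describe how the weights were selected, and `prune_2_of_4` is a hypothetical helper, not an NVIDIA API.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")

    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
print(prune_2_of_4(row))
# prints [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.25, 0.0]
```

Because exactly half of each group is zero, the hardware can skip those multiplications in a regular, predictable way, which is where the reduction in calculations comes from.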

A second open division test found inference speedups of up to 40% using pruning, a way of simplifying an AI model - in this case, an LLM - to increase inference throughput.

Finally, an optimization called DeepCache reduced the math required for inference with the Stable Diffusion XL model, accelerating performance by a whopping 74%.

All these results were run on NVIDIA H100 Tensor Core GPUs.
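To compare the three open division figures in terms of time saved rather than speed gained, the percentages can be converted as follows. This sketch assumes each quoted speedup is a throughput gain, so a gain of s% means the same work takes 1 / (1 + s/100) of the original time; the article does not spell out how the percentages were measured.

```python
def time_reduction_percent(speedup_percent: float) -> float:
    """Convert a throughput gain (in %) into the reduction in time per unit of work (in %)."""
    return (1.0 - 1.0 / (1.0 + speedup_percent / 100.0)) * 100.0


# Speedups quoted in the article (the first two are "up to" figures).
open_division_results = {
    "Structured sparsity (Llama 2)": 33,
    "Pruning (LLM)": 40,
    "DeepCache (Stable Diffusion XL)": 74,
}

for technique, speedup in open_division_results.items():
    saved = time_reduction_percent(speedup)
    print(f"{technique}: +{speedup}% throughput, about {saved:.1f}% less time")
```

Under that reading, the 74% DeepCache gain corresponds to roughly 42.5% less time per image.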

A Trusted Source for Users

MLPerf's tests are transparent and objective, so users can rely on the results to make informed buying decisions.

NVIDIA's partners participate in MLPerf because they know it's a valuable tool for customers evaluating AI systems and services. Partners submitting results on the NVIDIA AI platform in this round included ASUS, Cisco, Dell Technologies, Fujitsu, GIGABYTE, Google, Hewlett Packard Enterprise, Lenovo, Microsoft Azure, Oracle, QCT, Supermicro, VMware (recently acquired by Broadcom) and Wiwynn.

All the software NVIDIA used in the tests is available in the MLPerf repository. These optimizations are continuously folded into containers available on NGC, NVIDIA's software hub for GPU applications, as well as NVIDIA AI Enterprise - a secure, supported platform that includes NIM inference microservices.

The Next Big Thing

The use cases, model sizes and datasets for generative AI continue to expand. That's why MLPerf continues to evolve, adding real-world tests with popular models like Llama 2 70B and Stable Diffusion XL.

Keeping pace with the explosion in LLM sizes, NVIDIA founder and CEO Jensen Huang announced last week at GTC that NVIDIA Blackwell architecture GPUs will deliver the new levels of performance required for multitrillion-parameter AI models.

Inference for large language models is difficult, requiring both e
LINK: https://blogs.nvidia.com/blog/tensorrt-llm-inference-mlperf/...