
It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI.
In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM - software that speeds and simplifies the complex job of inference on large language models - boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago.
The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI.
Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM - a set of inference microservices that includes inferencing engines like TensorRT-LLM - makes it easier than ever for businesses to deploy NVIDIA's inference platform.
Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs - the latest, memory-enhanced Hopper GPUs - delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date.
The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the 6-billion-parameter GPT-J LLM first used in the September benchmarks.
The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark.
The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.
Memory Boost for NVIDIA Hopper GPUs
NVIDIA is sampling H200 GPUs to customers today and will ship them in the second quarter. They'll be available soon from nearly 20 leading system builders and cloud service providers.
H200 GPUs pack 141GB of HBM3e running at 4.8TB/s. That's 76% more memory flying 43% faster compared to H100 GPUs. These accelerators plug into the same boards and systems and use the same software as H100 GPUs.
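The percentages above can be checked with a few lines of arithmetic. This sketch assumes the H100 baseline is the 80GB HBM3 SXM part running at 3.35TB/s; the H200 figures (141GB, 4.8TB/s) come from the text.

```python
# Worked check of the H200-vs-H100 memory comparison.
# Assumption: H100 baseline = 80GB HBM3 SXM part at 3.35TB/s.
# H200 figures (141GB, 4.8TB/s) are taken from the article.
H100_CAPACITY_GB = 80
H100_BANDWIDTH_TBPS = 3.35
H200_CAPACITY_GB = 141
H200_BANDWIDTH_TBPS = 4.8


def percent_increase(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


capacity_gain = percent_increase(H200_CAPACITY_GB, H100_CAPACITY_GB)
bandwidth_gain = percent_increase(H200_BANDWIDTH_TBPS, H100_BANDWIDTH_TBPS)

print(f"capacity:  +{capacity_gain:.0f}%")   # capacity:  +76%
print(f"bandwidth: +{bandwidth_gain:.0f}%")  # bandwidth: +43%
```

Capacity works out to about 76.3% and bandwidth to about 43.3%, which round to the figures quoted.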
With HBM3e memory, a single H200 GPU can run an entire Llama 2 70B model with the highest throughput, simplifying and speeding inference.
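A rough, weights-only estimate shows why the extra capacity matters for a 70-billion-parameter model. This is a sketch under simplifying assumptions: it counts only the model weights at two common precisions and ignores the KV cache, activations and runtime overhead, so real deployments need more memory than these numbers. The article does not state which precision the MLPerf runs used.

```python
# Rough estimate of the memory needed just to hold Llama 2 70B weights.
# Assumptions: 70e9 parameters; KV cache, activations and runtime
# overhead are ignored, so real deployments need more than this.
PARAMS = 70e9
H200_CAPACITY_GB = 141  # from the article

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0}


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    need = weights_gb(PARAMS, nbytes)
    headroom = H200_CAPACITY_GB - need
    print(f"{dtype}: {need:.0f} GB of weights, {headroom:.0f} GB left for KV cache etc.")
```

At FP16 the weights alone (about 140GB) nearly fill the GPU, while at FP8 (about 70GB) roughly half the memory remains for the KV cache, which is what allows larger batches and higher throughput on a single GPU.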
GH200 Packs Even More Memory
Even more memory - up to 624GB of fast memory, including 144GB of HBM3e - is packed in NVIDIA GH200 Superchips, which combine on one module a Hopper architecture GPU and a power-efficient NVIDIA Grace CPU. NVIDIA accelerators are the first to use HBM3e memory technology.
With nearly 5 TB/second memory bandwidth, GH200 Superchips delivered standout performance, including on memory-intensive MLPerf tests such as recommender systems.
Sweeping Every MLPerf Test
On a per-accelerator basis, Hopper GPUs swept every test of AI inference in the latest round of the MLPerf industry benchmarks.
In addition, NVIDIA Jetson Orin remains at the forefront in MLPerf's edge category. In the last two inference rounds, Orin ran the most diverse set of models in the category, including GPT-J and Stable Diffusion XL.
The MLPerf benchmarks cover today's most popular AI workloads and scenarios, including generative AI, recommendation systems, natural language processing, speech and computer vision. NVIDIA was the only company to submit results on every workload in the latest round and every round since MLPerf's data center inference benchmarks began in October 2020.
Continued performance gains translate into lower costs for inference, a large and growing part of the daily work for the millions of NVIDIA GPUs deployed worldwide.
Advancing What's Possible
Pushing the boundaries of what's possible, NVIDIA demonstrated three innovative techniques in a special section of the benchmarks called the open division, created for testing advanced AI methods.
NVIDIA engineers used a technique called structured sparsity - a way of reducing calculations, first introduced with NVIDIA A100 Tensor Core GPUs - to deliver up to 33% speedups on inference with Llama 2.
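The structured sparsity supported by A100 and Hopper Tensor Cores uses a fine-grained 2:4 pattern: in every group of four consecutive weights, at most two are nonzero, which sparse Tensor Cores can exploit to skip the zeroed multiplications. The minimal pure-Python sketch here imposes that pattern by keeping the two largest-magnitude weights in each group. It illustrates only the pattern itself, not how NVIDIA prepared its Llama 2 submission, which would typically also involve fine-tuning to recover accuracy and sparse-aware kernels.

```python
from typing import List


def prune_2_of_4(row: List[float]) -> List[float]:
    """Apply a 2:4 structured-sparsity pattern to one row of weights.

    In every consecutive group of four weights, the two with the smallest
    magnitude are set to zero, so exactly half of each group survives.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_2_of_4(weights))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the zeros follow a fixed, regular layout rather than landing anywhere, hardware can compress the matrix and index the surviving values cheaply, which is what makes this form of sparsity practical to accelerate.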
A second open division test found inference speedups of up to 40% using pruning, a way of simplifying an AI model - in this case, an LLM - to increase inference throughput.
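The article does not say which pruning method NVIDIA used. As an illustration only, this sketch shows one common structured variant: removing whole output neurons (rows of a weight matrix) with the smallest L2 norms. The function name and the `keep_fraction` parameter are hypothetical. Unlike scattering zeros through a matrix, dropping entire rows shrinks the matrix, so the following matrix multiply does proportionally less work.

```python
import math
from typing import List

Matrix = List[List[float]]


def prune_rows(weights: Matrix, keep_fraction: float) -> Matrix:
    """Keep only the output rows (neurons) with the largest L2 norms.

    Removing whole rows shrinks the matrix itself, so the matrix
    multiply that uses it does proportionally less work.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(weights) * keep_fraction))
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    # Rank rows by norm, keep the strongest, then restore original order.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weights[i] for i in kept]


layer = [
    [0.5, -0.4, 0.3],
    [0.01, 0.02, -0.01],
    [-0.6, 0.1, 0.2],
    [0.05, -0.03, 0.04],
]
print(prune_rows(layer, keep_fraction=0.5))
# [[0.5, -0.4, 0.3], [-0.6, 0.1, 0.2]]
```

In a real network, the matching input columns of the next layer must be dropped as well, and the pruned model is usually fine-tuned afterward to recover accuracy.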
Finally, an optimization called DeepCache reduced the math required for inference with the Stable Diffusion XL model, accelerating performance by a whopping 74%.
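DeepCache, as described by its authors, relies on the observation that the high-level features computed by the deep part of a diffusion U-Net change slowly between adjacent denoising steps. Those features can therefore be cached and reused for several steps while only the cheap shallow layers are recomputed. The toy sketch here shows just that caching schedule: scalar functions stand in for the real network, and `cache_interval` is a hypothetical parameter. It says nothing about the configuration NVIDIA used for Stable Diffusion XL.

```python
from typing import Callable, Optional, Tuple

# Hypothetical toy stand-ins for the two halves of a U-Net-style denoiser.
# Scalars replace real feature maps so the caching schedule is easy to follow.
DeepFn = Callable[[float], float]
ShallowFn = Callable[[float, float], float]


def denoise_with_feature_cache(
    x: float,
    num_steps: int,
    cache_interval: int,
    deep: DeepFn,
    shallow: ShallowFn,
) -> Tuple[float, int]:
    """Denoising loop that recomputes deep features only every
    `cache_interval` steps and reuses the cached copy in between.

    Returns the final sample and the number of times `deep` was called.
    """
    if cache_interval < 1:
        raise ValueError("cache_interval must be at least 1")
    cached: Optional[float] = None
    deep_calls = 0
    for step in range(num_steps):
        if cached is None or step % cache_interval == 0:
            cached = deep(x)  # expensive: full pass through the deep blocks
            deep_calls += 1
        x = shallow(x, cached)  # cheap: shallow blocks reuse cached features
    return x, deep_calls


final, deep_calls = denoise_with_feature_cache(
    x=1.0,
    num_steps=50,
    cache_interval=5,
    deep=lambda x: 0.5 * x,
    shallow=lambda x, feat: x - 0.1 * feat,
)
print(deep_calls)  # 10 deep passes instead of 50
```

The trade-off is set by the cache interval: a longer interval skips more of the expensive computation but reuses features that are further out of date, which can cost image quality.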
All these results were run on NVIDIA H100 Tensor Core GPUs.
A Trusted Source for Users
MLPerf's tests are transparent and objective, so users can rely on the results to make informed buying decisions.
NVIDIA's partners participate in MLPerf because they know it's a valuable tool for customers evaluating AI systems and services. Partners submitting results on the NVIDIA AI platform in this round included ASUS, Cisco, Dell Technologies, Fujitsu, GIGABYTE, Google, Hewlett Packard Enterprise, Lenovo, Microsoft Azure, Oracle, QCT, Supermicro, VMware (recently acquired by Broadcom) and Wiwynn.
All the software NVIDIA used in the tests is available in the MLPerf repository. These optimizations are continuously folded into containers available on NGC, NVIDIA's software hub for GPU applications, as well as NVIDIA AI Enterprise - a secure, supported platform that includes NIM inference microservices.
The Next Big Thing
The use cases, model sizes and datasets for generative AI continue to expand. That's why MLPerf continues to evolve, adding real-world tests with popular models like Llama 2 70B and Stable Diffusion XL.
Keeping pace with the explosion in LLM sizes, NVIDIA founder and CEO Jensen Huang announced last week at GTC that NVIDIA Blackwell architecture GPUs will deliver the new levels of performance required for multitrillion-parameter AI models.
Inference for large language models is difficult, requiring both e