
Inference performance is critical, as it directly influences the economics of an AI factory. The higher the throughput of AI factory infrastructure, the more tokens it can produce at a high speed - increasing revenue, driving down total cost of ownership (TCO) and enhancing the system's overall productivity.
Less than half a year since its debut at NVIDIA GTC, the NVIDIA GB300 NVL72 rack-scale system - powered by the NVIDIA Blackwell Ultra architecture - set records on the new reasoning inference benchmark in MLPerf Inference v5.1, delivering up to 45% more DeepSeek-R1 inference throughput compared with NVIDIA Blackwell-based GB200 NVL72 systems.
Blackwell Ultra builds on the success of the Blackwell architecture, featuring 1.5x more NVFP4 AI compute and 2x more attention-layer acceleration than Blackwell, as well as up to 288GB of HBM3e memory per GPU.
The NVIDIA platform also set performance records on all new data center benchmarks added to the MLPerf Inference v5.1 suite - including DeepSeek-R1, Llama 3.1 405B Interactive, Llama 3.1 8B and Whisper - while continuing to hold per-GPU records on every MLPerf data center benchmark.
Stacking It All Up
Full-stack co-design plays an important role in delivering these latest benchmark results. Blackwell and Blackwell Ultra incorporate hardware acceleration for the NVFP4 data format - an NVIDIA-designed 4-bit floating point format that provides better accuracy compared with other FP4 formats, as well as comparable accuracy to higher-precision formats.
NVIDIA TensorRT Model Optimizer software quantized DeepSeek-R1, Llama 3.1 405B, Llama 2 70B and Llama 3.1 8B to NVFP4. In concert with the open-source NVIDIA TensorRT-LLM library, this optimization enabled Blackwell and Blackwell Ultra to deliver higher performance while meeting strict accuracy requirements in submissions.
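To make the idea of a block-scaled 4-bit floating point format concrete, here is a minimal Python sketch. It is an illustration only, not NVIDIA's implementation or the TensorRT Model Optimizer API: the function names are hypothetical, the block size of 16 is an assumption, and the per-block scale is kept as an ordinary Python float rather than encoded in a compact format. The only fixed fact it relies on is the set of magnitudes a 4-bit float with 2 exponent bits and 1 mantissa bit can represent.

```python
# Illustrative block-scaled 4-bit float quantization.
# NOT NVIDIA's implementation; names and block size are assumptions.

# Magnitudes representable by a 4-bit float with 1 sign, 2 exponent, 1 mantissa bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed number of values that share one scale factor


def quantize_block(values):
    """Map one block of floats to (scale, [(is_negative, magnitude_index), ...])."""
    amax = max(abs(v) for v in values)
    # Scale so the largest magnitude in the block lands on the largest code (6.0).
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Pick the nearest representable magnitude.
        idx = min(range(len(E2M1_MAGNITUDES)),
                  key=lambda i: abs(E2M1_MAGNITUDES[i] - target))
        codes.append((v < 0, idx))
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate floats from a quantized block."""
    return [(-1.0 if neg else 1.0) * E2M1_MAGNITUDES[idx] * scale
            for neg, idx in codes]


def quantize(values, block_size=BLOCK_SIZE):
    """Quantize a flat list of floats block by block."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def dequantize(blocks):
    """Inverse of quantize(): concatenate the dequantized blocks."""
    out = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out
```

Because each small block carries its own scale, an outlier only reduces precision for its own block instead of for the whole tensor, which is the general reason block-scaled formats can stay close to higher-precision accuracy.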
Large language model inference consists of two workloads with distinct execution characteristics: 1) context for processing user input to produce the first output token and 2) generation to produce all subsequent output tokens.
A technique called disaggregated serving splits context and generation tasks so each part can be optimized independently for best overall throughput. This technique was key to record-setting performance on the Llama 3.1 405B Interactive benchmark, helping to deliver a nearly 50% increase in performance per GPU with GB200 NVL72 systems compared with each Blackwell GPU in an NVIDIA DGX B200 server running the benchmark with traditional serving.
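The split between context and generation can be sketched as a toy scheduler with two separate queues. This is a conceptual illustration under stated assumptions, not the NVIDIA Dynamo or TensorRT-LLM API: `Request`, `context_phase`, `generation_step` and `serve` are hypothetical names, and the "model" simply emits placeholder tokens. A real deployment would run the two phases on separate GPU pools and transfer the cached attention state between them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: list                 # input tokens
    max_new_tokens: int          # total output tokens wanted (at least 1)
    kv_cache: list = field(default_factory=list)  # stand-in for cached state
    output: list = field(default_factory=list)


def context_phase(req):
    """Process the whole prompt in one pass and produce the first output token."""
    req.kv_cache = list(req.prompt)
    first = f"tok{len(req.kv_cache)}"   # placeholder for a real model call
    req.output.append(first)
    req.kv_cache.append(first)
    return req


def generation_step(req):
    """Produce one more token from the cached state; return True when done."""
    nxt = f"tok{len(req.kv_cache)}"     # placeholder for a real model call
    req.output.append(nxt)
    req.kv_cache.append(nxt)
    return len(req.output) >= req.max_new_tokens


def serve(requests):
    """Run context and generation work from separate queues."""
    context_queue = deque(requests)     # work for the context pool
    generation_queue = deque()          # work for the generation pool
    finished = []
    while context_queue or generation_queue:
        # Context pool: handle one new prompt, then hand its state over.
        if context_queue:
            req = context_phase(context_queue.popleft())
            if len(req.output) >= req.max_new_tokens:
                finished.append(req)
            else:
                generation_queue.append(req)
        # Generation pool: advance every in-flight request by one token.
        for _ in range(len(generation_queue)):
            req = generation_queue.popleft()
            if generation_step(req):
                finished.append(req)
            else:
                generation_queue.append(req)
    return finished
```

Keeping the two queues separate is what lets each side be sized and tuned on its own, for example giving the context pool more compute per request while the generation pool batches many requests per step.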
NVIDIA also made its first submissions this round using the NVIDIA Dynamo inference framework.
NVIDIA partners - including cloud service providers and server makers - submitted great results using the NVIDIA Blackwell and/or Hopper platform. These partners include Azure, Broadcom, Cisco, CoreWeave, Dell Technologies, Giga Computing, HPE, Lambda, Lenovo, Nebius, Oracle, Quanta Cloud Technology, Supermicro and the University of Florida.
The market-leading inference performance on the NVIDIA AI platform is available from major cloud providers and server makers. This translates to lower TCO and enhanced return on investment for organizations deploying sophisticated AI applications.
Learn more about these full-stack technologies by reading the NVIDIA Technical Blog on MLPerf Inference v5.1. Plus, visit the NVIDIA DGX Cloud Performance Explorer to learn more about NVIDIA performance and model TCO, and to generate custom reports.