
As large language models (LLMs) grow larger, they get smarter, with open models from leading developers now featuring hundreds of billions of parameters. At the same time, today's leading models are also capable of reasoning, which means that they generate many intermediate reasoning tokens before delivering a final response to the user. The combination of these two trends, larger models that think using more tokens, drives the need for significantly higher compute performance.
Delivering the highest performance on production workloads takes a state-of-the-art technology stack spanning chips, systems, and software, along with an expansive developer ecosystem that is constantly building on that stack.
MLPerf Inference v5.1 is the latest version of the MLPerf Inference industry standard benchmark. With benchmark rounds held twice per year, the benchmark features many tests of AI inference performance and is regularly updated with new models and scenarios. This round features:
DeepSeek-R1 - a popular 671-billion parameter mixture-of-experts (MoE) reasoning model, developed by DeepSeek. In the server scenario, the time-to-first-token (TTFT) threshold is 2 seconds with a 12.5 tokens/second/user (TPS/user) target. All TPS/user targets are 99th percentile, meaning that 99% of tokens meet or exceed that TPS/user speed.
Llama 3.1 405B - MLPerf Inference v5.1 adds a new interactive scenario for the largest of the Llama 3.1 series of models, providing a faster 12.5 TPS/user threshold with a shorter 4.5 second TTFT requirement compared to the existing server scenario.
Llama 3.1 8B - an 8-billion parameter member of the Llama 3.1 series of models with offline, server (2 second TTFT, 10 TPS/user), and interactive (0.5 second TTFT, 33 TPS/user) scenarios. This replaces the GPT-J benchmark used in prior rounds.
Whisper - a popular speech recognition model that recently saw nearly 5 million downloads in a month on HuggingFace. This replaces RNN-T, which was featured in prior editions of the MLPerf Inference benchmark suite.
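The latency constraints listed above can be read as two checks per scenario: a ceiling on time-to-first-token, and a per-token latency budget implied by the TPS/user target (12.5 TPS/user corresponds to 80 ms per token). The following is a minimal sketch of that reading, not the MLPerf LoadGen implementation; the function names, the nearest-rank percentile method, and the sample latencies are all assumptions made for illustration.

```python
def tps_to_token_budget_ms(tps_per_user: float) -> float:
    """Convert a tokens/second/user target into a per-token latency budget (ms)."""
    return 1000.0 / tps_per_user


def percentile_nearest_rank(values, percent: int) -> float:
    """Nearest-rank percentile using integer arithmetic (percent in 1..100)."""
    ordered = sorted(values)
    rank = (percent * len(ordered) + 99) // 100  # ceil(percent * n / 100)
    return ordered[rank - 1]


def meets_targets(ttft_s, token_latencies_ms, ttft_limit_s, tps_target, percent=99):
    """Return True if TTFT is within the limit and the chosen percentile of
    per-token latencies fits the budget implied by the TPS/user target."""
    budget_ms = tps_to_token_budget_ms(tps_target)
    p_latency = percentile_nearest_rank(token_latencies_ms, percent)
    return ttft_s <= ttft_limit_s and p_latency <= budget_ms


# Hypothetical measurements: 99 tokens at 70 ms and one slow token at 200 ms.
latencies = [70.0] * 99 + [200.0]

# DeepSeek-R1 server scenario from the text: 2 s TTFT, 12.5 TPS/user.
print(tps_to_token_budget_ms(12.5))               # 80.0
print(meets_targets(1.2, latencies, 2.0, 12.5))   # True
```

Because the target is a 99th-percentile one, a single slow token out of 100 does not fail the check in this sketch, while two slow tokens would.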
This round, NVIDIA submitted the first results using the new Blackwell Ultra architecture, announced in March. The submission came just six months after Blackwell made its debut in the available category in MLPerf Inference v5.0, setting new inference performance records. Additionally, the NVIDIA platform set new performance records on all newly added benchmarks this round (DeepSeek-R1, Llama 3.1 405B, Llama 3.1 8B, and Whisper) and continues to hold per-GPU performance records on all other MLPerf Inference benchmarks.
MLPerf Inference Per-Accelerator Records
| Benchmark | Offline | Server | Interactive |
|---|---|---|---|
| DeepSeek-R1 | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU | ** |
| Llama 3.1 405B | 224 tokens/second/GPU | 170 tokens/second/GPU | 138 tokens/second/GPU |
| Llama 2 70B 99.9% | 12,934 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU |
| Llama 2 70B 99% | 13,015 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU |
| Llama 3.1 8B | 18,370 tokens/second/GPU | 16,099 tokens/second/GPU | 15,284 tokens/second/GPU |
| Stable Diffusion XL | 4.07 samples/second/GPU | 3.59 queries/second/GPU | ** |
| Mixtral 8x7B | 16,099 tokens/second/GPU | 16,131 tokens/second/GPU | ** |
| DLRMv2 99% | 87,228 samples/second/GPU | 80,515 samples/second/GPU | ** |
| DLRMv2 99.9% | 48,666 samples/second/GPU | 46,259 queries/second/GPU | ** |
| Whisper | 5,667 tokens/second/GPU | ** | ** |
| R-GAT | 81,404 samples/second/GPU | ** | ** |
| Retinanet | 1,875 samples/second/GPU | 1,801 queries/second/GPU | ** |
Table 1. Performance records per GPU based on submissions powered by the NVIDIA platform. MLPerf Inference v5.0 and v5.1, Closed Division. Results retrieved from www.mlcommons.org on September 9, 2025. NVIDIA platform results from the following entries: 5.0-0072, 5.1-0007, 5.1-0053, 5.1-0079, 5.1-0028, 5.1-0062, 5.1-0086, 5.1-0073, 5.1-0008, 5.1-0070, 5.1-0046, 5.1-0009, 5.1-0060, 5.1-0072, 5.1-0071, 5.1-0069. Per-chip performance is derived by dividing total throughput by the number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.0 or v5.1. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
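The per-chip derivation described in the caption is a simple division of a system's total throughput by its reported chip count. A minimal sketch follows; the function name and the example figures are hypothetical and are not taken from any MLPerf submission.

```python
def per_chip_throughput(total_throughput: float, num_chips: int) -> float:
    """Divide a system's total reported throughput by its number of chips."""
    if num_chips <= 0:
        raise ValueError("num_chips must be a positive integer")
    return total_throughput / num_chips


# Hypothetical system: 1,000 tokens/second in total across 8 chips.
print(per_chip_throughput(1000.0, 8))  # 125.0
```

Normalizing this way lets systems with different chip counts be compared side by side, with the caveat from the caption that per-chip performance is not a primary MLPerf Inference metric.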
NVIDIA also made extensive use of NVFP4 acceleration across all DeepSeek-R1 and Llama model submissions using the Blackwell and Blackwell Ultra architectures.
In this post, we take a closer look at these performance results and the full-stack technologies that enabled them.
Blackwell Ultra sets reasoning records in MLPerf debut
This round, NVIDIA submitted results in the available category using the GB300 NVL72 rack-scale system, the first-ever MLPerf submissions using the Blackwell Ultra architecture. Blackwell Ultra builds upon the many advances in the NVIDIA Blackwell architecture, with several key enhancements:
1.5x higher peak NVFP4 AI compute
2x higher attention-layer compute
1.5x higher HBM3e capacity
Compared to the GB200 NVL72 submission, GB300 NVL72 delivered up to 45% higher performance per GPU, setting the standard on the new DeepSeek-R1 benchmark. And compared to unverified results collected on a Hopper-based system, Blackwell Ultra delivered about 5x higher throughput per GPU, translating into significantly higher AI factory throughput and much lower cost per token.
DeepSeek-R1 Performance
| Architecture | Offline | Server |
|---|---|---|
| Hopper | 1,253 tokens/second/GPU | 556 tokens/second/GPU |
| Blackwell Ultra | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU |
| Blackwell Ultra Advantage | 4.7x | 5.2x |
Table 2. Per-GPU performance on DeepSeek-R1. MLPerf Inference v5.1, Closed. Blackwell Ultra results based on results in entry 5.1-0072. Hopper results not verified by MLCommons Association. Per-GPU performance is not a primary metric of MLPerf Inference v5.1 and is calcu
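The "Blackwell Ultra Advantage" row in Table 2 can be reproduced by dividing the Blackwell Ultra per-GPU throughput by the Hopper figure for each scenario. The sketch below uses the values from the table; the function and variable names are illustrative only.

```python
def advantage(new_tokens_per_s: float, baseline_tokens_per_s: float) -> float:
    """Throughput ratio of the newer architecture over the baseline, to one decimal."""
    return round(new_tokens_per_s / baseline_tokens_per_s, 1)


# Per-GPU DeepSeek-R1 throughput from Table 2 (tokens/second/GPU).
hopper = {"offline": 1253, "server": 556}
blackwell_ultra = {"offline": 5842, "server": 2907}

for scenario in ("offline", "server"):
    ratio = advantage(blackwell_ultra[scenario], hopper[scenario])
    print(f"{scenario}: {ratio}x")
# offline: 4.7x
# server: 5.2x
```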