
As large language models (LLMs) grow larger, they get smarter, with open models from leading developers now featuring hundreds of billions of parameters. At the same time, today's leading models are also capable of reasoning, which means that they generate many intermediate reasoning tokens before delivering a final response to the user. The combination of these two trends (larger models that think using more tokens) drives the need for significantly higher compute performance.
Delivering the highest performance on production workloads takes a state-of-the-art technology stack spanning chips, systems, and software, along with an expansive developer ecosystem that is constantly building on that stack.
MLPerf Inference v5.1 is the latest version of the MLPerf Inference industry standard benchmark. With benchmark rounds held twice per year, the benchmark features many tests of AI inference performance and is regularly updated with new models and scenarios. This round features:
DeepSeek-R1 - a popular 671-billion parameter mixture-of-experts (MoE) reasoning model, developed by DeepSeek. In the server scenario, the time-to-first-token (TTFT) threshold is 2 seconds with a 12.5 tokens/second/user (TPS/user) target. All TPS/user targets are 99th percentile, meaning that 99% of tokens meet or exceed that TPS/user speed.
Llama 3.1 405B - MLPerf Inference v5.1 adds a new interactive scenario for the largest of the Llama 3.1 series of models, providing a faster 12.5 TPS/user threshold with a shorter 4.5 second TTFT requirement compared to the existing server scenario.
Llama 3.1 8B - an 8-billion parameter member of the Llama 3.1 series of models with offline, server (2 second TTFT, 10 TPS/user), and interactive (0.5 second TTFT, 33 TPS/user) scenarios. This replaces the GPT-J benchmark used in prior rounds.
Whisper - a popular speech recognition model that recently saw nearly 5 million downloads in a month on HuggingFace. This replaces RNN-T, which was featured in prior editions of the MLPerf Inference benchmark suite.
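The latency constraints quoted in the list above can be written down as data and checked against measurements. The following is a minimal sketch, not the MLPerf LoadGen implementation: the names `LatencyTarget`, `TARGETS`, and `meets_target` are hypothetical, and applying the same 99% fraction to TTFT as to TPS/user is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyTarget:
    ttft_seconds: float   # maximum allowed time to first token
    tps_per_user: float   # minimum required tokens/second/user


# Thresholds as quoted in the benchmark list above.
TARGETS = {
    ("DeepSeek-R1", "server"): LatencyTarget(2.0, 12.5),
    ("Llama 3.1 405B", "interactive"): LatencyTarget(4.5, 12.5),
    ("Llama 3.1 8B", "server"): LatencyTarget(2.0, 10.0),
    ("Llama 3.1 8B", "interactive"): LatencyTarget(0.5, 33.0),
}


def fraction_meeting(values, predicate):
    """Return the fraction of measurements for which predicate is true."""
    values = list(values)
    if not values:
        raise ValueError("no measurements supplied")
    return sum(1 for v in values if predicate(v)) / len(values)


def meets_target(target, ttfts, tps_samples, required_fraction=0.99):
    """Check that at least required_fraction of samples satisfy each limit.

    Assumption: the 99% rule is applied to TTFT as well as TPS/user.
    """
    ttft_ok = fraction_meeting(
        ttfts, lambda t: t <= target.ttft_seconds) >= required_fraction
    tps_ok = fraction_meeting(
        tps_samples, lambda s: s >= target.tps_per_user) >= required_fraction
    return ttft_ok and tps_ok
```

For example, `meets_target(TARGETS[("Llama 3.1 8B", "interactive")], ttfts, tps_samples)` returns `True` only when at least 99% of the supplied TTFT values are at or below 0.5 seconds and at least 99% of the TPS/user values are at or above 33.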
This round, NVIDIA submitted the first results using the new Blackwell Ultra architecture, announced in March. The submission came just six months after Blackwell made its debut in the available category in MLPerf Inference v5.0, setting new inference performance records. Additionally, the NVIDIA platform set new performance records on all newly added benchmarks this round (DeepSeek-R1, Llama 3.1 405B, Llama 3.1 8B, and Whisper) and continues to hold per-GPU performance records on all other MLPerf Inference benchmarks.
MLPerf Inference Per-Accelerator Records
| Benchmark | Offline | Server | Interactive |
| --- | --- | --- | --- |
| DeepSeek-R1 | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU | ** |
| Llama 3.1 405B | 224 tokens/second/GPU | 170 tokens/second/GPU | 138 tokens/second/GPU |
| Llama 2 70B 99.9% | 12,934 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU |
| Llama 2 70B 99% | 13,015 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU |
| Llama 3.1 8B | 18,370 tokens/second/GPU | 16,099 tokens/second/GPU | 15,284 tokens/second/GPU |
| Stable Diffusion XL | 4.07 samples/second/GPU | 3.59 queries/second/GPU | ** |
| Mixtral 8x7B | 16,099 tokens/second/GPU | 16,131 tokens/second/GPU | ** |
| DLRMv2 99% | 87,228 samples/second/GPU | 80,515 samples/second/GPU | ** |
| DLRMv2 99.9% | 48,666 samples/second/GPU | 46,259 queries/second/GPU | ** |
| Whisper | 5,667 tokens/second/GPU | ** | ** |
| R-GAT | 81,404 samples/second/GPU | ** | ** |
| Retinanet | 1,875 samples/second/GPU | 1,801 queries/second/GPU | ** |
Table 1. Performance records per GPU based on submissions powered by the NVIDIA platform. MLPerf Inference v5.0 and v5.1, Closed Division. Results retrieved from www.mlcommons.org on September 9, 2025. NVIDIA platform results from the following entries: 5.0-0072, 5.1-0007, 5.1-0053, 5.1-0079, 5.1-0028, 5.1-0062, 5.1-0086, 5.1-0073, 5.1-0008, 5.1-0070, 5.1-0046, 5.1-0009, 5.1-0060, 5.1-0072, 5.1-0071, 5.1-0069. Per-chip performance derived by dividing total throughput by number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.0 or v5.1. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
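The per-chip derivation described in the Table 1 note is a plain division of a system's total throughput by its reported chip count. A minimal sketch follows; the function name `per_chip_throughput` and the numbers in the usage note are hypothetical and are not taken from any MLPerf entry.

```python
def per_chip_throughput(total_throughput, num_chips):
    """Divide a system's total throughput by its reported number of chips."""
    if num_chips <= 0:
        raise ValueError("num_chips must be a positive integer")
    return total_throughput / num_chips
```

As an illustration with made-up values, a system reporting 100,000 tokens/second across 8 chips would be listed at `per_chip_throughput(100_000, 8)`, or 12,500 tokens/second per chip.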
NVIDIA also made extensive use of NVFP4 acceleration across all DeepSeek-R1 and Llama model submissions using the Blackwell and Blackwell Ultra architectures.
In this post, we take a closer look at these performance results and the full-stack technologies that enabled them.
Blackwell Ultra sets reasoning records in MLPerf debut
This round, NVIDIA submitted results in the available category using the GB300 NVL72 rack-scale system, the first-ever MLPerf submissions using the Blackwell Ultra architecture. Blackwell Ultra builds upon the many advances in the NVIDIA Blackwell architecture, with several key enhancements:
1.5x higher peak NVFP4 AI compute
2x higher attention-layer compute
1.5x higher HBM3e capacity
Compared to the GB200 NVL72 submission, GB300 NVL72 delivered up to 45% higher performance per GPU, setting the standard on the new DeepSeek-R1 benchmark. And compared to unverified results collected on a Hopper-based system, Blackwell Ultra delivered about 5x higher throughput per GPU, which translates into significantly higher AI factory throughput and much lower cost per token.
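The "about 5x" figure can be reproduced from the per-GPU DeepSeek-R1 numbers reported in Table 2. The sketch below recomputes the offline and server ratios; the dictionary layout and the helper name `speedup` are illustrative choices, and the Hopper values are the unverified ones quoted in this post.

```python
# Per-GPU DeepSeek-R1 throughput in tokens/second/GPU, as listed in Table 2.
THROUGHPUT = {
    "Hopper": {"offline": 1253, "server": 556},
    "Blackwell Ultra": {"offline": 5842, "server": 2907},
}


def speedup(new, baseline):
    """Return how many times faster `new` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return new / baseline


for scenario in ("offline", "server"):
    ratio = speedup(
        THROUGHPUT["Blackwell Ultra"][scenario],
        THROUGHPUT["Hopper"][scenario],
    )
    print(f"{scenario}: {ratio:.1f}x")
```

Rounded to one decimal place, the two ratios come out to 4.7x for offline and 5.2x for server, matching the "Blackwell Ultra Advantage" row.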
DeepSeek-R1 Performance
| Architecture | Offline | Server |
| --- | --- | --- |
| Hopper | 1,253 tokens/second/GPU | 556 tokens/second/GPU |
| Blackwell Ultra | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU |
| Blackwell Ultra Advantage | 4.7x | 5.2x |
Table 2. Per-GPU performance on DeepSeek-R1. MLPerf Inference v5.1, Closed. Blackwell Ultra results based on results in entry 5.1-0072. Hopper results not verified by MLCommons Association. Per-GPU performance is not a primary metric of MLPerf Inference v5.1 and is calcu