
NVIDIA Blackwell Ultra Sets New Inference Records in MLPerf Debut

09/09/2025

As large language models (LLMs) grow larger, they get smarter, with open models from leading developers now featuring hundreds of billions of parameters. At the same time, today's leading models are also capable of reasoning, which means that they generate many intermediate reasoning tokens before delivering a final response to the user. The combination of these two trends, larger models that think using more tokens, drives the need for significantly higher compute performance.

Delivering the highest performance on production workloads takes a state-of-the-art technology stack spanning chips, systems, and software, along with an expansive developer ecosystem that is constantly building on that stack.

MLPerf Inference v5.1 is the latest version of the MLPerf Inference industry standard benchmark. With benchmark rounds held twice per year, the benchmark features many tests of AI inference performance and is regularly updated with new models and scenarios. This round features:

DeepSeek-R1 - a popular 671-billion parameter mixture-of-experts (MoE) reasoning model, developed by DeepSeek. In the server scenario, the time-to-first-token (TTFT) threshold is 2 seconds with a 12.5 tokens/second/user (TPS/user) target. All TPS/user targets are 99th percentile, meaning that 99% of tokens meet or exceed that TPS/user speed.

Llama 3.1 405B - MLPerf Inference v5.1 adds a new interactive scenario for the largest of the Llama 3.1 series of models, providing a faster 12.5 TPS/user threshold with a shorter 4.5 second TTFT requirement compared to the existing server scenario.

Llama 3.1 8B - an 8-billion parameter member of the Llama 3.1 series of models with offline, server (2 second TTFT, 10 TPS/user), and interactive (0.5 second TTFT, 33 TPS/user) scenarios. This replaces the GPT-J benchmark used in prior rounds.

Whisper - a popular speech recognition model that recently saw nearly 5 million downloads in a month on HuggingFace. This replaces RNN-T, which was featured in prior editions of the MLPerf Inference benchmark suite.
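The latency constraints listed above can be read as a pass/fail check on measured samples. The following is a minimal Python sketch, assuming per-request TTFT values and per-token TPS/user rates are already available as plain lists. The function names and the sample data are hypothetical, and applying the same 99% rule to TTFT is an assumption made for illustration. This is not the MLPerf LoadGen implementation.

```python
from typing import Sequence


def fraction_at_least(values: Sequence[float], target: float) -> float:
    """Share of samples that are greater than or equal to target."""
    if not values:
        raise ValueError("no samples provided")
    return sum(v >= target for v in values) / len(values)


def fraction_at_most(values: Sequence[float], limit: float) -> float:
    """Share of samples that are less than or equal to limit."""
    if not values:
        raise ValueError("no samples provided")
    return sum(v <= limit for v in values) / len(values)


def meets_constraints(
    ttft_seconds: Sequence[float],
    tps_per_user: Sequence[float],
    ttft_limit: float,
    tps_target: float,
    quantile: float = 0.99,
) -> bool:
    """True if at least `quantile` of the samples satisfy both constraints.

    Assumption: the 99% rule is applied to TTFT as well as TPS/user.
    """
    ttft_ok = fraction_at_most(ttft_seconds, ttft_limit) >= quantile
    tps_ok = fraction_at_least(tps_per_user, tps_target) >= quantile
    return ttft_ok and tps_ok


# Hypothetical measurements checked against the DeepSeek-R1 server targets
# quoted in the text (2 second TTFT, 12.5 TPS/user).
passing = meets_constraints([1.2] * 99 + [2.5], [14.0] * 99 + [9.0], 2.0, 12.5)
failing = meets_constraints([1.2] * 100, [14.0] * 98 + [9.0] * 2, 2.0, 12.5)
print(passing, failing)
```

In the first run exactly 99% of samples meet each target, so the check passes. In the second only 98% of tokens reach 12.5 TPS/user, so it fails.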

This round, NVIDIA submitted the first results using the new Blackwell Ultra architecture, announced in March. The submission came just six months after Blackwell made its debut in the available category in MLPerf Inference v5.0, setting new inference performance records. Additionally, the NVIDIA platform set new performance records on all newly added benchmarks this round (DeepSeek-R1, Llama 3.1 405B, Llama 3.1 8B, and Whisper) and continues to hold per-GPU performance records on all other MLPerf inference benchmarks.

MLPerf Inference Per-Accelerator Records

Benchmark | Offline | Server | Interactive
DeepSeek-R1 | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU | **
Llama 3.1 405B | 224 tokens/second/GPU | 170 tokens/second/GPU | 138 tokens/second/GPU
Llama 2 70B 99.9% | 12,934 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU
Llama 2 70B 99% | 13,015 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU
Llama 3.1 8B | 18,370 tokens/second/GPU | 16,099 tokens/second/GPU | 15,284 tokens/second/GPU
Stable Diffusion XL | 4.07 samples/second/GPU | 3.59 queries/second/GPU | **
Mixtral 8x7B | 16,099 tokens/second/GPU | 16,131 tokens/second/GPU | **
DLRMv2 99% | 87,228 samples/second/GPU | 80,515 samples/second/GPU | **
DLRMv2 99.9% | 48,666 samples/second/GPU | 46,259 queries/second/GPU | **
Whisper | 5,667 tokens/second/GPU | ** | **
R-GAT | 81,404 samples/second/GPU | ** | **
Retinanet | 1,875 samples/second/GPU | 1,801 queries/second/GPU | **

Table 1. Performance records per GPU based on submissions powered by the NVIDIA platform. MLPerf Inference v5.0 and v5.1, Closed Division. Results retrieved from www.mlcommons.org on September 9, 2025. NVIDIA platform results from the following entries: 5.0-0072, 5.1-0007, 5.1-0053, 5.1-0079, 5.1-0028, 5.1-0062, 5.1-0086, 5.1-0073, 5.1-0008, 5.1-0070, 5.1-0046, 5.1-0009, 5.1-0060, 5.1-0072, 5.1-0071, 5.1-0069. Per-chip performance derived by dividing total throughput by number of reported chips. Per-chip performance is not a primary metric of MLPerf Inference v5.0 or v5.1. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
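The per-chip derivation described in the Table 1 note is a simple division. A short Python sketch follows. The function name is hypothetical, and the system total used in the example is an illustrative round number rather than a reported MLPerf result.

```python
def per_chip_throughput(total_throughput: float, num_chips: int) -> float:
    """Divide a system-level throughput by the number of reported chips."""
    if num_chips <= 0:
        raise ValueError("num_chips must be positive")
    return total_throughput / num_chips


# Illustrative only: a made-up system total of 72,000 tokens/second
# reported across 72 chips.
print(per_chip_throughput(72_000, 72))  # 1000.0
```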

NVIDIA also made extensive use of NVFP4 acceleration across all DeepSeek-R1 and Llama model submissions using the Blackwell and Blackwell Ultra architectures.

In this post, we take a closer look at these performance results and the full-stack technologies that enabled them.

Blackwell Ultra sets reasoning records in MLPerf debut

This round, NVIDIA submitted results in the available category using the GB300 NVL72 rack-scale system, the first-ever MLPerf submissions using the Blackwell Ultra architecture. Blackwell Ultra builds upon the many advances in the NVIDIA Blackwell architecture, with several key enhancements:

1.5x higher peak NVFP4 AI compute

2x higher attention-layer compute

1.5x higher HBM3e capacity

Compared to the GB200 NVL72 submission, GB300 NVL72 delivered up to 45% higher performance per GPU, setting the standard on the new DeepSeek-R1 benchmark. And compared to unverified results collected on a Hopper-based system, Blackwell Ultra delivered about 5x higher throughput per GPU, translating into significantly higher AI factory throughput and much lower cost per token.

DeepSeek-R1 Performance

Architecture | Offline | Server
Hopper | 1,253 tokens/second/GPU | 556 tokens/second/GPU
Blackwell Ultra | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU
Blackwell Ultra Advantage | 4.7x | 5.2x

Table 2. Per-GPU performance on DeepSeek-R1. MLPerf Inference v5.1, Closed. Blackwell Ultra results based on results in entry 5.1-0072. Hopper results not verified by MLCommons Association. Per-GPU performance is not a primary metric of MLPerf Inference v5.1 and is calcu
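The "Blackwell Ultra Advantage" row in Table 2 can be reproduced from the per-GPU figures in the same table. The Python sketch below uses only those values. The variable names are illustrative, and rounding to one decimal place is an assumption about how the published ratios were presented.

```python
# Per-GPU DeepSeek-R1 throughput (tokens/second/GPU) as listed in Table 2.
hopper = {"offline": 1253, "server": 556}
blackwell_ultra = {"offline": 5842, "server": 2907}

# Ratio of Blackwell Ultra to Hopper per scenario, rounded to one decimal.
speedup = {
    scenario: round(blackwell_ultra[scenario] / hopper[scenario], 1)
    for scenario in hopper
}
print(speedup)  # {'offline': 4.7, 'server': 5.2}
```

Both ratios are consistent with the "about 5x higher throughput per GPU" stated in the text.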
LINK: https://developer.nvidia.com/blog/nvidia-blackwell-ultra-sets-new-infe...