Delivering 1.5M TPS Inference on NVIDIA GB200 NVL72, NVIDIA Accelerates OpenAI gpt-oss Models From Cloud to Edge

05/08/2025

NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI gpt-oss-20b and gpt-oss-120b launch. NVIDIA has optimized both new open-weight models for accelerated inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.

The gpt-oss models are text-reasoning LLMs with chain-of-thought and tool-calling capabilities using the popular mixture of experts (MoE) architecture with SwigGLU activations. The attention layers use RoPE with 128k context, alternating between full context and a sliding 128-token window. The models are released in FP4 precision, which fits on a single 80 GB data center GPU and is natively supported by Blackwell.

The models were trained on NVIDIA H100 Tensor Core GPUs, with gpt-oss-120b requiring over 2.1 million hours and gpt-oss-20b about 10x less. NVIDIA worked with several top open-source frameworks such as Hugging Face Transformers, Ollama, and vLLM, in addition to NVIDIA TensorRT-LLM for optimized kernels and model enhancements. This blog post showcases how NVIDIA has integrated gpt-oss across the software platform to meet developers' needs.

Model name Transformer Blocks Total Parameters Active Params per Token # of Experts Active Experts per Token Input Context Length

gpt-oss-20b 24 20B 3.6B 32 4 128K

gpt-oss-120b 36 117B 5.1B 128 4 128K

Table 1. OpenAI gpt-oss-20b and gpt-oss-120b model specifications, including total parameters, active parameters, number of experts, and input context length NVIDIA also worked with OpenAI and the community to maximize performance, adding features such as:

TensorRT-LLM Gen for attention prefill, attention decode, and MoE low-latency on Blackwell.

CUTLASS MoE kernels on Blackwell.

XQA kernel for specialized attention on Hopper.

Optimized attention and MoE routing kernels are available through the FlashInfer kernel-serving library for LLMs.

OpenAI Triton kernel MoE support, which is used in both TensorRT-LLM and vLLM.

Deploy using vLLM In collaboration with vLLM, NVIDIA worked together to verify accuracy while also analyzing and optimizing performance for Hopper and Blackwell architectures. Data center developers can use NVIDIA optimized kernels through the FlashInfer LLM serving kernel library.

vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAL-compatible web server. The following command will automatically download the model and start the server. Refer to the documentation and vLLM Cookbook guide for more details.

uv run --with vllm vllm serve openai/gpt-oss-20b

Deploy using TensorRT-LLM The optimizations are available on the NVIDIA/TensorRT-LLM GitHub repository, where developers can use the deployment guide to launch their high-performance server. The guide downloads the model checkpoints from Hugging Face. NVIDIA collaborated on the developer experience using the Transformers library with the new models. The guide then provides a Docker container and guidance on how to configure performance for both low-latency and max-throughput cases.

More than a million tokens per second with GB200 NVL72 NVIDIA engineers partnered closely with OpenAI to ensure that the new gpt-oss-120b and gpt-oss-20b models deliver accelerated performance on Day 0 across both the NVIDIA Blackwell and NVIDIA Hopper platforms.

At launch, based on early performance measurements, a single GB200 NVL72 rack-scale system is expected to serve the larger, more computationally demanding gpt-oss-120b model at 1.5 million tokens per second, or about 50,000 concurrent users. Blackwell features many architectural capabilities that accelerate inference performance. These include a second-generation Transformer Engine with FP4 Tensor Cores and fifth-generation NVIDIA NVLink and NVIDIA NVLink Switch, for high bandwidth, enabling 72 Blackwell GPUs to act as a single, massive GPU.

The performance, versatility, and pace of innovation of the NVIDIA platform enable the ecosystem to serve the latest models on Day 0 with high throughput and low cost per token.

Try the optimized model with NVIDIA Launchable Deploying with TensorRT-LLM is also available using the Python API in a JupyterLab notebook on the Open AI Cookbook as an NVIDIA Launchable directly in the build platform where developers can test out GPUs from multiple cloud platforms. You can deploy the optimized model with a single click in a pre-configured environment.

data-src=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp alt=The image shows the console at brev.dev for users to select which type of GPU option in the Select your Compute' page, the user can select between boxes in a row of H200, H100, A100, L40s, A10 and A100 shown. class=lazyload wp-image-104187 data-srcset=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp 1324w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-300x113-png.webp 300w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-625x236-png.webp 625w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-179x68-png.webp 179w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-768x290-png.webp 768w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-645x244-png.webp 645w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-500x189-png.webp 500w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-160x60-png.webp 160w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-362x137-png.webp 362w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-291x110-png.webp 291w, https://developer-blogs.nvidia.com/wp-content/uploads/202

LINK:	https://developer.nvidia.com/blog/delivering-1-5-m-tps-inference-on-nv...
	See more stories from nvidia

Most recent headlines

05/01/2027

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be demoed at CES 2026

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...

07/10/2026

Dalet Flex LTS Delivers Smarter Media Operations from Ingest to Distribution

Dalet, a leading technology and service provider for media-rich organizations, today announced the latest Long-Term Supported (LTS) release of Dalet Flex. Build...

06/09/2026

Dolby and MagentaTV Bring Fans Closer to the FIFA World Cup 2026 in Germany with Dolby Vision and Dolby Atmos

June 9 2026, 23:00 (PDT) Dolby and MagentaTV Bring Fans Closer to the FIFA Worl...

04/08/2026

Dalet Announces Commercial Availability of Dalia, Bringing Media-Aware Agentic AI to Enterprise Productions

Dalet, a leading technology and service provider for media-rich organizations, t...

15/07/2026

S&P Analysis: Three Quarters of Americans Watch Live Sports

Share Copy link Facebook X Linkedin Bluesky Email...

15/07/2026

Scripps Sports, Ion Score Women's Volleyball Rights

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

Bowling Green State Upgrades Doyt Perry Stadium With New Daktronics LED Display

South end zone videoboard, cloud-based control system will be ready for 2026 football season...

14/07/2026

Mizzou Athletics Launches Connected Digital Platform

Redesigned website, enhanced mobile app unify content, ticketing, and personalized fan engagement...

14/07/2026

DePaul Athletics, Playfly Sports Agree to Multimedia Rights Partnership

Agreement spans sponsorship sales, digital monetization, radio production, and new practice facility naming rights...

14/07/2026

American Association Expands Broadcast Reach Through FanDuel Sports Network Partnership

Independent league adds 14 regional sports network affiliates, growing distribut...

14/07/2026

Euroleague Basketball Introduces Euroleague Basketball+ Digital Ecosystem Initiative

New strategy aims to unify competitions, content, fan engagement, and commercial...

14/07/2026

Professional Fighters League, ESPN Reach Multi-Year Media Rights Deal for Brazil

ESPN and Disney+ become exclusive home of PFL events in key international MMA market...

14/07/2026

TEGNA Names Scott Gill VP of Technology and Operations

Gill will oversee engineering, technology, and sports operations across the company's 64 local television stations...

14/07/2026

Guest Post: Dynamic Media Facilities Could Reshape the Future of Broadcast Workflows

Submitted by North American Broadcasters Association (NABA) As broadcasters con...

14/07/2026

Bayerischer Rundfunk Debuts Fully Software-Defined SMPTE ST 2110 Radio OB Van Built Around Lawo Technology

Modernized mobile unit combines HOME Apps, mc 56 console, and IP infrastructure ...

14/07/2026

Scripps Sports, ION Secure U.S. Rights to 2027 FIVB Womens Volleyball World Cup

Every match of the 32-team tournament will air across ION and Scripps Sports platforms in English and Spanish...

14/07/2026

FloSports Lands Exclusive U.S. Rights to IIHF Mens World Championship Beginning in 2027

FloHockey to stream every game of the annual international tournament under four...

14/07/2026

Minnesota Lynx Add Three Games to KARE 11s Over-the-Air Schedule

Victory+ telecasts to be simulcast on TEGNA-owned station, expanding free local distribution...

14/07/2026

Avalanche Tones debut with Chainsaw Suite

Plug-ins for heavy music Avalanche Tones is the brainchild of Ava Toton, a 17-year-old musician and developer who says her goal is to make the lives of gui...

14/07/2026

IK Multimedia introduce ReSing Voices Brazilian Pack

Launched alongside new Singer Showcase purchase model IK Multimedia's innovative vocal-synthesis software has just gained its latest voice add-on, the R...

14/07/2026

MIDI Innovations Awards 2026

Registration open until 1 September 2026 The MIDI Association have revealed that the registration deadline for this year's MIDI Innovation Awards has no...

14/07/2026

Launchkey MK4 88 joins Novation line-up

88-note model completes MK4 range Novation have just introduced the final model in their flagship MIDI controller keyboard range, the Launchkey MK4 88. Roun...

14/07/2026

CBS Atlanta Adds a Noon Newscast

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

Tegna Names Scott Gill VP, Technology and Operations

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

Colorado Wildfires Bring Close Call for Broadcasters

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

IBC2026 sets conference agenda

IBC2026 has unveiled a powerful Conference programme bringing together global media leaders, technology innovators, creators, sports organisations, broadcasters...

14/07/2026

Nominations for Best of Show Awards at IBC2026 Now Open

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

Broadcast Solutions delivers industry-first software-defi...

Broadcast Solutions, a leading systems integrator and provider of innovative solutions for the broadcast media industry, has delivered two highly capable outsid...

14/07/2026

UPDATED: Scripps, DirecTV End Blackout, Ink New Retrans Deal

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

12 States Sue to Block $110 Billion Warner Bros./Paramount Merger

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

Heidi Raphael to Head N.Y. State Broadcasters Association

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

CBS Atlanta Expands Live Local News Programming

Share Copy link Facebook X Linkedin Bluesky Email...

14/07/2026

Nemotron Labs: How Open Models Give Enterprises and Nations AI They Can Trust, Control and Customize

Editor's note: This post is part of the Nemotron Labs blog series, which exp...

14/07/2026

Techtel Successfully Relocates AICD Broadcast Studio to New Sydney Headquarters

Techtel Successfully Relocates AICD Broadcast Studio to New Sydney Headquarters BroadcastBroadcast EquipmentLive StreamingBroadcast Studio2026 14 July Writ...

14/07/2026

First look revealed for Friday the 13th prequel, Crystal Lake, from A24 coming to Sky and NOW in the UK and Ireland this October

Tuesday 14 July 2026 First look revealed for Friday the 13th prequel, Crystal ...

14/07/2026

Surround Is Still the Standard

When immersive audio dominates industry headlines, it's easy to assume that every broadcaster is preparing for an Atmos future. The reality is quite differ...

14/07/2026

Fresh Thinking from MAD//Fest London 2026

Emma and Sophie from ICG's marketing team joined thousands of fellow marketers, brands and agencies at MAD//Fest London 2026, one of the UK's biggest ma...

14/07/2026

Seven paradoxes shaping the next era of media production - Episode 3

Why Trusted and Secure Media Operations Matter In this series, we explore the technologies, architectures and operational realities shaping modern media operati...

14/07/2026

How Merchants Can Prepare for the Next Evolution in Digital Commerce

Pilot Project Shows How Retailers Are Prepared for the Next Step in the Evolution of Digital Commerce Arvato Systems Drives Agentic Commerce Forward G terslo...

14/07/2026

Building a more sustainable future - Our commitment to climate action

As part of this commitment, weve joined the SME Climate Hub, publicly pledging to: measure our greenhouse gas emissions reduce them in line with a net zero p...

14/07/2026

Why Performance per Watt Is the Ultimate Metric for AI Infrastructure Efficiency

Power is AI infrastructure's inescapable constraint. How many tokens an AI factory can generate within a fixed power budget determines its revenue and profi...

13/07/2026

BravesVision GM Jeff Cravens on Launching MLB's Newest Team-Owned Network in 35 Days

The Braves opted to keep production in-house rather than hand it off to MLB...

13/07/2026

Behind The Mic: Adam Schefter Signs Multi-Year Extension with ESPN

Behind The Mic provides a roundup of recent news regarding on-air talent, including new deals, departures, and assignments compiled from press releases and repo...

13/07/2026

Eurovision Sport and European Athletics Bring Live Athletics to More Fans with Multilingual AI Commentary Initiative

Eurovision Sport is making live athletics more accessible to fans than ever befo...

13/07/2026

Milwaukee Bucks Return to Full-Season Over-the-Air Television for First Time in 31 Years

The Milwaukee Bucks will return to full-season over-the-air television for the 2...

13/07/2026

SMPTE Expands Education Offerings with Connected Learning Path for IP Media Workflows

SMPTE has announced an expanded education pathway for media technology professio...

13/07/2026

Vizrt Graphics Power Netflix MVP MMA Event at Intuit Dome

Vizrt has announced that its graphics technology was used by broadcast design agency Girraphic for Netflix's debut MVP MMA event, broadcast live from the In...

13/07/2026

ARRI To Sell Global Rental Business to H2 Equity Partners in Management Buyout

ARRI has announced an agreement to sell its global rental activities in Europe, the United Kingdom, and North America to H2 Equity Partners through a management...

13/07/2026

DAZN and Premier Boxing Champions Announce Global Broadcasting Partnership

DAZN has announced a partnership with Premier Boxing Champions (PBC) to bring PBC fight nights to DAZN subscribers globally. The partnership begins Saturday, Ju...

13/07/2026

TikTok and WSC Sports Partner To Connect Sports Rightsholders With Content Creators

TikTok and WSC Sports have announced a strategic partnership that gives sports r...

View most recent headlines