
NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI gpt-oss-20b and gpt-oss-120b launch. NVIDIA has optimized both new open-weight models for accelerated inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.
The gpt-oss models are text-reasoning LLMs with chain-of-thought and tool-calling capabilities using the popular mixture of experts (MoE) architecture with SwigGLU activations. The attention layers use RoPE with 128k context, alternating between full context and a sliding 128-token window. The models are released in FP4 precision, which fits on a single 80 GB data center GPU and is natively supported by Blackwell.
The models were trained on NVIDIA H100 Tensor Core GPUs, with gpt-oss-120b requiring over 2.1 million hours and gpt-oss-20b about 10x less. NVIDIA worked with several top open-source frameworks such as Hugging Face Transformers, Ollama, and vLLM, in addition to NVIDIA TensorRT-LLM for optimized kernels and model enhancements. This blog post showcases how NVIDIA has integrated gpt-oss across the software platform to meet developers' needs.
Model name Transformer Blocks Total Parameters Active Params per Token # of Experts Active Experts per Token Input Context Length
gpt-oss-20b 24 20B 3.6B 32 4 128K
gpt-oss-120b 36 117B 5.1B 128 4 128K
Table 1. OpenAI gpt-oss-20b and gpt-oss-120b model specifications, including total parameters, active parameters, number of experts, and input context length NVIDIA also worked with OpenAI and the community to maximize performance, adding features such as:
TensorRT-LLM Gen for attention prefill, attention decode, and MoE low-latency on Blackwell.
CUTLASS MoE kernels on Blackwell.
XQA kernel for specialized attention on Hopper.
Optimized attention and MoE routing kernels are available through the FlashInfer kernel-serving library for LLMs.
OpenAI Triton kernel MoE support, which is used in both TensorRT-LLM and vLLM.
Deploy using vLLM In collaboration with vLLM, NVIDIA worked together to verify accuracy while also analyzing and optimizing performance for Hopper and Blackwell architectures. Data center developers can use NVIDIA optimized kernels through the FlashInfer LLM serving kernel library.
vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAL-compatible web server. The following command will automatically download the model and start the server. Refer to the documentation and vLLM Cookbook guide for more details.
uv run --with vllm vllm serve openai/gpt-oss-20b
Deploy using TensorRT-LLM The optimizations are available on the NVIDIA/TensorRT-LLM GitHub repository, where developers can use the deployment guide to launch their high-performance server. The guide downloads the model checkpoints from Hugging Face. NVIDIA collaborated on the developer experience using the Transformers library with the new models. The guide then provides a Docker container and guidance on how to configure performance for both low-latency and max-throughput cases.
More than a million tokens per second with GB200 NVL72 NVIDIA engineers partnered closely with OpenAI to ensure that the new gpt-oss-120b and gpt-oss-20b models deliver accelerated performance on Day 0 across both the NVIDIA Blackwell and NVIDIA Hopper platforms.
At launch, based on early performance measurements, a single GB200 NVL72 rack-scale system is expected to serve the larger, more computationally demanding gpt-oss-120b model at 1.5 million tokens per second, or about 50,000 concurrent users. Blackwell features many architectural capabilities that accelerate inference performance. These include a second-generation Transformer Engine with FP4 Tensor Cores and fifth-generation NVIDIA NVLink and NVIDIA NVLink Switch, for high bandwidth, enabling 72 Blackwell GPUs to act as a single, massive GPU.
The performance, versatility, and pace of innovation of the NVIDIA platform enable the ecosystem to serve the latest models on Day 0 with high throughput and low cost per token.
Try the optimized model with NVIDIA Launchable Deploying with TensorRT-LLM is also available using the Python API in a JupyterLab notebook on the Open AI Cookbook as an NVIDIA Launchable directly in the build platform where developers can test out GPUs from multiple cloud platforms. You can deploy the optimized model with a single click in a pre-configured environment.
data-src=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp alt=The image shows the console at brev.dev for users to select which type of GPU option in the Select your Compute' page, the user can select between boxes in a row of H200, H100, A100, L40s, A10 and A100 shown. class=lazyload wp-image-104187 data-srcset=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp 1324w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-300x113-png.webp 300w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-625x236-png.webp 625w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-179x68-png.webp 179w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-768x290-png.webp 768w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-645x244-png.webp 645w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-500x189-png.webp 500w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-160x60-png.webp 160w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-362x137-png.webp 362w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-291x110-png.webp 291w, https://developer-blogs.nvidia.com/wp-content/uploads/202
Most recent headlines
05/01/2027
Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...
04/08/2026
Dalet, a leading technology and service provider for media-rich organizations, t...
04/07/2026
April 7 2026, 19:00 (PDT) Detective Conan: Fallen Angel of the Highway Opens in...
01/06/2026
January 6 2026, 05:30 (PST) Dolby Sets the New Standard for Premium Entertainment at CES 2026
Throughout the week, Dolby brings to life the latest innovatio...
02/05/2026
Dalet, a leading technology and service provider for media-rich organizations, t...
01/05/2026
January 5 2026, 18:30 (PST) NBCUniversal's Peacock to Be First Streamer to ...
15/04/2026
Open Broadcast Systems has announced that BBC World Service has selected its IP ...
15/04/2026
LiveU has announced an expansion of its collaboration with Sony Corporation, add...
15/04/2026
Ateme has announced a collaboration with NVIDIA to support live Apple Immersive ...
15/04/2026
The Professional Fighters League (PFL) has announced a multi-year partnership renewal with DAZN DACH, covering Germany, Switzerland, Austria, Liechtenstein, and...
15/04/2026
Canon U.S.A. (NAB Booth C3825) today took the lid off of the CINE-SERVO 40-1200m...
15/04/2026
Panasonic Video and Audio Systems North America and NEP Group will demonstrate a...
15/04/2026
For the fourth year running, independent analysts found businesses across all industries and verticals pay roughly the same amount in fees as they spend on stor...
15/04/2026
The Soccer Tournament (TST) has announced a media rights deal with NBC Sports to...
15/04/2026
JB&A will host the Pre-NAB 2026 Technology Event on April 17-18 at Flamingo Las Vegas, ahead of NAB Show. The event features hands-on demonstrations and technic...
15/04/2026
The Sennheiser Group will exhibit at NAB Show 2026 (Booth 4931, Central Hall), with demonstrations from Sennheiser, Neumann, and Merging across three areas: Rel...
15/04/2026
NAB Show 2026 will take place April 18-22 at the Las Vegas Convention Center, wi...
15/04/2026
AI-Media has announced the LEXI Text Encoder and LEXI Voice Encoder at NAB Show 2026, the company's first new encoder hardware release in more than a decade...
15/04/2026
Italian camera support manufacturer Cartoni will introduce several new products at NAB Show 2026 (Booth C6540, Central Hall), including the Master 30 OB fluid h...
15/04/2026
Lawo and swXtch.io have announced a memorandum of understanding at NAB Show 2026, under which Lawo will explore incorporating swXtch.io's groundSwXtch softw...
15/04/2026
CacheFly will exhibit at NAB Show 2026 (Booth W3129, April 19-22, Las Vegas Convention Center), showcasing three new additions to its content delivery platform:...
15/04/2026
Synamedia has announced GO Shorts, a new module within its Synamedia Go OTT platform that uses AI to convert an operator's existing content library into a s...
15/04/2026
The NAB Show kicks off on Saturday, and the SVG and SVG Europe editorial teams a...
15/04/2026
AJA Video Systems has announced an agreement to acquire Comprimato, a live video encoding and processing software company. The deal will unite the two companies...
15/04/2026
Prime Video Sports' NBA Playoffs coverage, which includes the entire SoFi NB...
15/04/2026
Just announced, the SDE standard provides a unified method and file format to ensure consistent and reliably comparable noise predictions
Sports and entertainm...
15/04/2026
From immersive storytelling to laugh-out-loud comedies, podcasts are booming in ...
15/04/2026
Books have always moved with us, whether tucked in our bags or humming in our he...
15/04/2026
For many artists, independent venues are where music careers begin and fan communities take shape. Independent venue operators work hard every day to keep local...
15/04/2026
From gripping thrillers to poignant memoirs, the 21st century has had no shortage of unforgettable books. To celebrate the standout storytelling of our modern e...
15/04/2026
Vintage broadcast experts release second plug-in
Telsie T is the second plug-in to be released by SonicWorld, a German audio company who specialise in servi...
15/04/2026
Includes eight free UAD plug-ins
Universal Audio's latest bundle brings together a selection of their renowned plug-ins and virtual instruments, and is ...
15/04/2026
Maximum uptime for broadcasters: Rohde & Schwarz launches R&S BroadcastShield at...
15/04/2026
Image courtesy of MD Helicopters...
15/04/2026
Virginia Gov. Abigail Spanberger, L3Harris VP Mark Farley, and state and local l...
15/04/2026
U.S. Space Forces Ground-Based Optical Sensor System upgrade at the Maui Space S...
15/04/2026
NBCU-Versant notches 13.1% of TV viewing in February, its best since August 2024...
15/04/2026
New data reveals older Kiwis are financially resilient, loyal to local products,...
15/04/2026
aconnic AG (ISIN: DE000A0LBKW6), Munich, announces the market launch of the ACCE...
15/04/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
15/04/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
15/04/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
15/04/2026
Evergent introduces its Agentic Revenue Orchestration Platform, transforming how subscription businesses across direct-to-consumer streaming, pay-TV, telecommun...
15/04/2026
Harmonic's XOS Media Processor Delivers Exceptional Video Quality to More than Half of U.S. Public Media Viewership
Harmonic (NASDAQ: HLIT) today announce...
15/04/2026
LONGMONT, COLORADO, APRIL 15, 2026 DPA Microphones N Series Digital Wireless System users in North America can now take full advantage of the system's exc...
15/04/2026
Cobalt Iron, a leading provider of SaaS-based enterprise data protection, today announced the launch of Compass Tape Gateway (CTG), a transformative enhancemen...
15/04/2026
Disguise to Showcase Cutting-Edge Experience Tech for Sports, Broadcast and More...
15/04/2026
Arooj Aftab Makes the Music She Wants to Hear The singular artist explores the juxtaposition of grief and joy, dark and light, in her distinctive sound.
Apri...
15/04/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
15/04/2026
Interra Systems, a provider of end-to-end quality assurance solutions for the digital media industry, is proud to announce its central role in the digital trans...