
NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI gpt-oss-20b and gpt-oss-120b launch. NVIDIA has optimized both new open-weight models for accelerated inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.
The gpt-oss models are text-reasoning LLMs with chain-of-thought and tool-calling capabilities using the popular mixture of experts (MoE) architecture with SwigGLU activations. The attention layers use RoPE with 128k context, alternating between full context and a sliding 128-token window. The models are released in FP4 precision, which fits on a single 80 GB data center GPU and is natively supported by Blackwell.
The models were trained on NVIDIA H100 Tensor Core GPUs, with gpt-oss-120b requiring over 2.1 million hours and gpt-oss-20b about 10x less. NVIDIA worked with several top open-source frameworks such as Hugging Face Transformers, Ollama, and vLLM, in addition to NVIDIA TensorRT-LLM for optimized kernels and model enhancements. This blog post showcases how NVIDIA has integrated gpt-oss across the software platform to meet developers' needs.
Model name Transformer Blocks Total Parameters Active Params per Token # of Experts Active Experts per Token Input Context Length
gpt-oss-20b 24 20B 3.6B 32 4 128K
gpt-oss-120b 36 117B 5.1B 128 4 128K
Table 1. OpenAI gpt-oss-20b and gpt-oss-120b model specifications, including total parameters, active parameters, number of experts, and input context length NVIDIA also worked with OpenAI and the community to maximize performance, adding features such as:
TensorRT-LLM Gen for attention prefill, attention decode, and MoE low-latency on Blackwell.
CUTLASS MoE kernels on Blackwell.
XQA kernel for specialized attention on Hopper.
Optimized attention and MoE routing kernels are available through the FlashInfer kernel-serving library for LLMs.
OpenAI Triton kernel MoE support, which is used in both TensorRT-LLM and vLLM.
Deploy using vLLM In collaboration with vLLM, NVIDIA worked together to verify accuracy while also analyzing and optimizing performance for Hopper and Blackwell architectures. Data center developers can use NVIDIA optimized kernels through the FlashInfer LLM serving kernel library.
vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAL-compatible web server. The following command will automatically download the model and start the server. Refer to the documentation and vLLM Cookbook guide for more details.
uv run --with vllm vllm serve openai/gpt-oss-20b
Deploy using TensorRT-LLM The optimizations are available on the NVIDIA/TensorRT-LLM GitHub repository, where developers can use the deployment guide to launch their high-performance server. The guide downloads the model checkpoints from Hugging Face. NVIDIA collaborated on the developer experience using the Transformers library with the new models. The guide then provides a Docker container and guidance on how to configure performance for both low-latency and max-throughput cases.
More than a million tokens per second with GB200 NVL72 NVIDIA engineers partnered closely with OpenAI to ensure that the new gpt-oss-120b and gpt-oss-20b models deliver accelerated performance on Day 0 across both the NVIDIA Blackwell and NVIDIA Hopper platforms.
At launch, based on early performance measurements, a single GB200 NVL72 rack-scale system is expected to serve the larger, more computationally demanding gpt-oss-120b model at 1.5 million tokens per second, or about 50,000 concurrent users. Blackwell features many architectural capabilities that accelerate inference performance. These include a second-generation Transformer Engine with FP4 Tensor Cores and fifth-generation NVIDIA NVLink and NVIDIA NVLink Switch, for high bandwidth, enabling 72 Blackwell GPUs to act as a single, massive GPU.
The performance, versatility, and pace of innovation of the NVIDIA platform enable the ecosystem to serve the latest models on Day 0 with high throughput and low cost per token.
Try the optimized model with NVIDIA Launchable Deploying with TensorRT-LLM is also available using the Python API in a JupyterLab notebook on the Open AI Cookbook as an NVIDIA Launchable directly in the build platform where developers can test out GPUs from multiple cloud platforms. You can deploy the optimized model with a single click in a pre-configured environment.
data-src=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp alt=The image shows the console at brev.dev for users to select which type of GPU option in the Select your Compute' page, the user can select between boxes in a row of H200, H100, A100, L40s, A10 and A100 shown. class=lazyload wp-image-104187 data-srcset=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp 1324w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-300x113-png.webp 300w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-625x236-png.webp 625w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-179x68-png.webp 179w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-768x290-png.webp 768w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-645x244-png.webp 645w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-500x189-png.webp 500w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-160x60-png.webp 160w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-362x137-png.webp 362w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-291x110-png.webp 291w, https://developer-blogs.nvidia.com/wp-content/uploads/202
Most recent headlines
09/11/2025
Dalet today announced a transformative leap forward for media operations: Agentic Artificial Intelligence (AI) that unifies the Dalet ecosystem under one natura...
15/10/2025
NEW YORK The NBA is making major changes to the NBA App and NBA TV as it takes control of them from TNT Sports, which has long managed the league's digital ...
15/10/2025
SAN MATEO, Calif. In what promises to be a major expansion of interactive features and personalized content on the DirecTV platform, the operator and Glance hav...
15/10/2025
SAN JOSE, Calif. Roku has launched changes to its user interface (UI) that the streaming platform says will better showcase original programming on the platform...
15/10/2025
LOS ANGELES Software-defined data storage and data services provider OpenDrives has elevated Alex Dunfey to chief technology officer, responsible for driving th...
15/10/2025
Series coming in 2026 stars Tom Vaughan-Lawlor, Justine Mitchell and Jason O'Mara released today
RT today released first look images of new comedy-drama ...
14/10/2025
SVG Europe Summit 2025: All Sessions Now Available to Watch on SVG PLAYNetworking event that preceded IBC2025 shone a light on elite live sports innovation acro...
14/10/2025
SVG Sit-Down: Author Rich Podolsky on Writing Madden & Summerall: How They Revo...
14/10/2025
SVG All-Stars: Michael Reiners, Coordinating Producer, FloRacingThe Illinois State grad steers a vast schedule of motorsports events at tracks across the countr...
14/10/2025
Content protection: Getting the right management for your DRM By Neal Romanek
Friday, October 10, 2025 - 10:11
Print This Story
Eluvio power the EPCR'...
14/10/2025
As League Takes Over Ops, NBA TV and NBA App Add 60 Games, Weekday Studio Show, ...
14/10/2025
Time and effort: World's largest student-led broadcast prepares to go On Air...
14/10/2025
(L-R) Guest, Kimberly Robinson Jones, Geeta Gandbhir, Pamela Dias, and Takema Ro...
14/10/2025
Lossless ist jetzt mit Spotify Premium verf gbar.
Verlustfreies Audio war eine...
14/10/2025
La qualit Lossless est disponible sur Spotify Premium.
Le format sans perte de...
14/10/2025
For the seventh edition of Spotify and FC Barcelona's artist jersey series, ...
14/10/2025
Spotify is committed to bringing the best listening experience to all our users, and that includes parents and families. That's why we're expanding mana...
14/10/2025
Since its debut, the Spotify Original podcast Caso 63 has been more than just a story; it's been a cultural sensation. The science fiction thriller captivat...
14/10/2025
Desde su debut, el podcast original de Spotify Caso 63 ha sido mucho m s que una historia: se ha convertido en un fen meno cultural. Este thriller de ciencia fi...
14/10/2025
Lossless p Spotify Premium r h r.
Lossless-ljud har varit en av de mest efterl ngtade funktionerna p Spotify och nu, ntligen, har den b rjat rullas ut til...
14/10/2025
Early next year, your favorite video podcasts are getting a bigger stage. Spotify and Netflix are teaming up to bring sports, culture, lifestyle, and true crime...
14/10/2025
Last week, the 4th global Safety Day took place at all SGL Carbon sites.
This years Safety Day focused on hazardous substances. Various information events, wor...
14/10/2025
From bowser to basket, 9 in 10 Aussies are feeling the impact of rising prices
26% of households earn over $160k, but are still concerned about rising prices...
14/10/2025
New players take a bite out of big bank share as consumers increasingly value tr...
14/10/2025
56% of Aussies are looking for a coastal holiday, while 40% are planning a road ...
14/10/2025
51% of Aussies want a hybrid car and 36% want a full EV
Toyota leads the market
75% research online before a new car purchase
Sydney - October 14, 2025 - Aus...
14/10/2025
Unilever leads the market
Beverages, smartphones, and food dominate category sp...
14/10/2025
Top insurance advertisers
Biggest growth categories
Sector ad spend up 4.7...
14/10/2025
WAYNE, Pa. Private-equity firm Saothair Capital Partners said it has completed the acquisition of GatesAir through a newly-formed affiliate....
14/10/2025
Media Excel, a leading provider of encoding and transcoding solutions, today announced that Space Norway, a leading provider of satellite services and operator ...
14/10/2025
Jason Tyler has joined ZTransform, a leader in media environment innovation, as Inside Sales and Procurement Manager bringing commercial and operational focus t...
14/10/2025
14 10 2025 - Media release Tiny toys, big missions: Knee High Spies launches on ABC this November
Knee High Spies
Kids, assemble! The ABC and Screen Australi...
14/10/2025
Abu Dhabi, UAE October 14, 2025: Space42 (ADX: SPACE42), the AI-powered SpaceT...
14/10/2025
Abu Dhabi, UAE October 14, 2025: Space42 (ADX: SPACE42), the UAE-based AI-powered SpaceTech company with a global reach, has signed a Memorandum of Understand...
14/10/2025
Joe Wilkinson and David Earl will explore their favourite sitcoms together with help from stars such as Ricky Gervais
14th October, London: Comedians, writers,...
14/10/2025
October 14th, 2025 ANNA SARGENT, VICTOR SLEZAK, ALI AHN, MARCELINE HUGOT, AND S...
14/10/2025
The Sky Original event series - a symphony of genius, rivalry and vengeance - al...
14/10/2025
ESA awards Rohde & Schwarz for contributions to 30 years European Satellite Navi...
14/10/2025
The Hollywood Professional Association (HPA) today unveiled key highlights of the 2026 HPA Tech Retreat, scheduled for Feb. 15-19 at the Westin Rancho Mirage Go...
14/10/2025
Rena Ayer Joins Red Seat Ventures as Senior Vice President, Content & Talent Par...
14/10/2025
Imelda May explores her relationship with the Irish language through songs and sean-n s singing
Friday 17 October, 8.30pm on RT One and RT Player
Watch tr...
14/10/2025
AI is transforming the way enterprises build, deploy and scale intelligent applications. As demand surges for enterprise-grade AI applications that offer speed,...
14/10/2025
At Oracle AI World, NVIDIA and Oracle announced they are deepening their collabo...
13/10/2025
Spectrum Brings Selected L.A. Lakers Games to Apple Vision Pro With New Immersiv...
13/10/2025
Media Climate Accord aims to offer united approach to M&E industry sustainabilit...
13/10/2025
Riot Games streamlines production of Valorant Champions Paris with ST 2110 flypa...
13/10/2025
Feeling the NRG: Riot Games puts on a show for Valorant Champions Paris final By Jo Ruddock
Monday, October 13, 2025 - 09:17
Print This Story
After more t...
13/10/2025
FOX Sports MLB Postseason Audio Aims To Make Officials' Calls More AccurateA1 Joe Carpenter hopes to bring some baseball CSI' to the ABS ump-cam system...
13/10/2025
By Katie Arthurs
Whether told through dance, ceremony, spoken word, or visual a...
13/10/2025
New SBS and NITV Original RECKLESS a Deadly Funny Thriller Straight Out of Fre...