
NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI gpt-oss-20b and gpt-oss-120b launch. NVIDIA has optimized both new open-weight models for accelerated inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.
The gpt-oss models are text-reasoning LLMs with chain-of-thought and tool-calling capabilities using the popular mixture of experts (MoE) architecture with SwigGLU activations. The attention layers use RoPE with 128k context, alternating between full context and a sliding 128-token window. The models are released in FP4 precision, which fits on a single 80 GB data center GPU and is natively supported by Blackwell.
The models were trained on NVIDIA H100 Tensor Core GPUs, with gpt-oss-120b requiring over 2.1 million hours and gpt-oss-20b about 10x less. NVIDIA worked with several top open-source frameworks such as Hugging Face Transformers, Ollama, and vLLM, in addition to NVIDIA TensorRT-LLM for optimized kernels and model enhancements. This blog post showcases how NVIDIA has integrated gpt-oss across the software platform to meet developers' needs.
Model name Transformer Blocks Total Parameters Active Params per Token # of Experts Active Experts per Token Input Context Length
gpt-oss-20b 24 20B 3.6B 32 4 128K
gpt-oss-120b 36 117B 5.1B 128 4 128K
Table 1. OpenAI gpt-oss-20b and gpt-oss-120b model specifications, including total parameters, active parameters, number of experts, and input context length NVIDIA also worked with OpenAI and the community to maximize performance, adding features such as:
TensorRT-LLM Gen for attention prefill, attention decode, and MoE low-latency on Blackwell.
CUTLASS MoE kernels on Blackwell.
XQA kernel for specialized attention on Hopper.
Optimized attention and MoE routing kernels are available through the FlashInfer kernel-serving library for LLMs.
OpenAI Triton kernel MoE support, which is used in both TensorRT-LLM and vLLM.
Deploy using vLLM In collaboration with vLLM, NVIDIA worked together to verify accuracy while also analyzing and optimizing performance for Hopper and Blackwell architectures. Data center developers can use NVIDIA optimized kernels through the FlashInfer LLM serving kernel library.
vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAL-compatible web server. The following command will automatically download the model and start the server. Refer to the documentation and vLLM Cookbook guide for more details.
uv run --with vllm vllm serve openai/gpt-oss-20b
Deploy using TensorRT-LLM The optimizations are available on the NVIDIA/TensorRT-LLM GitHub repository, where developers can use the deployment guide to launch their high-performance server. The guide downloads the model checkpoints from Hugging Face. NVIDIA collaborated on the developer experience using the Transformers library with the new models. The guide then provides a Docker container and guidance on how to configure performance for both low-latency and max-throughput cases.
More than a million tokens per second with GB200 NVL72 NVIDIA engineers partnered closely with OpenAI to ensure that the new gpt-oss-120b and gpt-oss-20b models deliver accelerated performance on Day 0 across both the NVIDIA Blackwell and NVIDIA Hopper platforms.
At launch, based on early performance measurements, a single GB200 NVL72 rack-scale system is expected to serve the larger, more computationally demanding gpt-oss-120b model at 1.5 million tokens per second, or about 50,000 concurrent users. Blackwell features many architectural capabilities that accelerate inference performance. These include a second-generation Transformer Engine with FP4 Tensor Cores and fifth-generation NVIDIA NVLink and NVIDIA NVLink Switch, for high bandwidth, enabling 72 Blackwell GPUs to act as a single, massive GPU.
The performance, versatility, and pace of innovation of the NVIDIA platform enable the ecosystem to serve the latest models on Day 0 with high throughput and low cost per token.
Try the optimized model with NVIDIA Launchable Deploying with TensorRT-LLM is also available using the Python API in a JupyterLab notebook on the Open AI Cookbook as an NVIDIA Launchable directly in the build platform where developers can test out GPUs from multiple cloud platforms. You can deploy the optimized model with a single click in a pre-configured environment.
data-src=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp alt=The image shows the console at brev.dev for users to select which type of GPU option in the Select your Compute' page, the user can select between boxes in a row of H200, H100, A100, L40s, A10 and A100 shown. class=lazyload wp-image-104187 data-srcset=https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-png.webp 1324w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-300x113-png.webp 300w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-625x236-png.webp 625w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-179x68-png.webp 179w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-768x290-png.webp 768w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-645x244-png.webp 645w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-500x189-png.webp 500w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-160x60-png.webp 160w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-362x137-png.webp 362w, https://developer-blogs.nvidia.com/wp-content/uploads/2025/08/Brev-291x110-png.webp 291w, https://developer-blogs.nvidia.com/wp-content/uploads/202
Most recent headlines
05/01/2027
Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...
01/06/2026
January 6 2026, 05:30 (PST) Dolby Sets the New Standard for Premium Entertainment at CES 2026
Throughout the week, Dolby brings to life the latest innovatio...
01/05/2026
January 5 2026, 18:30 (PST) NBCUniversal's Peacock to Be First Streamer to ...
01/04/2026
January 4 2026, 18:00 (PST) DOLBY AND DOUYIN EMPOWER THE NEXT GENERATON OF CREATORS WITH DOLBY VISION
Douyin Users Can Now Create And Share Videos With Stun...
15/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
15/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
15/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
15/01/2026
NVIDIA kicked off the year at CES, where the crowd buzzed about the latest gaming announcements - including the native GeForce NOW app for Linux and Amazon Fire...
14/01/2026
Staines-upon-Thames, UK, 13th January, 2026 ITV, one of the UKs leading broadcasters, has selected Yospace, the global leader in Dynamic Ad Insertion (DAI), to ...
14/01/2026
Tech Focus: Audio Consoles, Part 2 - New Options for Virtual MixingA variety of solutions offer both technical and economic benefitsBy Dan Daley, Audio Editor
...
14/01/2026
Tech Focus: Audio Consoles, Part 1 - Key Component Evolves Toward the Totally Vi...
14/01/2026
SVG Summit 2025: Audio from Monday Workshops Now AvailableListen to sessions from Live Production Innovation, AI Production Tools, Cloud Production, Content Wor...
14/01/2026
The L3Harris large T7 robotic systems will provide U.S. Navy and U.S. Marines wi...
14/01/2026
Steiger Media's adoption of Calrec's compact Argo M console not only makes its innovative new hybrid truck faster, more efficient, and agile, but also e...
14/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
14/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
14/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
14/01/2026
January 14th, 2026
TRIBECA ANNOUNCES BEST NEW YORK SHORT AWARD FOR 25TH ANNIVERSARY FESTIVAL
In Celebration of Its 25th Anniversary, Tribeca Introduces a N...
14/01/2026
Wednesday 14 January 2026
Sky News announces Cathy Newman to lead flagship new political programme
Sky News today announces that award-winning journalist and ...
14/01/2026
Back to All News
State of Fear, The First Spin-Off of a Netflix Brazil Producti...
14/01/2026
The first stamp of An Post's 2026 Stamp Programme, marking 100 Years of Broadcasting, was unveiled at the GPO by Patrick O'Donovan TD, Minister for Cult...
14/01/2026
It's official! Beverley Callard has landed in Carrigstown. The beloved actor, known for her unforgettable roles and iconic screen presence, is joining the c...
13/01/2026
Independent media in Brazil and Colombia is facing an urgent crisis of traditional business models alongside a deteriorating security environment, according to ...
13/01/2026
NHL Situation Room 2.0: How Sony Hawk-Eye Powers Centralized Officiating, Player...
13/01/2026
NBC Sports Ices the Audio for the 2026 Prevagen U.S. Figure Skating Championship...
13/01/2026
DMF and MXL in practice: Which vendors are adopting it, and how fast is the ecos...
13/01/2026
CES 2026: Five Important Sports-Tech BuzzwordsThe terms highlight innovations for sports production at the showBy Daniel Frankel, SVG Contributor
Tuesday, Jan...
13/01/2026
For TGL Season 2, Unity 6 Boosts Virtual-Graphic Quality; COSM 360 Cameras Impro...
13/01/2026
Resetting Expectations? The State of the Sports Industry with Devoncroft's J...
13/01/2026
Top Row L-R: Ana Katz, Natalia Almada, Bao Nguyen, Tatiana Maslany, A.V. Rockwell, Dr. Heather Berlin
Second Row L-R: Sophie Barthes, Azazel Jacobs, Janicza Br...
13/01/2026
DoW to invest $1B in planned independently traded Missile Solutions business...
13/01/2026
L3Harris Chairman and CEO Christopher Kubasik and Under Secretary of War for Acq...
13/01/2026
April 10, 2025
First Gulf has taken a significant step in its U.S. expansion with the launch of its first industrial development in the country.
First Westla...
13/01/2026
April 11, 2025
Canadian footwear retailer SoftMoc has signed a lease for 145,600 square feet at 901 Hopkins Street in Whitby, where the space will serve as a w...
13/01/2026
April 14, 2025
First Gulf is proud to announce that 25 Ontario has officially received its occupancy permit, marking the transition from an active construction...
13/01/2026
April 28, 2025
First Gulf has been awarded a design-build lease for a new 350,000 square foot office and warehouse facility for Sherwin-Williams. This project ...
13/01/2026
August 13, 2025
First Gulf Expands U.S. Industrial Footprint with First Savanna...
13/01/2026
August 13, 2025
First Gulf is proud to partner with Toromont Industries Ltd. to...
13/01/2026
October 10, 2025
First Gulf is pleased to announce that PPFD, a leading third-party logistics company, has leased 146,536 square feet at 901 Hopkins Street in ...
13/01/2026
Singapore - January 13, 2026 - Nielsen today announced the appointment of Matty Lin to its Commercial Organization as APAC regional sales leader.
Based in Sing...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Nine-week performance series brings music, dance, theatre, and storytelling to downtown Durham, January - March 2026 (Durham, NC) The Chamber Orchestra of the T...
13/01/2026
Berklee Launches AIMS, an Artist-Centered Summit on Music and AI Hosted by the Berklee Emerging Artistic Technology Lab (BEATL), the event will focus on the i...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
13/01/2026
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...