
Editor's note: This post is part of Think SMART, a series focused on how leading AI service providers, developers and enterprises can boost their inference performance and return on investment with the latest advancements from NVIDIA's full-stack inference platform.
AI models are becoming increasingly complex and collaborative through multi-agent workflows. To keep up, AI inference must now scale across entire clusters to serve millions of concurrent users and deliver faster responses.
Much like it did for large-scale AI training, Kubernetes - the industry standard for containerized application management - is well-positioned to manage the multi-node inference needed to support advanced models.
The NVIDIA Dynamo platform works together with Kubernetes to streamline the management of both single- and multi-node AI inference. Read on to learn how the shift to multi-node inference is driving performance, as well as how cloud platforms are putting these technologies to work.
Tapping Disaggregated Inference for Optimized Performance
For AI models that fit on a single GPU or server, developers often run many identical replicas of the model in parallel across multiple nodes to deliver high throughput. In a recent paper, Russ Fellows, principal analyst at Signal65, showed that this approach achieved an industry-first record aggregate throughput of 1.1 million tokens per second with 72 NVIDIA Blackwell Ultra GPUs.
When scaling AI models to serve many concurrent users in real time, or when managing demanding workloads with long input sequences, using a technique called disaggregated serving unlocks further performance and efficiency gains.
Serving AI models involves two phases: processing the input prompt (prefill) and generating the output (decode). Traditionally, both phases run on the same GPUs, which can create inefficiencies and resource bottlenecks.
Disaggregated serving solves this by intelligently assigning these tasks to independently optimized GPUs. This approach ensures that each part of the workload runs with the optimization techniques best suited for it, maximizing overall performance. For today's large AI reasoning models, such as DeepSeek-R1, disaggregated serving is essential.
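The two-phase split described above can be illustrated with a toy sketch. This is not NVIDIA Dynamo's API: the `Request` and `Worker` classes, the pool names and the round-robin routing are all assumptions made for illustration, and a real system would also hand the prompt's intermediate state from the prefill worker to the decode worker.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Request:
    request_id: int
    prompt_tokens: int   # work handled in the prefill phase
    output_tokens: int   # work handled in the decode phase


@dataclass
class Worker:
    name: str
    handled: list = field(default_factory=list)


def serve_disaggregated(requests, prefill_pool, decode_pool):
    """Send each request to a prefill worker, then to a decode worker.

    Round-robin assignment is a stand-in for a real router (an assumption).
    """
    prefill_iter = cycle(prefill_pool)
    decode_iter = cycle(decode_pool)
    trace = []
    for req in requests:
        prefill_worker = next(prefill_iter)
        prefill_worker.handled.append(req.request_id)  # phase 1: process the prompt
        decode_worker = next(decode_iter)
        decode_worker.handled.append(req.request_id)   # phase 2: generate the output
        trace.append((req.request_id, prefill_worker.name, decode_worker.name))
    return trace


# Hypothetical pools: the two phases are sized independently of each other.
prefill_pool = [Worker("prefill-0")]
decode_pool = [Worker("decode-0"), Worker("decode-1")]
requests = [Request(i, prompt_tokens=8000, output_tokens=500) for i in range(4)]

trace = serve_disaggregated(requests, prefill_pool, decode_pool)
for row in trace:
    print(row)
```

Because the pools are separate objects, each one could be resized or tuned without touching the other, which is the property that disaggregated serving relies on.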
NVIDIA Dynamo seamlessly brings multi-node inference optimization features such as disaggregated serving to production scale across GPU clusters.
It's already delivering value.
Baseten, for example, used NVIDIA Dynamo to speed up inference serving for long-context code generation by 2x and increase throughput by 1.6x, all without incremental hardware costs. Such software-driven performance boosts enable AI providers to significantly reduce the costs to manufacture intelligence.
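To see why a throughput gain on unchanged hardware lowers cost, consider a rough calculation. The hourly cost and baseline throughput below are hypothetical placeholders, not Baseten's figures; only the 1.6x throughput factor comes from the example above.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Cost in USD to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical baseline: the same hardware bill before and after the software change.
HOURLY_COST_USD = 100.0
BASELINE_TOKENS_PER_SECOND = 10_000
THROUGHPUT_GAIN = 1.6  # factor quoted in the text

baseline = cost_per_million_tokens(HOURLY_COST_USD, BASELINE_TOKENS_PER_SECOND)
improved = cost_per_million_tokens(
    HOURLY_COST_USD, BASELINE_TOKENS_PER_SECOND * THROUGHPUT_GAIN
)

ratio = improved / baseline
print(f"cost per token falls to {ratio:.1%} of the baseline")
```

Whatever placeholder values are chosen, the ratio depends only on the throughput factor: 1 / 1.6 = 0.625, or a 37.5% lower cost per token.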
In addition, recent SemiAnalysis InferenceMAX benchmarks demonstrated that disaggregated serving with Dynamo on NVIDIA GB200 NVL72 systems delivers the lowest cost per million tokens for mixture-of-experts reasoning models like DeepSeek-R1, among platforms tested.
Scaling Disaggregated Inference in the Cloud
As disaggregated serving scales across dozens or even hundreds of nodes for enterprise-scale AI deployments, Kubernetes provides the critical orchestration layer. With NVIDIA Dynamo now integrated into managed Kubernetes services from all major cloud providers, customers can scale multi-node inference across NVIDIA Blackwell systems, including GB200 and GB300 NVL72, with the performance, flexibility and reliability that enterprise AI deployments demand.
Amazon Web Services is accelerating generative AI inference for its customers with NVIDIA Dynamo, integrated with Amazon EKS.
Google Cloud is providing an NVIDIA Dynamo recipe to optimize large language model (LLM) inference at enterprise scale on its AI Hypercomputer.
OCI is enabling multi-node LLM inferencing with OCI Superclusters and NVIDIA Dynamo.
The push towards enabling large-scale, multi-node inference extends beyond hyperscalers.
Nebius, for example, is designing its cloud to serve inference workloads at scale, built on NVIDIA accelerated computing infrastructure and working with NVIDIA Dynamo as an ecosystem partner.
Simplifying Inference on Kubernetes With NVIDIA Grove in NVIDIA Dynamo
Disaggregated AI inference requires coordinating a team of specialized components - prefill, decode, routing and more - each with different needs. The challenge for Kubernetes is no longer about running more parallel copies of a model, but rather about masterfully conducting these distinct components as one cohesive, high-performance system.
NVIDIA Grove, an application programming interface now available within NVIDIA Dynamo, allows users to provide a single, high-level specification that describes their entire inference system.
For example, in that single specification, a user could simply declare their requirements: I need three GPU nodes for prefill and six GPU nodes for decode, and I require all nodes for a single model replica to be placed on the same high-speed interconnect for the quickest possible response.
From that specification, Grove automatically handles all the intricate coordination: scaling related components together while maintaining correct ratios and dependencies, starting them in the right order and placing them strategically across the cluster for fast, efficient communication. Learn more about how to get started with NVIDIA Grove in this technical deep dive.
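The idea of scaling components together at a fixed ratio and keeping each replica on one interconnect can be sketched in a few lines. This is not Grove's schema or API: `ReplicaSpec`, `scale`, `place` and the `domain-N` names are hypothetical, and only the three-prefill, six-decode example comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSpec:
    """Hypothetical description of one model replica (not Grove's schema)."""
    prefill_nodes: int
    decode_nodes: int


def scale(spec, replicas):
    """Total node counts for `replicas` copies, preserving the prefill:decode ratio."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        "prefill": spec.prefill_nodes * replicas,
        "decode": spec.decode_nodes * replicas,
    }


def place(spec, replicas):
    """Assign every node of a replica to one interconnect domain.

    Domain names are placeholders; a real scheduler would pick them
    from the cluster topology.
    """
    return {
        f"domain-{index}": {
            "prefill": spec.prefill_nodes,
            "decode": spec.decode_nodes,
        }
        for index in range(replicas)
    }


# The example from the text: three prefill nodes and six decode nodes per replica.
spec = ReplicaSpec(prefill_nodes=3, decode_nodes=6)
totals = scale(spec, replicas=2)
placement = place(spec, replicas=2)
print(totals)
print(placement)
```

Scaling by whole replicas keeps the 1:2 prefill-to-decode ratio intact at every size, and grouping by domain keeps each replica's nodes together rather than spreading them across the cluster.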
As AI inference becomes increasingly distributed, the combination of Kubernetes and NVIDIA Dynamo with NVIDIA Grove simplifies how developers build and scale intelligent applications.
Explore how these technologies come together to make cluster-scale AI easy and production-ready by joining NVIDIA at KubeCon, running through Thursday, Nov. 13, in Atlanta.