
Editor's note: This post is part of Think SMART, a series focused on how leading AI service providers, developers and enterprises can boost their inference performance and return on investment with the latest advancements from NVIDIA's full-stack inference platform.
AI models are becoming increasingly complex and collaborative through multi-agent workflows. To keep up, AI inference must now scale across entire clusters to serve millions of concurrent users and deliver faster responses.
Much like it did for large-scale AI training, Kubernetes, the industry standard for containerized application management, is well positioned to manage the multi-node inference needed to support advanced models.
The NVIDIA Dynamo platform works together with Kubernetes to streamline the management of both single- and multi-node AI inference. Read on to learn how the shift to multi-node inference is driving performance, as well as how cloud platforms are putting these technologies to work.
Tapping Disaggregated Inference for Optimized Performance
For AI models that fit on a single GPU or server, developers often run many identical replicas of the model in parallel across multiple nodes to deliver high throughput. In a recent paper, Russ Fellows, principal analyst at Signal65, showed that this approach achieved an industry-first record aggregate throughput of 1.1 million tokens per second with 72 NVIDIA Blackwell Ultra GPUs.
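To put that aggregate figure in per-GPU terms, here is a quick back-of-the-envelope calculation. It uses only the two numbers quoted above. It assumes the load is spread evenly across all 72 GPUs, and the 1.1 million figure is rounded, so treat the result as approximate.

```python
# Figures quoted in the text; the aggregate is a rounded value.
aggregate_tokens_per_second = 1_100_000
gpu_count = 72

# Assumption: throughput is spread evenly across all GPUs.
per_gpu = aggregate_tokens_per_second / gpu_count

print(f"{per_gpu:,.0f} tokens per second per GPU (approximate)")
```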
When scaling AI models to serve many concurrent users in real time, or when managing demanding workloads with long input sequences, using a technique called disaggregated serving unlocks further performance and efficiency gains.
Serving AI models involves two phases: processing the input prompt (prefill) and generating the output (decode). Traditionally, both phases run on the same GPUs, which can create inefficiencies and resource bottlenecks.
Disaggregated serving solves this by intelligently assigning these tasks to independently optimized GPUs. This approach ensures that each part of the workload runs with the optimization techniques best suited for it, maximizing overall performance. For today's large AI reasoning models, such as DeepSeek-R1, disaggregated serving is essential.
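The split between the two phases can be pictured with a toy sketch. This is not how NVIDIA Dynamo is implemented. The `Request` and `PrefillResult` types, the worker names and the round-robin routing are all hypothetical, chosen only to show prefill and decode running on separate pools with a handoff in between.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Request:
    request_id: str
    prompt_tokens: int     # length of the input prompt
    max_new_tokens: int    # number of output tokens to generate


@dataclass
class PrefillResult:
    request_id: str
    kv_cache_tokens: int   # stand-in for the state handed from prefill to decode
    prefill_worker: str


def prefill(request, worker):
    """Process the whole input prompt once (phase one)."""
    return PrefillResult(request.request_id, request.prompt_tokens, worker)


def decode(request, handoff, worker):
    """Generate the output from the handed-off state (phase two)."""
    return {
        "request_id": request.request_id,
        "prefill_worker": handoff.prefill_worker,
        "decode_worker": worker,
        "context_tokens": handoff.kv_cache_tokens,
        "generated_tokens": request.max_new_tokens,
    }


def serve_disaggregated(requests, prefill_workers, decode_workers):
    """Send each phase to its own pool; round-robin is an illustrative policy."""
    prefill_pool = cycle(prefill_workers)
    decode_pool = cycle(decode_workers)
    results = []
    for request in requests:
        handoff = prefill(request, next(prefill_pool))
        results.append(decode(request, handoff, next(decode_pool)))
    return results


requests = [
    Request(f"req-{i}", prompt_tokens=1000 * (i + 1), max_new_tokens=64)
    for i in range(4)
]
results = serve_disaggregated(
    requests,
    prefill_workers=["prefill-0", "prefill-1"],
    decode_workers=["decode-0", "decode-1", "decode-2"],
)
for result in results:
    print(result)
```

Because the pools are separate lists, each can be sized and tuned independently, which is the property the paragraph above describes.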
NVIDIA Dynamo seamlessly brings multi-node inference optimization features such as disaggregated serving to production scale across GPU clusters.
It's already delivering value.
Baseten, for example, used NVIDIA Dynamo to speed up inference serving for long-context code generation by 2x and increase throughput by 1.6x, all without incremental hardware costs. Such software-driven performance boosts enable AI providers to significantly reduce the costs to manufacture intelligence.
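A rough way to see why a throughput gain on unchanged hardware lowers cost is shown below. The helper `relative_cost_per_token` is hypothetical. It assumes cost per token scales as hardware cost divided by throughput and ignores everything else, such as utilization, energy and pricing, so it is an illustration rather than Baseten's actual economics.

```python
def relative_cost_per_token(throughput_gain, hardware_cost_ratio=1.0):
    """Cost per token relative to the baseline (1.0 means unchanged).

    Assumes cost per token is proportional to hardware cost / throughput.
    """
    if throughput_gain <= 0:
        raise ValueError("throughput_gain must be positive")
    return hardware_cost_ratio / throughput_gain


# 1.6x throughput with no incremental hardware cost, as quoted in the text.
relative = relative_cost_per_token(1.6)
reduction = 1 - relative

print(f"relative cost per token: {relative:.3f}")
print(f"reduction: {reduction:.1%}")
```

Under these simplifying assumptions, a 1.6x throughput gain corresponds to roughly a 37.5% lower cost per token.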
In addition, recent SemiAnalysis InferenceMAX benchmarks demonstrated that disaggregated serving with Dynamo on NVIDIA GB200 NVL72 systems delivers the lowest cost per million tokens for mixture-of-experts reasoning models like DeepSeek-R1, among platforms tested.
Scaling Disaggregated Inference in the Cloud
As disaggregated serving scales across dozens or even hundreds of nodes for enterprise-scale AI deployments, Kubernetes provides the critical orchestration layer. With NVIDIA Dynamo now integrated into managed Kubernetes services from all major cloud providers, customers can scale multi-node inference across NVIDIA Blackwell systems, including GB200 and GB300 NVL72, with the performance, flexibility and reliability that enterprise AI deployments demand.
Amazon Web Services is accelerating generative AI inference for its customers with NVIDIA Dynamo, which is integrated with Amazon EKS.
Google Cloud is providing an NVIDIA Dynamo recipe to optimize large language model (LLM) inference at enterprise scale on its AI Hypercomputer.
Oracle Cloud Infrastructure (OCI) is enabling multi-node LLM inference with OCI Superclusters and NVIDIA Dynamo.
The push towards enabling large-scale, multi-node inference extends beyond hyperscalers.
Nebius, for example, is designing its cloud to serve inference workloads at scale, built on NVIDIA accelerated computing infrastructure and working with NVIDIA Dynamo as an ecosystem partner.
Simplifying Inference on Kubernetes With NVIDIA Grove in NVIDIA Dynamo
Disaggregated AI inference requires coordinating a team of specialized components (prefill, decode, routing and more), each with different needs. The challenge for Kubernetes is no longer about running more parallel copies of a model, but rather about masterfully conducting these distinct components as one cohesive, high-performance system.
NVIDIA Grove, an application programming interface now available within NVIDIA Dynamo, allows users to provide a single, high-level specification that describes their entire inference system.
For example, in that single specification, a user could simply declare their requirements: "I need three GPU nodes for prefill and six GPU nodes for decode, and I require all nodes for a single model replica to be placed on the same high-speed interconnect for the quickest possible response."
From that specification, Grove automatically handles all the intricate coordination: scaling related components together while maintaining correct ratios and dependencies, starting them in the right order and placing them strategically across the cluster for fast, efficient communication. Learn more about how to get started with NVIDIA Grove in this technical deep dive.
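The idea of a single declarative specification can be sketched in a few lines of Python. This is not the NVIDIA Grove API or its schema. The `RoleSpec` and `InferenceSystemSpec` classes and their fields are hypothetical, and the rule that decode starts after prefill is an arbitrary dependency chosen for illustration. The sketch only mirrors the behaviors described above: scaling roles together in a fixed ratio and starting them in dependency order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleSpec:
    name: str
    nodes_per_replica: int
    starts_after: tuple = ()   # names of roles that must be started first


@dataclass(frozen=True)
class InferenceSystemSpec:
    roles: tuple
    # Hypothetical flag: keep one replica's nodes on the same interconnect.
    colocate_replica: bool = True

    def nodes_for(self, replicas):
        """Scale every role together so the per-replica ratio is preserved."""
        return {r.name: r.nodes_per_replica * replicas for r in self.roles}

    def startup_order(self):
        """Return role names so each appears after the roles it depends on."""
        ordered, placed = [], set()
        pending = list(self.roles)
        while pending:
            ready = [r for r in pending if set(r.starts_after) <= placed]
            if not ready:
                raise ValueError("circular or unknown dependency")
            for role in ready:
                ordered.append(role.name)
                placed.add(role.name)
            pending = [r for r in pending if r.name not in placed]
        return ordered


# Three prefill nodes and six decode nodes per replica, as in the example above.
spec = InferenceSystemSpec(
    roles=(
        RoleSpec("decode", nodes_per_replica=6, starts_after=("prefill",)),
        RoleSpec("prefill", nodes_per_replica=3),
    )
)

print(spec.nodes_for(replicas=2))
print(spec.startup_order())
```

Doubling the replica count doubles both roles at once, so the one-to-two ratio of prefill to decode nodes never drifts.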
As AI inference becomes increasingly distributed, the combination of Kubernetes and NVIDIA Dynamo with NVIDIA Grove simplifies how developers build and scale intelligent applications.
Explore how these technologies come together to make cluster-scale AI easy and production-ready by joining NVIDIA at KubeCon, running through Thursday, Nov. 13, in Atlanta.