
Think SMART: New NVIDIA Dynamo Integrations Simplify AI Inference at Data Center Scale

10/11/2025

Editor's note: This post is part of Think SMART, a series focused on how leading AI service providers, developers and enterprises can boost their inference performance and return on investment with the latest advancements from NVIDIA's full-stack inference platform.

AI models are becoming increasingly complex and collaborative through multi-agent workflows. To keep up, AI inference must now scale across entire clusters to serve millions of concurrent users and deliver faster responses.

Much like it did for large-scale AI training, Kubernetes - the industry standard for containerized application management - is well-positioned to manage the multi-node inference needed to support advanced models.

The NVIDIA Dynamo platform works together with Kubernetes to streamline the management of both single- and multi-node AI inference. Read on to learn how the shift to multi-node inference is driving performance, as well as how cloud platforms are putting these technologies to work.

Tapping Disaggregated Inference for Optimized Performance

For AI models that fit on a single GPU or server, developers often run many identical replicas of the model in parallel across multiple nodes to deliver high throughput. In a recent paper, Russ Fellows, principal analyst at Signal65, showed that this approach achieved an industry-first record aggregate throughput of 1.1 million tokens per second with 72 NVIDIA Blackwell Ultra GPUs.

When scaling AI models to serve many concurrent users in real time, or when managing demanding workloads with long input sequences, using a technique called disaggregated serving unlocks further performance and efficiency gains.

Serving AI models involves two phases: processing the input prompt (prefill) and generating the output (decode). Traditionally, both phases run on the same GPUs, which can create inefficiencies and resource bottlenecks.

Disaggregated serving solves this by intelligently assigning these tasks to independently optimized GPUs. This approach ensures that each part of the workload runs with the optimization techniques best suited for it, maximizing overall performance. For today's large AI reasoning models, such as DeepSeek-R1, disaggregated serving is essential.
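To make the split between the two phases concrete, here is a minimal Python sketch of the idea. It is a toy illustration, not NVIDIA Dynamo's implementation or API: the names `Request`, `Worker` and `DisaggregatedRouter` are hypothetical, the round-robin routing is an assumed placeholder policy, and the dictionary standing in for the state handed from prefill to decode is a simplification.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Request:
    request_id: int
    prompt_tokens: int    # length of the input prompt
    max_new_tokens: int   # number of output tokens to generate


@dataclass
class Worker:
    name: str
    handled: list = field(default_factory=list)  # request ids this worker served


class DisaggregatedRouter:
    """Hypothetical router: prefill and decode run on separate worker pools."""

    def __init__(self, prefill_workers, decode_workers):
        # Round-robin is an assumed policy, chosen only to keep the sketch short.
        self._prefill = cycle(prefill_workers)
        self._decode = cycle(decode_workers)

    def serve(self, request):
        # Phase 1 (prefill): one worker processes the whole prompt in a single pass
        # and produces the state that decode needs (a plain dict stands in for it).
        prefill_worker = next(self._prefill)
        prefill_worker.handled.append(request.request_id)
        state = {"request_id": request.request_id, "tokens": request.prompt_tokens}

        # Phase 2 (decode): a different worker takes over that state and
        # generates the output one token at a time.
        decode_worker = next(self._decode)
        decode_worker.handled.append(request.request_id)
        generated = 0
        for _ in range(request.max_new_tokens):
            state["tokens"] += 1
            generated += 1

        return prefill_worker.name, decode_worker.name, generated
```

Because the two pools are separate objects, each can be sized and tuned on its own, for example one prefill worker feeding two decode workers when requests have short prompts and long outputs.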

NVIDIA Dynamo seamlessly brings multi-node inference optimization features such as disaggregated serving to production scale across GPU clusters.

It's already delivering value.

Baseten, for example, used NVIDIA Dynamo to speed up inference serving for long-context code generation by 2x and increase throughput by 1.6x, all without incremental hardware costs. Such software-driven performance boosts enable AI providers to significantly reduce the costs to manufacture intelligence.

In addition, recent SemiAnalysis InferenceMAX benchmarks demonstrated that disaggregated serving with Dynamo on NVIDIA GB200 NVL72 systems delivers the lowest cost per million tokens for mixture-of-experts reasoning models like DeepSeek-R1, among platforms tested.

Scaling Disaggregated Inference in the Cloud

As disaggregated serving scales across dozens or even hundreds of nodes for enterprise-scale AI deployments, Kubernetes provides the critical orchestration layer. With NVIDIA Dynamo now integrated into managed Kubernetes services from all major cloud providers, customers can scale multi-node inference across NVIDIA Blackwell systems, including GB200 and GB300 NVL72, with the performance, flexibility and reliability that enterprise AI deployments demand.

Amazon Web Services is accelerating generative AI inference for its customers with NVIDIA Dynamo, which is integrated with Amazon EKS.

Google Cloud is providing an NVIDIA Dynamo recipe to optimize large language model (LLM) inference at enterprise scale on its AI Hypercomputer.

Oracle Cloud Infrastructure (OCI) is enabling multi-node LLM inferencing with OCI Superclusters and NVIDIA Dynamo.

The push towards enabling large-scale, multi-node inference extends beyond hyperscalers.

Nebius, for example, is designing its cloud to serve inference workloads at scale, built on NVIDIA accelerated computing infrastructure and working with NVIDIA Dynamo as an ecosystem partner.

Simplifying Inference on Kubernetes With NVIDIA Grove in NVIDIA Dynamo

Disaggregated AI inference requires coordinating a team of specialized components - prefill, decode, routing and more - each with different needs. The challenge for Kubernetes is no longer about running more parallel copies of a model, but rather about masterfully conducting these distinct components as one cohesive, high-performance system.

NVIDIA Grove, an application programming interface now available within NVIDIA Dynamo, allows users to provide a single, high-level specification that describes their entire inference system.

For example, in that single specification, a user could simply declare their requirements: I need three GPU nodes for prefill and six GPU nodes for decode, and I require all nodes for a single model replica to be placed on the same high-speed interconnect for the quickest possible response.

From that specification, Grove automatically handles all the intricate coordination: scaling related components together while maintaining correct ratios and dependencies, starting them in the right order and placing them strategically across the cluster for fast, efficient communication. Learn more about how to get started with NVIDIA Grove in this technical deep dive.
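The declarative idea described above can be sketched in a few lines of Python. This is an illustration only and does not use Grove's actual schema or API: `ReplicaSpec`, `plan` and the `domain-N` labels are hypothetical names invented for this example. It shows how a single specification (three prefill nodes, six decode nodes, one interconnect domain per replica) can be expanded so that components scale together and keep their ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSpec:
    """Hypothetical description of one model replica (not Grove's real schema)."""
    prefill_nodes: int
    decode_nodes: int
    same_interconnect: bool  # place every node of a replica in one fast domain


def plan(spec: ReplicaSpec, replicas: int) -> list[dict]:
    """Expand the spec into per-replica placements, scaling whole replicas at a time."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    placements = []
    for i in range(replicas):
        # Assumption: one named interconnect domain per replica when co-location is required.
        domain = f"domain-{i}" if spec.same_interconnect else None
        placements.append({
            "replica": i,
            "interconnect_domain": domain,
            "prefill": [f"r{i}-prefill-{n}" for n in range(spec.prefill_nodes)],
            "decode": [f"r{i}-decode-{n}" for n in range(spec.decode_nodes)],
        })
    return placements


# The requirement from the text: three prefill nodes and six decode nodes per replica.
spec = ReplicaSpec(prefill_nodes=3, decode_nodes=6, same_interconnect=True)
placements = plan(spec, replicas=2)
```

Scaling by whole replicas is the design choice that keeps the prefill-to-decode ratio fixed: going from one replica to two doubles both pools at once rather than letting either drift.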

As AI inference becomes increasingly distributed, the combination of Kubernetes and NVIDIA Dynamo with NVIDIA Grove simplifies how developers build and scale intelligent applications.

Explore how these technologies come together to make cluster-scale AI easy and production-ready by joining NVIDIA at KubeCon, running through Thursday, Nov. 13, in Atlanta.
LINK: https://blogs.nvidia.com/blog/think-smart-dynamo-ai-inference-data-cen...