
Think SMART: New NVIDIA Dynamo Integrations Simplify AI Inference at Data Center Scale

10/11/2025

Editor's note: This post is part of Think SMART, a series focused on how leading AI service providers, developers and enterprises can boost their inference performance and return on investment with the latest advancements from NVIDIA's full-stack inference platform.

AI models are becoming increasingly complex and collaborative through multi-agent workflows. To keep up, AI inference must now scale across entire clusters to serve millions of concurrent users and deliver faster responses.

Much like it did for large-scale AI training, Kubernetes - the industry standard for containerized application management - is well-positioned to manage the multi-node inference needed to support advanced models.

The NVIDIA Dynamo platform works together with Kubernetes to streamline the management of both single- and multi-node AI inference. Read on to learn how the shift to multi-node inference is driving performance, as well as how cloud platforms are putting these technologies to work.

Tapping Disaggregated Inference for Optimized Performance

For AI models that fit on a single GPU or server, developers often run many identical replicas of the model in parallel across multiple nodes to deliver high throughput. In a recent paper, Russ Fellows, principal analyst at Signal65, showed that this approach achieved an industry-first record aggregate throughput of 1.1 million tokens per second with 72 NVIDIA Blackwell Ultra GPUs.
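To put the aggregate figure in per-GPU terms, the short calculation below divides the quoted throughput by the quoted GPU count. It assumes the load is spread evenly across all 72 GPUs, which the article does not state, so treat the result as a rough average rather than a measured per-GPU number.

```python
# Figures quoted in the article.
aggregate_tokens_per_second = 1_100_000
gpu_count = 72

# Assumption: every GPU contributes an equal share of the aggregate throughput.
per_gpu_tokens_per_second = aggregate_tokens_per_second / gpu_count

print(f"{per_gpu_tokens_per_second:,.0f} tokens per second per GPU (average)")
```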

When scaling AI models to serve many concurrent users in real time, or when managing demanding workloads with long input sequences, using a technique called disaggregated serving unlocks further performance and efficiency gains.

Serving AI models involves two phases: processing the input prompt (prefill) and generating the output (decode). Traditionally, both phases run on the same GPUs, which can create inefficiencies and resource bottlenecks.

Disaggregated serving solves this by intelligently assigning these tasks to independently optimized GPUs. This approach ensures that each part of the workload runs with the optimization techniques best suited for it, maximizing overall performance. For today's large AI reasoning models, such as DeepSeek-R1, disaggregated serving is essential.
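The split between prefill and decode can be illustrated with a toy router. This is only a sketch of the idea, not how NVIDIA Dynamo is implemented: the `Request` and `Worker` classes, the pool names and the round-robin policy are all hypothetical, and the hand-off of prefill results (the key-value cache in a real system) is represented only by a comment.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Request:
    request_id: int
    prompt_tokens: int   # work done in the prefill phase
    output_tokens: int   # work done in the decode phase


@dataclass
class Worker:
    name: str
    handled: list = field(default_factory=list)


def serve_disaggregated(requests, prefill_pool, decode_pool):
    """Send each request to a prefill worker, then to a separate decode worker.

    Round-robin assignment is an illustrative assumption; a production
    router would weigh load, cache locality and latency targets.
    """
    prefill_iter = cycle(prefill_pool)
    decode_iter = cycle(decode_pool)
    placements = []
    for req in requests:
        prefill_worker = next(prefill_iter)
        prefill_worker.handled.append(req.request_id)
        # In a real system the prefill result would be transferred to the
        # decode worker at this point.
        decode_worker = next(decode_iter)
        decode_worker.handled.append(req.request_id)
        placements.append((req.request_id, prefill_worker.name, decode_worker.name))
    return placements


# The two pools can be sized independently, which is the point of the technique.
prefill_pool = [Worker("prefill-0")]
decode_pool = [Worker("decode-0"), Worker("decode-1")]
requests = [Request(i, prompt_tokens=4096, output_tokens=256) for i in range(4)]

placements = serve_disaggregated(requests, prefill_pool, decode_pool)
for placement in placements:
    print(placement)
```

Because the pools are separate lists, either one can grow or shrink without touching the other, which mirrors the independent optimization described above.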

NVIDIA Dynamo seamlessly brings multi-node inference optimization features such as disaggregated serving to production scale across GPU clusters.

It's already delivering value.

Baseten, for example, used NVIDIA Dynamo to speed up inference serving for long-context code generation by 2x and increase throughput by 1.6x, all without incremental hardware costs. Such software-driven performance boosts enable AI providers to significantly reduce the cost of manufacturing intelligence.

In addition, recent SemiAnalysis InferenceMAX benchmarks demonstrated that disaggregated serving with Dynamo on NVIDIA GB200 NVL72 systems delivers the lowest cost per million tokens for mixture-of-experts reasoning models like DeepSeek-R1, among platforms tested.

Scaling Disaggregated Inference in the Cloud

As disaggregated serving scales across dozens or even hundreds of nodes for enterprise-scale AI deployments, Kubernetes provides the critical orchestration layer. With NVIDIA Dynamo now integrated into managed Kubernetes services from all major cloud providers, customers can scale multi-node inference across NVIDIA Blackwell systems, including GB200 and GB300 NVL72, with the performance, flexibility and reliability that enterprise AI deployments demand.

Amazon Web Services is accelerating generative AI inference for its customers with NVIDIA Dynamo, which is integrated with Amazon EKS.

Google Cloud is providing an NVIDIA Dynamo recipe to optimize large language model (LLM) inference at enterprise scale on its AI Hypercomputer.

OCI is enabling multi-node LLM inferencing with OCI Superclusters and NVIDIA Dynamo.

The push towards enabling large-scale, multi-node inference extends beyond hyperscalers.

Nebius, for example, is designing its cloud to serve inference workloads at scale, built on NVIDIA accelerated computing infrastructure and working with NVIDIA Dynamo as an ecosystem partner.

Simplifying Inference on Kubernetes With NVIDIA Grove in NVIDIA Dynamo

Disaggregated AI inference requires coordinating a team of specialized components - prefill, decode, routing and more - each with different needs. The challenge for Kubernetes is no longer about running more parallel copies of a model, but rather about masterfully conducting these distinct components as one cohesive, high-performance system.

NVIDIA Grove, an application programming interface now available within NVIDIA Dynamo, allows users to provide a single, high-level specification that describes their entire inference system.

For example, in that single specification, a user could simply declare their requirements: I need three GPU nodes for prefill and six GPU nodes for decode, and I require all nodes for a single model replica to be placed on the same high-speed interconnect for the quickest possible response.

From that specification, Grove automatically handles all the intricate coordination: scaling related components together while maintaining correct ratios and dependencies, starting them in the right order and placing them strategically across the cluster for fast, efficient communication. Learn more about how to get started with NVIDIA Grove in this technical deep dive.
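The idea of scaling related components together from one declaration can be sketched in a few lines. The `ReplicaSpec` class and `plan` function below are hypothetical names invented for illustration and are not the Grove API or schema; the sketch only shows how a three-prefill, six-decode declaration keeps its ratio as replicas are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSpec:
    """Hypothetical description of one model replica; not the Grove schema."""
    prefill_nodes: int
    decode_nodes: int
    same_interconnect: bool  # all nodes of a replica share one high-speed domain


def plan(spec: ReplicaSpec, replicas: int) -> dict:
    """Scale both roles together so the prefill-to-decode ratio never changes."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    nodes_per_replica = spec.prefill_nodes + spec.decode_nodes
    return {
        "prefill_nodes": spec.prefill_nodes * replicas,
        "decode_nodes": spec.decode_nodes * replicas,
        # When co-location is requested, each replica needs one placement
        # group large enough to hold all of its nodes.
        "nodes_per_placement_group": nodes_per_replica if spec.same_interconnect else None,
    }


# The example requirement from the text: three prefill nodes, six decode nodes,
# all on the same high-speed interconnect.
spec = ReplicaSpec(prefill_nodes=3, decode_nodes=6, same_interconnect=True)
scaled = plan(spec, replicas=2)
print(scaled)
```

Doubling the replica count doubles both roles at once, so a scheduler working from such a plan would never be left with decode capacity that has no matching prefill capacity.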

As AI inference becomes increasingly distributed, the combination of Kubernetes and NVIDIA Dynamo with NVIDIA Grove simplifies how developers build and scale intelligent applications.

Explore how these technologies come together to make cluster-scale AI easy and production-ready by joining NVIDIA at KubeCon, running through Thursday, Nov. 13, in Atlanta.
LINK: https://blogs.nvidia.com/blog/think-smart-dynamo-ai-inference-data-cen...