
Inference has emerged as the new frontier of complexity in AI. Modern models are evolving into agentic systems capable of multi-step reasoning, persistent memory, and long-horizon context, enabling them to tackle complex tasks across domains such as software development, video generation, and deep research. These workloads place unprecedented demands on infrastructure, introducing new challenges in compute, memory, and networking that require a fundamental rethinking of how inference is scaled and optimized.
Among these challenges, processing massive context for a specific class of workloads has become increasingly critical. In software development, for example, AI systems must reason over entire codebases, maintain cross-file dependencies, and understand repository-level structure, transforming coding assistants from autocomplete tools into intelligent collaborators. Similarly, long-form video and research applications demand sustained coherence and memory across millions of tokens. These requirements are pushing the boundaries of what current infrastructure can support.
To address this shift, the NVIDIA SMART framework provides a path forward, optimizing inference across scale, multidimensional performance, architecture, ROI, and the broader technology ecosystem. It emphasizes a full-stack disaggregated infrastructure that enables efficient allocation of compute and memory resources. Platforms like NVIDIA Blackwell and NVIDIA GB200 NVL72, combined with NVFP4 for low-precision inference and open source software such as NVIDIA TensorRT-LLM and NVIDIA Dynamo, are redefining inference performance across the AI landscape.
This blog explores the next evolution in disaggregated inference infrastructure and introduces NVIDIA Rubin CPX, a purpose-built GPU designed to meet the demands of long-context AI workloads with greater efficiency and ROI.
Disaggregated inference: a scalable approach to AI complexity
Inference consists of two distinct phases, the context phase and the generation phase, each placing fundamentally different demands on infrastructure. The context phase is compute-bound, requiring high-throughput processing to ingest and analyze large volumes of input data and produce the first output token. In contrast, the generation phase is memory bandwidth-bound, relying on fast memory transfers and high-speed interconnects, such as NVLink, to sustain token-by-token output performance.
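One way to see why the two phases stress different resources is a roofline-style estimate of arithmetic intensity (FLOPs per byte moved). The sketch below is illustrative only: the `Accelerator` numbers, the layer width, and the single-matmul cost model are assumptions chosen for the example, not specifications of any NVIDIA product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical device described by two roofline parameters."""
    peak_flops: float      # FLOP/s the device can sustain (assumed)
    mem_bandwidth: float   # bytes/s of memory bandwidth (assumed)

    @property
    def ridge_point(self) -> float:
        # Intensity (FLOP/byte) above which work is limited by compute
        # rather than by memory bandwidth.
        return self.peak_flops / self.mem_bandwidth


def matmul_intensity(tokens: int, d_model: int, bytes_per_value: float) -> float:
    """FLOPs per byte for one d_model x d_model projection over `tokens` tokens.

    Simplified cost model: the weights are read once, the activations are
    read once and written once.
    """
    flops = 2.0 * tokens * d_model * d_model
    weight_bytes = d_model * d_model * bytes_per_value
    activation_bytes = 2.0 * tokens * d_model * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


def classify(tokens: int, d_model: int, bytes_per_value: float, hw: Accelerator) -> str:
    intensity = matmul_intensity(tokens, d_model, bytes_per_value)
    return "compute-bound" if intensity >= hw.ridge_point else "memory-bandwidth-bound"


# Made-up device: 1 PFLOP/s and 4 TB/s give a ridge point of 250 FLOP/byte.
hw = Accelerator(peak_flops=1e15, mem_bandwidth=4e12)

# Context phase: thousands of prompt tokens share one read of the weights.
print("context   :", classify(tokens=8192, d_model=8192, bytes_per_value=1.0, hw=hw))
# Generation phase: one new token per step still reads all the weights.
print("generation:", classify(tokens=1, d_model=8192, bytes_per_value=1.0, hw=hw))
```

Under these assumptions the context pass reuses each weight across thousands of tokens and lands far above the ridge point, while a single-token generation step sits far below it. Batching many requests raises generation-phase intensity, so the exact crossover depends on the deployment.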
Disaggregated inference lets these phases be processed independently, allowing targeted optimization of compute and memory resources. This architectural shift improves throughput, reduces latency, and enhances overall resource utilization (Figure 1).
Figure 1. Optimizing inference by aligning GPU capabilities with context and generation workloads. In the diagram, documents, databases, and videos feed a context processor whose output goes to a key-value cache read by a generation node; GPU A is optimized for long-context processing, while GPU B delivers strong TCO for both context and generation.
However, disaggregation introduces new layers of complexity, requiring precise coordination across low-latency KV cache transfers, LLM-aware routing, and efficient memory management. NVIDIA Dynamo serves as the orchestration layer for these components, and its capabilities played a pivotal role in the latest MLPerf Inference results. Learn how disaggregation with Dynamo on GB200 NVL72 set new performance records.
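The hand-off between the two phases can be pictured with a toy pipeline. This is a minimal sketch, not the Dynamo API: `KVCache`, `ContextWorker`, `GenerationWorker`, and `serve` are hypothetical names, and the "model" is a stand-in arithmetic rule so the example runs without any GPU or framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for the per-request key-value cache passed between phases."""
    request_id: str
    entries: list[int] = field(default_factory=list)


class ContextWorker:
    """Hypothetical context-phase worker: ingests the whole prompt in one pass."""

    def prefill(self, request_id: str, prompt: list[int]) -> tuple[KVCache, int]:
        cache = KVCache(request_id, list(prompt))
        first_token = sum(prompt) % 100  # toy rule standing in for a model
        return cache, first_token


class GenerationWorker:
    """Hypothetical generation-phase worker: extends the cache one token at a time."""

    def decode(self, cache: KVCache, first_token: int, max_new_tokens: int) -> list[int]:
        output = [first_token]
        cache.entries.append(first_token)
        while len(output) < max_new_tokens:
            # Each step reads the cache built so far and appends one entry.
            next_token = (cache.entries[-1] * 31 + len(cache.entries)) % 100
            cache.entries.append(next_token)
            output.append(next_token)
        return output


def serve(prompt: list[int], max_new_tokens: int) -> list[int]:
    cache, first_token = ContextWorker().prefill("req-0", prompt)
    # In a real deployment this hand-off is a KV cache transfer between
    # devices, which is the step an orchestration layer has to coordinate.
    return GenerationWorker().decode(cache, first_token, max_new_tokens)


print(serve([1, 2, 3], max_new_tokens=4))
```

Because the only thing the generation worker needs from the context worker is the cache and the first token, the two workers can run on different hardware, which is the property disaggregation relies on.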
To capitalize on the benefits of disaggregated inference, particularly in the compute-intensive context phase, specialized acceleration is essential. Addressing this need, NVIDIA is introducing the Rubin CPX GPU, a purpose-built solution designed to deliver high-throughput performance for high-value long-context inference workloads while seamlessly integrating into disaggregated infrastructure.
Rubin CPX: built to accelerate long-context processing
The Rubin CPX GPU is designed to enhance long-context performance, complementing existing infrastructure while delivering scalable efficiency and maximizing ROI in context-aware inference deployments. Rubin CPX, built with the Rubin architecture, delivers breakthrough performance for the compute-intensive context phase of inference. It features 30 petaFLOPs of NVFP4 compute, 128 GB of GDDR7 memory, hardware support for video decoding and encoding, and 3x attention acceleration (compared to NVIDIA GB300 NVL72).
Optimized for efficiently processing long sequences, Rubin CPX is critical for high-value inference use cases like software application development and HD video generation. Designed to complement existing disaggregated inference architectures, it enhances throughput and responsiveness while maximizing ROI for large-scale generative AI workloads.
Rubin CPX works in tandem with NVIDIA Vera CPUs and Rubin GPUs for generation-phase processing, forming a complete, high-performance disaggregated serving solution for long-context use cases. The NVIDIA Vera Rubin NVL144 CPX rack integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs to deliver 8 exaFLOPs of NVFP4 compute, 7.5x more than the GB300 NVL72, alongside 100 TB of high-speed memory and 1.7 PB/s of memory bandwidth, all within a single rack.
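The rack-level figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this post and rests on three assumptions: the headline figures are rounded, the rack total is a plain sum of per-device NVFP4 compute, and the "7.5" comparison is a 7.5x multiple. The helper name `rack_breakdown` is made up for the example.

```python
PFLOPS_PER_EXAFLOP = 1000.0


def rack_breakdown(
    cpx_gpus: int = 144,             # Rubin CPX GPUs per rack (from the post)
    cpx_pflops_each: float = 30.0,   # NVFP4 petaFLOPs per Rubin CPX (from the post)
    rack_exaflops: float = 8.0,      # rack total (from the post, likely rounded)
    multiple_vs_gb300: float = 7.5,  # assumed to mean a 7.5x multiple
) -> dict[str, float]:
    """Derive implied figures from the quoted rack numbers."""
    cpx_exaflops = cpx_gpus * cpx_pflops_each / PFLOPS_PER_EXAFLOP
    return {
        # Share of the rack total contributed by the Rubin CPX GPUs.
        "cpx_exaflops": cpx_exaflops,
        # Remainder, attributed here to the rest of the rack (assumption).
        "remaining_exaflops": rack_exaflops - cpx_exaflops,
        # Baseline implied by the 7.5x comparison.
        "implied_gb300_nvl72_exaflops": rack_exaflops / multiple_vs_gb300,
    }


for name, value in rack_breakdown().items():
    print(f"{name}: {value:.2f}")
```

Read this way, the Rubin CPX GPUs account for a little over half of the quoted rack compute, which matches the emphasis on the compute-bound context phase.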
Using NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet, paired with NVIDIA ConnectX-9 SuperNICs and orchestrated by the Dynamo platform, the Vera Rubin NVL144 CPX is built to power the next wave of million-token context AI inference workloads, cutting inference costs and unlocking advanced capabilities for developers and creators worldwide.
At scale, the platform can deliver 30x to 50x return on investment, transl