
The Hao AI Lab research team at the University of California San Diego - at the forefront of pioneering AI model innovation - recently received an NVIDIA DGX B200 system to elevate their critical work in large language model inference.
Many LLM inference platforms in production today, such as NVIDIA Dynamo, use research concepts that originated in the Hao AI Lab, including DistServe.
How Is Hao AI Lab Using the DGX B200?
Members of the Hao AI Lab standing with the NVIDIA DGX B200 system.
With the DGX B200 now fully accessible to the Hao AI Lab and broader UC San Diego community at the School of Computing, Information and Data Sciences' San Diego Supercomputer Center, the research opportunities are boundless.
"DGX B200 is one of the most powerful AI systems from NVIDIA to date, which means that its performance is among the best in the world," said Hao Zhang, assistant professor in the Halıcıoğlu Data Science Institute and department of computer science and engineering at UC San Diego. "It enables us to prototype and experiment much faster than using previous-generation hardware."
Two Hao AI Lab projects the DGX B200 is accelerating are FastVideo and the Lmgame benchmark.
FastVideo focuses on training a family of video generation models to produce a five-second video based on a given text prompt - in just five seconds.
The research phase of FastVideo taps into NVIDIA H200 GPUs in addition to the DGX B200 system.
Lmgame-Bench is a benchmarking suite that puts LLMs to the test using popular online games including Tetris and Super Mario Bros. Users can test one model at a time or put two models up against each other to measure their performance.
The illustrated workflow of Hao AI Lab's Lmgame-Bench project.
Other ongoing projects at Hao AI Lab explore new ways to achieve low-latency LLM serving, pushing large language models toward real-time responsiveness.
"Our current research uses the DGX B200 to explore the next frontier of low-latency LLM-serving on the awesome hardware specs the system gives us," said Junda Chen, a doctoral candidate in computer science at UC San Diego.
How DistServe Influenced Disaggregated Serving
Disaggregated inference is a way to ensure large-scale LLM-serving engines can achieve the optimal aggregate system throughput while maintaining acceptably low latency for user requests.
The benefit of disaggregated inference lies in optimizing what DistServe calls goodput instead of throughput in the LLM-serving engine.
Here's the difference:
Throughput is measured by the number of tokens per second that the entire system can generate. Higher throughput means lower cost to generate each token to serve the user. For a long time, throughput was the only metric used by LLM-serving engines to measure their performance against one another.
While throughput measures the aggregate performance of the system, it doesn't directly correlate to the latency that a user perceives. If a user demands lower latency to generate the tokens, the system has to sacrifice throughput.
This natural trade-off between throughput and latency is what led the DistServe team to propose a new metric, goodput: the measure of throughput while satisfying the user-specified latency objectives, usually called service-level objectives. In other words, goodput represents the overall health of a system while satisfying user experience.
DistServe shows that goodput is a much better metric for LLM-serving systems, as it factors in both cost and service quality. Goodput leads to optimal efficiency and ideal output from a model.
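To make the distinction concrete, here is a minimal sketch in Python of how the two metrics can diverge on the same set of requests. It is an illustration, not DistServe's implementation: the `Request` fields, the two latency objectives (time to first token and time per output token) and all of the numbers are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_s: float       # time to first token, in seconds (assumed metric)
    tpot_s: float       # average time per output token, in seconds (assumed metric)
    output_tokens: int  # number of tokens generated for this request


def throughput(requests, window_s):
    """Tokens per second across all requests, regardless of latency."""
    return sum(r.output_tokens for r in requests) / window_s


def goodput(requests, window_s, ttft_slo_s, tpot_slo_s):
    """Tokens per second, counting only requests that met both latency objectives."""
    within_slo = [
        r for r in requests
        if r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s
    ]
    return sum(r.output_tokens for r in within_slo) / window_s


# Hypothetical measurements collected over a 10-second window.
requests = [
    Request(ttft_s=0.2, tpot_s=0.04, output_tokens=200),  # meets both objectives
    Request(ttft_s=0.9, tpot_s=0.03, output_tokens=300),  # first token too slow
    Request(ttft_s=0.3, tpot_s=0.12, output_tokens=400),  # decoding too slow
    Request(ttft_s=0.4, tpot_s=0.05, output_tokens=100),  # meets both objectives
]

total = throughput(requests, window_s=10.0)
good = goodput(requests, window_s=10.0, ttft_slo_s=0.5, tpot_slo_s=0.05)
print(f"throughput: {total} tokens/s, goodput: {good} tokens/s")
```

In this made-up workload the system generates 100 tokens per second overall, but only 30 of those tokens per second belong to requests that stayed within the latency objectives. A serving engine tuned for throughput alone would not see that gap.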
How Can Developers Achieve Optimal Goodput?
When a user makes a request in an LLM system, the system takes the user input and generates the first token, a phase known as prefill. Then, the system creates numerous output tokens, one after another, predicting each new token from the input and the tokens generated so far. This process is known as decode.
https://blogs.nvidia.com/wp-content/uploads/2025/12/distserve.mp4
Prefill and decode have historically run on the same GPU, but the researchers behind DistServe found that splitting them onto different GPUs maximizes goodput.
"Previously, if you put these two jobs on a GPU, they would compete with each other for resources, which could make it slow from a user perspective," Chen said. "Now, if I split the jobs onto two different sets of GPUs - one doing prefill, which is compute intensive, and the other doing decode, which is more memory intensive - we can fundamentally eliminate the interference between the two jobs, making both jobs run faster."
This process is called prefill/decode disaggregation, or separating the prefill from decode to get greater goodput.
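The interference Chen describes can be pictured with a deliberately simple toy model, sketched below in Python. It is not how DistServe or NVIDIA Dynamo schedule work: it assumes a GPU runs one job at a time, that a newly arrived request's prefill is scheduled ahead of every fourth decode step on a shared GPU, and it ignores the cost of moving intermediate state between GPUs. The function names and millisecond figures are hypothetical.

```python
def token_gaps_colocated(num_tokens, decode_ms, prefill_ms, prefill_every):
    """Time between consecutive output tokens when prefill shares the GPU.

    Toy assumption: before every `prefill_every`-th decode step, another
    request's prefill runs first and stalls the ongoing decode.
    """
    gaps = []
    for step in range(1, num_tokens + 1):
        stall_ms = prefill_ms if step % prefill_every == 0 else 0.0
        gaps.append(decode_ms + stall_ms)
    return gaps


def token_gaps_disaggregated(num_tokens, decode_ms):
    """Time between output tokens when decode has its own GPU.

    Prefill runs elsewhere, so every decode step takes the same time
    (state-transfer cost is ignored in this sketch).
    """
    return [decode_ms] * num_tokens


# Hypothetical costs: 20 ms per decode step, 150 ms per prefill.
colocated = token_gaps_colocated(
    num_tokens=8, decode_ms=20.0, prefill_ms=150.0, prefill_every=4
)
disaggregated = token_gaps_disaggregated(num_tokens=8, decode_ms=20.0)

for name, gaps in (("colocated", colocated), ("disaggregated", disaggregated)):
    print(f"{name}: worst gap {max(gaps)} ms, mean gap {sum(gaps) / len(gaps)} ms")
```

Under these assumptions the shared GPU produces an occasional 170 ms pause between tokens, while the dedicated decode GPU holds a steady 20 ms. Whether a given pause breaks a latency objective depends on the objective chosen, which is why the split is evaluated against goodput rather than raw throughput.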
Increasing goodput and using the disaggregated inference method enables the continuous scaling of workloads without compromising on low-latency or high-quality model responses.
NVIDIA Dynamo - an open-source framework designed to accelerate and scale generative AI models at the highest efficiency levels with the lowest cost - enables scaling disaggregated inference.
In addition to these projects, cross-departmental collaborations, such as in healthcare and biology, are underway at UC San Diego to further optimize an array of research projects using the NVIDIA DGX B200, as researchers continue exploring how AI platforms can accelerate innovation.
Learn more about the NVIDIA DGX B200 system.