
Large language models and the applications they power enable unprecedented opportunities for organizations to get deeper insights from their data reservoirs and to build entirely new classes of applications.
But with opportunities often come challenges.
Both on premises and in the cloud, applications that are expected to run in real time place significant demands on data center infrastructure to simultaneously deliver high throughput and low latency with one platform investment.
To deliver continuous performance gains and improve the return on infrastructure investments, NVIDIA regularly optimizes state-of-the-art community models, including Meta's Llama, Google's Gemma, Microsoft's Phi and our own NVLM-D-72B, released just a few weeks ago.
Relentless Improvements
Performance improvements let our customers and partners serve more complex models and reduce the infrastructure needed to host them. NVIDIA optimizes performance at every layer of the technology stack, including TensorRT-LLM, a purpose-built library that delivers state-of-the-art performance on the latest LLMs. With improvements to the open-source Llama 70B model, which delivers very high accuracy, we've already improved minimum latency performance by 3.5x in less than a year.
We're constantly improving our platform performance and regularly publish performance updates. Each week, improvements to NVIDIA software libraries are published, allowing customers to get more from the very same GPUs. For example, in just a few months' time, we've improved our low-latency Llama 70B performance by 3.5x.
NVIDIA has increased performance on the Llama 70B model by 3.5x.
In the most recent round of MLPerf Inference, v4.1, we made our first-ever submission with the Blackwell platform. It delivered 4x more performance than the previous generation.
This submission was also the first-ever MLPerf submission to use FP4 precision. Narrower precision formats, like FP4, reduce memory footprint and memory traffic, and also boost computational throughput. The submission takes advantage of Blackwell's second-generation Transformer Engine, and, with the advanced quantization techniques that are part of TensorRT Model Optimizer, it met the strict accuracy targets of the MLPerf benchmark.
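To make the memory-footprint point concrete, here is a rough back-of-the-envelope sketch. It assumes a model with exactly 70 billion parameters and counts only raw weight storage. The scale factors that quantization schemes store alongside the weights, the KV cache, and activations are all ignored, so real deployments will differ. The function name is hypothetical and is not part of any NVIDIA library.

```python
# Rough estimate of weight storage at different precisions.
# Assumptions (not from the article): exactly 70e9 parameters, weights only,
# no quantization scale factors, KV cache, or activations.

BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Return raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 70_000_000_000  # illustrative parameter count
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_footprint_gb(params, bits):.0f} GB")
```

Under these assumptions, moving from FP16 to FP4 cuts weight storage to one quarter, which also means a quarter as many bytes to read from memory on each pass over the weights.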
Blackwell B200 delivers up to 4x more performance versus previous generation on MLPerf Inference v4.1's Llama 2 70B workload.
Improvements in Blackwell haven't stopped the continued acceleration of Hopper. In the last year, Hopper performance has increased 3.4x in MLPerf on H100 thanks to regular software advancements. This means that NVIDIA's peak performance today, on Blackwell, is 10x faster than it was just one year ago on Hopper.
These results track progress on the MLPerf Inference Llama 2 70B Offline scenario over the past year.
Our ongoing work is incorporated into TensorRT-LLM, a purpose-built library for accelerating LLMs that contains state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM is built on top of the TensorRT Deep Learning Inference library and leverages many of TensorRT's deep learning optimizations, with additional LLM-specific improvements.
Improving Llama in Leaps and Bounds
More recently, we've continued optimizing variants of Meta's Llama models, including versions 3.1 and 3.2, the 70B size and the largest model, 405B. These optimizations include custom quantization recipes as well as parallelization techniques that split the model more efficiently across multiple GPUs, leveraging NVIDIA NVLink and NVSwitch interconnect technologies. Cutting-edge LLMs like Llama 3.1 405B are very demanding and require the combined performance of multiple state-of-the-art GPUs for fast responses.
Parallelism techniques require a hardware platform with a robust GPU-to-GPU interconnect fabric to get maximum performance and avoid communication bottlenecks. Each NVIDIA H200 Tensor Core GPU features fourth-generation NVLink, which provides a whopping 900GB/s of GPU-to-GPU bandwidth. Every eight-GPU HGX H200 platform also ships with four NVLink Switches, enabling every H200 GPU to communicate with any other H200 GPU at 900GB/s, simultaneously.
Many LLM deployments spread the workload across multiple GPUs rather than keeping it on a single GPU, where compute can become a bottleneck. These deployments seek to balance low latency and high throughput, with the optimal parallelization technique depending on application requirements.
For instance, if lowest latency is the priority, tensor parallelism is critical, as the combined compute performance of multiple GPUs can be used to serve tokens to users more quickly. However, for use cases where peak throughput across all users is prioritized, pipeline parallelism can efficiently boost overall server throughput.
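The difference between the two techniques can be sketched with a toy example. This is an illustration under simplifying assumptions, not how TensorRT-LLM implements either technique: the "devices" are plain Python lists, everything runs sequentially, and all function names are hypothetical. Tensor parallelism splits the weights of a single layer across devices, so several devices cooperate on each token. Pipeline parallelism assigns whole layers to each device and hands activations from one stage to the next.

```python
# Toy illustration of two ways of splitting a model across GPUs.
# "Devices" here are just Python lists; nothing runs on real hardware.


def matmul(x, w):
    """Multiply a row vector x (length k) by a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, num_devices):
    """Tensor parallelism: give each device a contiguous slice of w's columns."""
    step = len(w[0]) // num_devices  # assumes the column count divides evenly
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(num_devices)]


def tensor_parallel_matmul(x, w, num_devices):
    """Each device computes part of one layer's output; parts are concatenated."""
    out = []
    for shard in split_columns(w, num_devices):
        out.extend(matmul(x, shard))  # on real hardware the shards run concurrently
    return out


def pipeline_parallel_forward(x, layers, num_devices):
    """Pipeline parallelism: each device owns a contiguous group of whole layers."""
    per_stage = len(layers) // num_devices  # assumes the layers divide evenly
    stages = [layers[s * per_stage:(s + 1) * per_stage] for s in range(num_devices)]
    for stage in stages:  # activations are handed from one stage to the next
        for w in stage:
            x = matmul(x, w)
    return x


if __name__ == "__main__":
    x = [1, 2]
    w = [[1, 2, 3, 4], [5, 6, 7, 8]]
    # Sharding one layer across two devices gives the same result as one device.
    print(tensor_parallel_matmul(x, w, 2) == matmul(x, w))
```

In the tensor-parallel case every device must exchange partial results for every layer, which is why it depends so heavily on GPU-to-GPU bandwidth. In the pipeline case only the activations at stage boundaries cross between devices, and a real system keeps all stages busy by feeding several requests through the pipeline at once, which this sequential sketch does not show.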
The table below shows that tensor parallelism can deliver over 5x more throughput in minimum latency scenarios, whereas pipeline parallelism brings 50% more performance for maximum throughput use cases.
For production deployments that seek to maximize throughput within a given latency budget, a platform needs to be able to combine both techniques effectively, as TensorRT-LLM does.
Read the technical blog on boosting Llama 3.1 405B throughput to learn more about these techniques.
Different scenarios have different requirements, and parallelism techniques bring optimal performance for each of these scenarios.
The Virtuous Cycle
Over the lifecycle of our architectures, we deliver significant performance gains from ongoing software tuning and optimization. These improvements translate into additional value for customers who train and deploy on our platforms. They're able to create more capable models and applications and deploy their existing models using less infrastructure, enhancing th