
Large language models and the applications they power enable unprecedented opportunities for organizations to get deeper insights from their data reservoirs and to build entirely new classes of applications.
But with opportunities often come challenges.
Both on premises and in the cloud, applications that are expected to run in real time place significant demands on data center infrastructure to simultaneously deliver high throughput and low latency with one platform investment.
To drive continuous performance improvements and improve the return on infrastructure investments, NVIDIA regularly optimizes the state-of-the-art community models, including Meta's Llama, Google's Gemma, Microsoft's Phi and our own NVLM-D-72B, released just a few weeks ago.
Relentless Improvements Performance improvements let our customers and partners serve more complex models and reduce the needed infrastructure to host them. NVIDIA optimizes performance at every layer of the technology stack, including TensorRT-LLM, a purpose-built library to deliver state-of-the-art performance on the latest LLMs. With improvements to the open-source Llama 70B model, which delivers very high accuracy, we've already improved minimum latency performance by 3.5x in less than a year.
We're constantly improving our platform performance and regularly publish performance updates. Each week, improvements to NVIDIA software libraries are published, allowing customers to get more from the very same GPUs. For example, in just a few months' time, we've improved our low-latency Llama 70B performance by 3.5x.
NVIDIA has increased performance on the Llama 70B model by 3.5x. In the most recent round of MLPerf Inference 4.1, we made our first-ever submission with the Blackwell platform. It delivered 4x more performance than the previous generation.
This submission was also the first-ever MLPerf submission to use FP4 precision. Narrower precision formats, like FP4, reduces memory footprint and memory traffic, and also boost computational throughput. The process takes advantage of Blackwell's second-generation Transformer Engine, and with advanced quantization techniques that are part of TensorRT Model Optimizer, the Blackwell submission met the strict accuracy targets of the MLPerf benchmark.
Blackwell B200 delivers up to 4x more performance versus previous generation on MLPerf Inference v4.1's Llama 2 70B workload. Improvements in Blackwell haven't stopped the continued acceleration of Hopper. In the last year, Hopper performance has increased 3.4x in MLPerf on H100 thanks to regular software advancements. This means that NVIDIA's peak performance today, on Blackwell, is 10x faster than it was just one year ago on Hopper.
These results track progress on the MLPerf Inference Llama 2 70B Offline scenario over the past year. Our ongoing work is incorporated into TensorRT-LLM, a purpose-built library to accelerate LLMs that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM is built on top of the TensorRT Deep Learning Inference library and leverages much of TensorRT's deep learning optimizations with additional LLM-specific improvements.
Improving Llama in Leaps and Bounds More recently, we've continued optimizing variants of Meta's Llama models, including versions 3.1 and 3.2 as well as model sizes 70B and the biggest model, 405B. These optimizations include custom quantization recipes, as well as efficient use of parallelization techniques to more efficiently split the model across multiple GPUs, leveraging NVIDIA NVLink and NVSwitch interconnect technologies. Cutting-edge LLMs like Llama 3.1 405B are very demanding and require the combined performance of multiple state-of-the-art GPUs for fast responses.
Parallelism techniques require a hardware platform with a robust GPU-to-GPU interconnect fabric to get maximum performance and avoid communication bottlenecks. Each NVIDIA H200 Tensor Core GPU features fourth-generation NVLink, which provides a whopping 900GB/s of GPU-to-GPU bandwidth. Every eight-GPU HGX H200 platform also ships with four NVLink Switches, enabling every H200 GPU to communicate with any other H200 GPU at 900GB/s, simultaneously.
Many LLM deployments use parallelism over choosing to keep the workload on a single GPU, which can have compute bottlenecks. LLMs seek to balance low latency and high throughput, with the optimal parallelization technique depending on application requirements.
For instance, if lowest latency is the priority, tensor parallelism is critical, as the combined compute performance of multiple GPUs can be used to serve tokens to users more quickly. However, for use cases where peak throughput across all users is prioritized, pipeline parallelism can efficiently boost overall server throughput.
The table below shows that tensor parallelism can deliver over 5x more throughput in minimum latency scenarios, whereas pipeline parallelism brings 50% more performance for maximum throughput use cases.
For production deployments that seek to maximize throughput within a given latency budget, a platform needs to provide the ability to effectively combine both techniques like in TensorRT-LLM.
Read the technical blog on boosting Llama 3.1 405B throughput to learn more about these techniques.
Different scenarios have different requirements, and parallelism techniques bring optimal performance for each of these scenarios. The Virtuous Cycle Over the lifecycle of our architectures, we deliver significant performance gains from ongoing software tuning and optimization. These improvements translate into additional value for customers who train and deploy on our platforms. They're able to create more capable models and applications and deploy their existing models using less infrastructure, enhancing th
Most recent headlines
09/11/2025
Dalet today announced a transformative leap forward for media operations: Agentic Artificial Intelligence (AI) that unifies the Dalet ecosystem under one natura...
06/10/2025
France T l visions, France's leading broadcaster, has received the 2025 EBU ...
18/09/2025
DirecTV continues to expand its women's sports offerings by adding Sports Fanatics' and Whoopi Goldberg's Free Ad-Supported Television (FAST) All Wo...
18/09/2025
WASHINGTON NextGen TV viewers are beginning to see what high dynamic range (HDR) brings to their football enjoyment with the launch of the sport's fall seas...
18/09/2025
WASHINGTON After issuing audit letters seeking Equal Employment Opportunity data from a randomly selected group of TV and radio stations in August, the Federal ...
18/09/2025
NEW YORK Warner Bros. Discovery and Nielsen have signed a new, long-term, multi-year deal that covers measurement for all Warner Bros. Discovery platforms acros...
18/09/2025
NEW YORK As part of its ongoing efforts to develop better measurement solutions, the Coalition for Innovative Media Measurement (CIMM) announced the launch of t...
18/09/2025
IRVING, Texas Following threats from the Federal Communications Commission and the announcement that the nations largest station group, Nexstar Media Group, wou...
17/09/2025
Tech Focus: Audio Training, Part 2 - Manufacturers Offer Extensive Online Learni...
17/09/2025
Tech Focus: Audio Training, Part 1 - A1 Shortage Remains a Major-League Challeng...
17/09/2025
It was the ultimate convergence of pop culture and literary prestige: Last night, Dua Lipa brought her Service95 Book Club podcast to the stage for a special li...
17/09/2025
During August, streaming's share of TV viewing in Mexico showed an increase of 0.4% compared to the previous month, accounting for 25% of TV viewing.
Discl...
17/09/2025
CYPRESS, Calif. FOR-A America has named Jo Aun as senior manager of product engineering, a new role responsible for guiding the planning, development and rollou...
17/09/2025
PlayBox Neo, in partnership with CIS Group, a leading provider of media and broadcast technology solutions, has successfully deployed PlayBox Neo's Dual Cha...
17/09/2025
In a relationship that mirrors societal advances in sustainability, Brightline Lighting and the Federal Energy Regulatory Commission (FERC) Headquarters have en...
17/09/2025
Clear-Com is proud to support the world-class productions of Alley Theatre, one of the oldest and largest nonprofit resident theatres in the United States. With...
17/09/2025
Arch Platform Technologies (www.archpt.io), a pioneer in automated, scalable cloud infrastructure for high-performance workflows, today announced a Strategic Co...
17/09/2025
Over 300 selected decision-makers from start-ups, corporates, and VC funds worldwide will gather for the third edition of the event, united by a single goal: to...
17/09/2025
Telestream, a global leader in media workflow technologies, is excited to announce that its flagship Vantage platform and its next-generation AI capabilities re...
17/09/2025
Mediagenix, a global leader in smart content solutions that profitably connect the right content to the right audience, proudly announces its three Best of Show...
17/09/2025
In a move to further establish a firm foothold across South East Asia, PlayBox Neo, the well-respected name in broadcast playout and channel branding, has appoi...
17/09/2025
Wisycom, a global leader in advanced wireless audio solutions, announced two major wireless solutions at IBC 2025 (Stand 8.D30). This includes the Portable RF-o...
17/09/2025
Six Berklee Alumni Win Emmy Awards The recipients were recognized for their contributions to acclaimed programs Severance, The Studio, The Penguin, SNL50: The...
17/09/2025
Applications Open for Berklee in Santo Domingo The weeklong contemporary music program will run January 5-10, 2026.
By
Colette Greenstein
September 17, 2025
...
17/09/2025
Ukrainian Students Find Creative Consonance' at Berklee Valencia Through ELIA's UAx Platform, six students from Kyiv joined Berklee Valencia for a week...
17/09/2025
Earlier this year Avid announced Kenna Hilburn as its new senior vice president of product. Recently Hilburn was promoted to Avids new Chief Product Officer, su...
17/09/2025
Transatlantic collaboration combines experience and agility to drive innovation in network design and delivery
Luxembourg, September 16, 2025 - SES, a leading ...
17/09/2025
NEW YORK Madhive has announced that the Fox Television Stations have joined its Live Sports Marketplace....
17/09/2025
SYRACUSE, N.Y. Sony Electronics has announced that it is partnering with the Newhouse School at Syracuse University to provide state-of-the-art equipment, hands...
17/09/2025
SAN JOSE, Calif. Roku has announced that the first smart projector using its Roku TV operating system, the Aurzen Roku TV Smart Projector D1R Cube, is now avail...
17/09/2025
Today's creators are equal parts entertainer, producer and gamer, juggling game commentary, scene changes, replay clips, chat moderation and technical troub...
17/09/2025
Wednesday 17 September 2025
UK artists capture icons of stage and screen, inclu...
17/09/2025
Jo Returns to FOR-A as Senior Manager of Product Management and Engineering...
17/09/2025
For the Moon Safari anniversary tour, AIR opened the doors to their backstage. Just a few hours before the Paris concert, DPA met with two key figures of the te...
17/09/2025
Auditions will be held in Dublin, Cork and Galway
The County Parade returns f...
16/09/2025
SVG All-Stars: Leigh Michaud, Manager, Remote Operations, ESPNThe UConn grad rose from ESPN's mailroom to become one of its most valuable ops leadersBy Bran...
16/09/2025
Live From IBC 2025: Friday's Latest From Halls 1-4, Outdoor Exhibits in Amst...
16/09/2025
Live From IBC 2025: Saturday's Latest From Halls 5-7 in Amsterdam By SVG Staff
Friday, September 12, 2025 - 17:00
Print This Story
The SVG Europe and ...
16/09/2025
Live From IBC 2025: Sunday's Latest From Halls 8-10 in Amsterdam By SVG Staff
Saturday, September 13, 2025 - 17:00
Print This Story
The SVG Europe and...
16/09/2025
Live From IBC 2025: Monday's Latest From Halls 11-14 in Amsterdam By SVG Staff
Sunday, September 14, 2025 - 17:00
Print This Story
The SVG Europe and ...
16/09/2025
Amazon Prime Video Picks Up Four Hours of Early-Round Masters Coverage in 2026 By Jason Dachman, Editorial Director, U.S.
Tuesday, September 16, 2025 - 10:15...
16/09/2025
VERSANT Inks Deal for League One Volleyball as Women's Sports Rights Slate G...
16/09/2025
ESPN VP, Corporate Communications, Katina Arnold Named SVP, Disney Advertising C...
16/09/2025
IBC 2025 in Review: SVG Europe's Full Collection of Video Interviews From th...
16/09/2025
Hace una d cada, la m sica latina representaba apenas el 8% de las reproducciones globales en Spotify. Hoy, constituye m s de una cuarta parte (27%) de toda la ...
16/09/2025
A decade ago, Latin music made up just 8% of global Spotify streams. Today, it a...
16/09/2025
Spotify is expanding our video lineup with a new partnership with Zoo 55, part of ITV Studios. For the first time, acclaimed content from ITV Studios is landing...
16/09/2025
At DSEI 2025, James Dunne of L3Harris Maritime UK chaired a panel on aligning the supply chain to the warfighter, where leaders discussed modernising support fo...
16/09/2025
Calrec has strengthened its collaboration with audio metering expert RTW by integrating RTW's new TMxCore metering platform across its full range of Argo IP...
16/09/2025
College Football Scores Top Telecast in August with 16M+ Viewers on FOX, Followe...