
What's the ROI? Getting the Most Out of LLM Inference

09/10/2024

Large language models and the applications they power enable unprecedented opportunities for organizations to get deeper insights from their data reservoirs and to build entirely new classes of applications.

But with opportunities often come challenges.

Both on premises and in the cloud, applications that are expected to run in real time place significant demands on data center infrastructure to simultaneously deliver high throughput and low latency with one platform investment.

To drive continuous performance improvements and improve the return on infrastructure investments, NVIDIA regularly optimizes state-of-the-art community models, including Meta's Llama, Google's Gemma, Microsoft's Phi and our own NVLM-D-72B, released just a few weeks ago.

Relentless Improvements

Performance improvements let our customers and partners serve more complex models and reduce the infrastructure needed to host them. NVIDIA optimizes performance at every layer of the technology stack, including TensorRT-LLM, a purpose-built library to deliver state-of-the-art performance on the latest LLMs. With improvements to the open-source Llama 70B model, which delivers very high accuracy, we've already improved minimum latency performance by 3.5x in less than a year.

We're constantly improving our platform performance and regularly publish performance updates. Each week, improvements to NVIDIA software libraries are published, allowing customers to get more from the very same GPUs. For example, in just a few months' time, we've improved our low-latency Llama 70B performance by 3.5x.

NVIDIA has increased performance on the Llama 70B model by 3.5x.

In the most recent round, MLPerf Inference v4.1, we made our first-ever submission with the Blackwell platform. It delivered 4x more performance than the previous generation.

This submission was also the first-ever MLPerf submission to use FP4 precision. Narrower precision formats, like FP4, reduce memory footprint and memory traffic, and also boost computational throughput. The process takes advantage of Blackwell's second-generation Transformer Engine, and with advanced quantization techniques that are part of TensorRT Model Optimizer, the Blackwell submission met the strict accuracy targets of the MLPerf benchmark.
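To make the memory-footprint point concrete, here is a minimal sketch of the arithmetic for weight storage at different precisions. The 70e9 parameter count is a round-number assumption for a "70B" model, and the function names are hypothetical. The estimate covers weights only: it ignores quantization scaling factors, the KV cache and activations, so real deployments will need more memory than this.

```python
# Bits per weight for each precision format. The format widths are standard;
# the parameter count used below is a round-number assumption.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_footprint_gb(num_params, bits_per_weight):
    """Return storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def footprint_table(num_params):
    """Map each precision name to its weight-only footprint in gigabytes."""
    return {
        name: weight_footprint_gb(num_params, bits)
        for name, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    # Assumed parameter count for a 70B-class model.
    for name, gigabytes in footprint_table(70e9).items():
        print(f"{name}: {gigabytes:.0f} GB of weights")
```

Halving the bits per weight halves the bytes that must be stored and read for every generated token, which is where the reduction in memory traffic comes from.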

Blackwell B200 delivers up to 4x more performance versus previous generation on MLPerf Inference v4.1's Llama 2 70B workload.

Improvements in Blackwell haven't stopped the continued acceleration of Hopper. In the last year, Hopper performance has increased 3.4x in MLPerf on H100 thanks to regular software advancements. This means that NVIDIA's peak performance today, on Blackwell, is 10x faster than it was just one year ago on Hopper.

These results track progress on the MLPerf Inference Llama 2 70B Offline scenario over the past year.

Our ongoing work is incorporated into TensorRT-LLM, a purpose-built library to accelerate LLMs that contains state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM is built on top of the TensorRT Deep Learning Inference library and leverages much of TensorRT's deep learning optimizations with additional LLM-specific improvements.

Improving Llama in Leaps and Bounds

More recently, we've continued optimizing variants of Meta's Llama models, including versions 3.1 and 3.2 as well as model sizes 70B and the biggest model, 405B. These optimizations include custom quantization recipes, as well as parallelization techniques that split the model more efficiently across multiple GPUs, leveraging NVIDIA NVLink and NVSwitch interconnect technologies. Cutting-edge LLMs like Llama 3.1 405B are very demanding and require the combined performance of multiple state-of-the-art GPUs for fast responses.

Parallelism techniques require a hardware platform with a robust GPU-to-GPU interconnect fabric to get maximum performance and avoid communication bottlenecks. Each NVIDIA H200 Tensor Core GPU features fourth-generation NVLink, which provides a whopping 900GB/s of GPU-to-GPU bandwidth. Every eight-GPU HGX H200 platform also ships with four NVLink Switches, enabling every H200 GPU to communicate with any other H200 GPU at 900GB/s, simultaneously.

Many LLM deployments use parallelism rather than keeping the workload on a single GPU, which can become a compute bottleneck. These deployments seek to balance low latency and high throughput, and the optimal parallelization technique depends on application requirements.

For instance, if lowest latency is the priority, tensor parallelism is critical, as the combined compute performance of multiple GPUs can be used to serve tokens to users more quickly. However, for use cases where peak throughput across all users is prioritized, pipeline parallelism can efficiently boost overall server throughput.
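The difference between the two techniques can be sketched in a few lines of plain Python. This is a toy illustration, not how TensorRT-LLM implements either technique: the "GPUs" are simulated by loops in one process, layers are bare matrix multiplies, and all function names are hypothetical. It assumes the column count divides evenly by the number of GPUs and the layer count divides evenly by the number of stages.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split matrix w column-wise into equal shards, one per simulated GPU."""
    step = len(w[0]) // parts
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(parts)]


def tensor_parallel_layer(x, w, num_gpus):
    """Tensor parallelism: every simulated GPU works on the SAME layer.

    Each one holds a column shard of the weights and computes a slice of
    the output; the slices are then concatenated (the gather step).
    """
    out = []
    for shard in split_columns(w, num_gpus):
        out.extend(matmul(x, shard))
    return out


def pipeline_parallel_forward(x, layers, num_stages):
    """Pipeline parallelism: every simulated GPU owns DIFFERENT layers.

    Each stage runs its contiguous block of layers, then hands the
    activations to the next stage.
    """
    per_stage = len(layers) // num_stages
    for stage in range(num_stages):
        for w in layers[stage * per_stage:(stage + 1) * per_stage]:
            x = matmul(x, w)
    return x


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    w1 = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1], [1, 1, 1, 0]]
    w2 = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
    print(tensor_parallel_layer(x, w1, num_gpus=2))
    print(pipeline_parallel_forward(x, [w1, w2], num_stages=2))
```

In the tensor-parallel case all GPUs cooperate on a single token's layer, which shortens the time per token but requires communication at every layer. In the pipeline-parallel case each GPU only passes activations to its neighbor, so several requests can be in flight across the stages at once.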

The table below shows that tensor parallelism can deliver over 5x more throughput in minimum latency scenarios, whereas pipeline parallelism brings 50% more performance for maximum throughput use cases.

For production deployments that seek to maximize throughput within a given latency budget, a platform needs to provide the ability to effectively combine both techniques, as TensorRT-LLM does.
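As a rough sketch of what combining the two techniques means, the snippet below enumerates the ways a fixed number of GPUs can be divided between tensor-parallel and pipeline-parallel dimensions. The function name is hypothetical, and the eight-GPU count is taken from the HGX H200 platform described above. Which of these layouts performs best for a given latency budget is something to measure, not something this sketch can predict.

```python
def parallel_layouts(num_gpus):
    """Return every (tensor_parallel, pipeline_parallel) pair whose
    product equals num_gpus."""
    return [
        (tp, num_gpus // tp)
        for tp in range(1, num_gpus + 1)
        if num_gpus % tp == 0
    ]


if __name__ == "__main__":
    for tp, pp in parallel_layouts(8):
        print(f"tensor parallel = {tp}, pipeline parallel = {pp}")
```

The two extremes are pure pipeline parallelism (1, 8) and pure tensor parallelism (8, 1); the layouts in between trade per-token latency against overall server throughput.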

Read the technical blog on boosting Llama 3.1 405B throughput to learn more about these techniques.

Different scenarios have different requirements, and parallelism techniques bring optimal performance for each of these scenarios.

The Virtuous Cycle

Over the lifecycle of our architectures, we deliver significant performance gains from ongoing software tuning and optimization. These improvements translate into additional value for customers who train and deploy on our platforms. They're able to create more capable models and applications and deploy their existing models using less infrastructure, enhancing th
LINK: https://blogs.nvidia.com/blog/llm-inference-roi/...