
Large language models and the applications they power enable unprecedented opportunities for organizations to get deeper insights from their data reservoirs and to build entirely new classes of applications.
But with opportunities often come challenges.
Both on premises and in the cloud, applications that are expected to run in real time place significant demands on data center infrastructure to simultaneously deliver high throughput and low latency with one platform investment.
To drive continuous performance improvements and improve the return on infrastructure investments, NVIDIA regularly optimizes state-of-the-art community models, including Meta's Llama, Google's Gemma, Microsoft's Phi, and our own NVLM-D-72B, released just a few weeks ago.
Relentless Improvements
Performance improvements let our customers and partners serve more complex models and reduce the infrastructure needed to host them. NVIDIA optimizes performance at every layer of the technology stack, including TensorRT-LLM, a purpose-built library that delivers state-of-the-art performance on the latest LLMs. For the open-source Llama 70B model, which delivers very high accuracy, we've already improved minimum latency performance by 3.5x in less than a year.
We're constantly improving our platform performance and regularly publish performance updates. Each week, improvements to NVIDIA software libraries are published, allowing customers to get more from the very same GPUs. For example, in just a few months' time, we've improved our low-latency Llama 70B performance by 3.5x.
NVIDIA has increased performance on the Llama 70B model by 3.5x.
In the most recent round of MLPerf Inference, v4.1, we made our first-ever submission with the Blackwell platform. It delivered 4x more performance than the previous generation.
This submission was also the first-ever MLPerf submission to use FP4 precision. Narrower precision formats like FP4 reduce memory footprint and memory traffic, and they also boost computational throughput. The submission took advantage of Blackwell's second-generation Transformer Engine, and, with advanced quantization techniques from TensorRT Model Optimizer, it met the strict accuracy targets of the MLPerf benchmark.
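To make the FP4 idea concrete, here is a simplified sketch of block-scaled FP4 quantization using the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit), plus the weight-memory arithmetic behind the "smaller footprint" claim. It is an illustration under stated assumptions, not TensorRT Model Optimizer's recipe: it keeps one plain float scale per block and rounds to the nearest representable value, whereas production FP4 formats store the per-block scale in a compact 8-bit encoding and use more careful rounding and calibration. The function names are hypothetical.

```python
from typing import List, Tuple

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = E2M1_MAGNITUDES[-1]


def quantize_block_fp4(block: List[float]) -> Tuple[float, List[float]]:
    """Map a block of values to FP4 E2M1 codes plus one shared scale (simplified sketch).

    The scale is chosen so the block's largest magnitude lands on 6.0, the largest
    E2M1 value. Each value is then rounded to the nearest E2M1 magnitude
    (ties go to the smaller magnitude here) and keeps its sign.
    """
    max_abs = max(abs(v) for v in block)
    scale = max_abs / E2M1_MAX if max_abs > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(target - m))
        codes.append(-mag if v < 0 else mag)
    return scale, codes


def dequantize_block_fp4(scale: float, codes: List[float]) -> List[float]:
    """Reconstruct approximate values from the shared scale and FP4 codes."""
    return [c * scale for c in codes]


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weight storage only (no KV cache or activations), in decimal GB."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    scale, codes = quantize_block_fp4([6.0, -2.6, 0.7, 0.0])
    print(scale, codes, dequantize_block_fp4(scale, codes))
    # 4.5 bits assumes a hypothetical 8-bit scale shared by every 16 FP4 values.
    for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4), ("FP4 + scales", 4.5)]:
        print(f"{name}: {weight_memory_gb(70e9, bits):.1f} GB")
```

For a 70B-parameter model, this arithmetic puts the weights at about 140 GB in FP16 versus 35 GB at 4 bits, or roughly 39 GB once the assumed scale overhead is counted. The rounding step is where accuracy is lost (here 0.7 becomes 0.5 and -2.6 becomes -3.0), which is why real quantization recipes put so much effort into scale selection.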
Blackwell B200 delivers up to 4x more performance versus the previous generation on MLPerf Inference v4.1's Llama 2 70B workload.
Blackwell's arrival hasn't slowed the continued acceleration of Hopper. In the last year, H100 performance in MLPerf has increased 3.4x thanks to regular software advancements. This means that NVIDIA's peak performance today, on Blackwell, is 10x faster than it was just one year ago on Hopper.
These results track progress on the MLPerf Inference Llama 2 70B Offline scenario over the past year.
Our ongoing work is incorporated into TensorRT-LLM, a purpose-built library for accelerating LLMs that contains state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorRT-LLM is built on top of the TensorRT deep learning inference library and leverages many of TensorRT's deep learning optimizations, with additional LLM-specific improvements.
Improving Llama in Leaps and Bounds
More recently, we've continued optimizing variants of Meta's Llama models, including versions 3.1 and 3.2, at sizes such as 70B and the largest model, 405B. These optimizations include custom quantization recipes, as well as parallelization techniques that split the model efficiently across multiple GPUs by leveraging NVIDIA NVLink and NVSwitch interconnect technologies. Cutting-edge LLMs like Llama 3.1 405B are very demanding and require the combined performance of multiple state-of-the-art GPUs for fast responses.
Parallelism techniques require a hardware platform with a robust GPU-to-GPU interconnect fabric to get maximum performance and avoid communication bottlenecks. Each NVIDIA H200 Tensor Core GPU features fourth-generation NVLink, which provides a whopping 900GB/s of GPU-to-GPU bandwidth. Every eight-GPU HGX H200 platform also ships with four NVLink Switches, enabling every H200 GPU to communicate with any other H200 GPU at 900GB/s, simultaneously.
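Interconnect bandwidth matters because tensor parallelism exchanges activations between GPUs, typically through an all-reduce. A back-of-the-envelope estimate for a ring all-reduce, in which each GPU sends (and receives) 2(N - 1)/N times the message size, is sketched below. The assumptions are loud ones: the estimate counts bandwidth only (no per-step latency), the 4,096-token by 16,384-wide FP16 activation is a hypothetical message, and we assume the 900GB/s NVLink figure is aggregate bidirectional bandwidth, so about 450 GB/s flows in each direction. The function names are illustrative.

```python
def ring_allreduce_bytes_per_gpu(message_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends (and receives) in a ring all-reduce.

    A reduce-scatter phase and an all-gather phase each move
    (N - 1) / N of the message, giving 2 * (N - 1) / N in total.
    """
    return 2 * (num_gpus - 1) / num_gpus * message_bytes


def ring_allreduce_seconds(message_bytes: float, num_gpus: int,
                           bytes_per_s_per_direction: float) -> float:
    """Bandwidth-only time estimate; ignores launch and per-hop latency."""
    return ring_allreduce_bytes_per_gpu(message_bytes, num_gpus) / bytes_per_s_per_direction


if __name__ == "__main__":
    # Hypothetical activation: 4,096 tokens x hidden size 16,384 x 2 bytes (FP16/BF16).
    message = 4096 * 16384 * 2
    # Assumption: ~450 GB/s per direction out of the quoted 900GB/s aggregate.
    t = ring_allreduce_seconds(message, num_gpus=8, bytes_per_s_per_direction=450e9)
    print(f"{message / 2**20:.0f} MiB all-reduce across 8 GPUs: ~{t * 1e3:.2f} ms")
```

Under these assumptions a 128 MiB all-reduce across eight GPUs takes roughly half a millisecond, and it recurs in every tensor-parallel layer, which is why a fabric where every GPU can talk to every other GPU at full bandwidth simultaneously matters.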
Many LLM deployments use parallelism rather than keeping the workload on a single GPU, which can become compute-bottlenecked. These deployments must balance low latency against high throughput, and the optimal parallelization technique depends on application requirements.
For instance, if lowest latency is the priority, tensor parallelism is critical, as the combined compute performance of multiple GPUs can be used to serve tokens to users more quickly. However, for use cases where peak throughput across all users is prioritized, pipeline parallelism can efficiently boost overall server throughput.
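As a minimal sketch of how tensor parallelism pools compute, the code below shards a linear layer's weight matrix column-wise (the column-parallel pattern popularized by Megatron-LM-style tensor parallelism). Each simulated "GPU" multiplies the same input by its own column slice, and concatenating the partial outputs, which is the all-gather step on real hardware, reproduces the full result. This is a plain-Python illustration with hypothetical helper names, not TensorRT-LLM's implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense (m x k) @ (k x n) -> (m x n) matrix multiply."""
    k, n = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)] for row in a]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split a (k x n) weight matrix into num_shards equal column blocks."""
    n = len(w[0])
    if n % num_shards:
        raise ValueError("output columns must divide evenly across shards")
    width = n // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, num_gpus: int) -> Matrix:
    """Simulate column-parallel x @ w across num_gpus devices."""
    shards = split_columns(w, num_gpus)
    # Each device computes x @ w_shard independently on 1/num_gpus of the work.
    partials = [matmul(x, shard) for shard in shards]
    # All-gather: concatenate the partial outputs along the column dimension.
    return [[v for p in partials for v in p[r]] for r in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(matmul(x, w), tensor_parallel_matmul(x, w, num_gpus=2))
```

Because every device works on a slice of the same layer at the same time, the per-token compute time shrinks with the number of GPUs, at the cost of the communication needed to reassemble the output.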
The table below shows that tensor parallelism can deliver over 5x more throughput in minimum latency scenarios, whereas pipeline parallelism brings 50% more performance for maximum throughput use cases.
For production deployments that seek to maximize throughput within a given latency budget, a platform must be able to combine both techniques effectively, as TensorRT-LLM does.
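The trade-off can be illustrated with a deliberately simple analytical model that splits eight GPUs into tensor-parallel (tp) and pipeline-parallel (pp) degrees and picks the highest-throughput split that fits a latency budget. Every number here is hypothetical: an 80 ms single-GPU forward pass and a tensor-parallel communication cost that grows like (tp - 1)/tp up to 4 ms. The model ignores pipeline bubbles and inter-stage transfers and is not derived from, and does not reproduce, the measured results in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParallelConfig:
    tp: int                   # tensor-parallel degree (GPUs sharing each layer)
    pp: int                   # pipeline-parallel degree (sequential stages)
    latency_ms: float         # time for one microbatch to pass through every stage
    throughput_per_ms: float  # microbatches finished per ms once the pipeline is full


def evaluate(tp: int, pp: int, full_model_ms: float, max_comm_ms: float) -> ParallelConfig:
    # Each GPU does 1/(tp*pp) of one forward pass's compute.
    compute_ms = full_model_ms / (tp * pp)
    # Hypothetical TP communication cost: zero at tp == 1, growing like (tp - 1) / tp.
    comm_ms = max_comm_ms * (tp - 1) / tp
    stage_ms = compute_ms + comm_ms
    # A single request traverses all pp stages one after another.
    latency_ms = pp * stage_ms
    # In steady state a full pipeline finishes one microbatch every stage_ms.
    return ParallelConfig(tp, pp, latency_ms, 1.0 / stage_ms)


def best_config(num_gpus: int, latency_budget_ms: float,
                full_model_ms: float = 80.0,
                max_comm_ms: float = 4.0) -> Optional[ParallelConfig]:
    """Highest-throughput tp x pp split of num_gpus that meets the latency budget."""
    candidates = [evaluate(tp, num_gpus // tp, full_model_ms, max_comm_ms)
                  for tp in range(1, num_gpus + 1) if num_gpus % tp == 0]
    feasible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    return max(feasible, key=lambda c: c.throughput_per_ms, default=None)


if __name__ == "__main__":
    for budget in (10.0, 20.0, 30.0, 100.0):
        print(budget, best_config(8, budget))
```

Even in this toy model the article's pattern appears: pure tensor parallelism (tp=8) gives the lowest latency, pure pipeline parallelism (tp=1, pp=8) gives the highest throughput, and intermediate budgets favor a mix such as tp=4, pp=2.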
Read the technical blog on boosting Llama 3.1 405B throughput to learn more about these techniques.
Different scenarios have different requirements, and parallelism techniques bring optimal performance for each of these scenarios.
The Virtuous Cycle
Over the lifecycle of our architectures, we deliver significant performance gains from ongoing software tuning and optimization. These improvements translate into additional value for customers who train and deploy on our platforms. They're able to create more capable models and applications and deploy their existing models using less infrastructure, enhancing th