
NVIDIA today announced optimizations across all its platforms to accelerate Meta Llama 3, the latest generation of the large language model (LLM).
The open model combined with NVIDIA accelerated computing equips developers, researchers and businesses to innovate responsibly across a wide variety of applications.
Trained on NVIDIA AI
Meta engineers trained Llama 3 on computer clusters packing 24,576 NVIDIA H100 Tensor Core GPUs, linked with RoCE and NVIDIA Quantum-2 InfiniBand networks.
To further advance the state of the art in generative AI, Meta recently described plans to scale its infrastructure to 350,000 H100 GPUs.
Putting Llama 3 to Work
Versions of Llama 3, accelerated on NVIDIA GPUs, are available today for use in the cloud, data center, edge and PC.
From a browser, developers can try Llama 3 at ai.nvidia.com. It's packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.
Businesses can fine-tune Llama 3 with their data using NVIDIA NeMo, an open-source framework for LLMs that's part of the secure, supported NVIDIA AI Enterprise platform. Custom models can be optimized for inference with NVIDIA TensorRT-LLM and deployed with NVIDIA Triton Inference Server.
Taking Llama 3 to Devices and PCs
Llama 3 also runs on NVIDIA Jetson Orin for robotics and edge computing devices, creating interactive agents like those in the Jetson AI Lab.
What's more, NVIDIA RTX and GeForce RTX GPUs for workstations and PCs speed inference on Llama 3. These systems give developers a target of more than 100 million NVIDIA-accelerated systems worldwide.
Get Optimal Performance with Llama 3
Best practices in deploying an LLM for a chatbot involve a balance of low latency, good reading speed and optimal GPU use to reduce costs.
Such a service needs to deliver tokens - the rough equivalent of words to an LLM - at about twice a user's reading speed, which works out to about 10 tokens/second per user.
Applying these metrics, a single NVIDIA H200 Tensor Core GPU generated about 3,000 tokens/second - enough to serve about 300 simultaneous users - in an initial test using the version of Llama 3 with 70 billion parameters.
That means a single NVIDIA HGX server with eight H200 GPUs could deliver 24,000 tokens/second, further optimizing costs by supporting more than 2,400 users at the same time.
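The capacity figures above follow from simple division, which can be sketched as below. This is a rough back-of-the-envelope illustration, not NVIDIA's benchmarking method: the function name `concurrent_users` and the constants are hypothetical, and the eight-GPU figure assumes throughput scales linearly with GPU count, as the article's own numbers imply.

```python
def concurrent_users(tokens_per_second: float, per_user_rate: float = 10.0) -> int:
    """Estimate how many simultaneous users a given throughput can serve.

    per_user_rate is the delivery target per user in tokens/second
    (about 10 in the article, roughly twice a user's reading speed).
    """
    if per_user_rate <= 0:
        raise ValueError("per_user_rate must be positive")
    return int(tokens_per_second // per_user_rate)


# Figures quoted in the article for Llama 3 70B on H200.
H200_TOKENS_PER_SECOND = 3_000
GPUS_PER_HGX = 8

single_gpu_users = concurrent_users(H200_TOKENS_PER_SECOND)

# Assumption: throughput scales linearly across the eight GPUs in one server.
hgx_tokens_per_second = H200_TOKENS_PER_SECOND * GPUS_PER_HGX
hgx_users = concurrent_users(hgx_tokens_per_second)

print(single_gpu_users)       # users served by one H200
print(hgx_tokens_per_second)  # aggregate tokens/second for one HGX server
print(hgx_users)              # users served by one HGX server
```

The same function can be applied to the edge figures by passing a different throughput, keeping in mind that real deployments rarely hit the ideal ratio because of batching and latency constraints.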
For edge devices, the version of Llama 3 with eight billion parameters generated up to 40 tokens/second on Jetson AGX Orin and 15 tokens/second on Jetson Orin Nano.
Advancing Community Models
An active open-source contributor, NVIDIA is committed to optimizing community software that helps users address their toughest challenges. Open-source models also promote AI transparency and let users broadly share work on AI safety and resilience.
Learn more about how NVIDIA's AI inference platform, including NIM, TensorRT-LLM and Triton, uses state-of-the-art techniques such as low-rank adaptation to accelerate the latest LLMs.