
NVIDIA today announced optimizations across all its platforms to accelerate Meta Llama 3, the latest generation of the large language model (LLM).
The open model combined with NVIDIA accelerated computing equips developers, researchers and businesses to innovate responsibly across a wide variety of applications.

Trained on NVIDIA AI
Meta engineers trained Llama 3 on computer clusters packing 24,576 NVIDIA H100 Tensor Core GPUs, linked with RoCE and NVIDIA Quantum-2 InfiniBand networks.
To further advance the state of the art in generative AI, Meta recently described plans to scale its infrastructure to 350,000 H100 GPUs.

Putting Llama 3 to Work
Versions of Llama 3, accelerated on NVIDIA GPUs, are available today for use in the cloud, data center, edge and PC.
From a browser, developers can try Llama 3 at ai.nvidia.com. It's packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.
Businesses can fine-tune Llama 3 with their data using NVIDIA NeMo, an open-source framework for LLMs that's part of the secure, supported NVIDIA AI Enterprise platform. Custom models can be optimized for inference with NVIDIA TensorRT-LLM and deployed with NVIDIA Triton Inference Server.

Taking Llama 3 to Devices and PCs
Llama 3 also runs on NVIDIA Jetson Orin for robotics and edge computing devices, creating interactive agents like those in the Jetson AI Lab.
What's more, NVIDIA RTX and GeForce RTX GPUs for workstations and PCs speed inference on Llama 3. These systems give developers a target of more than 100 million NVIDIA-accelerated systems worldwide.

Get Optimal Performance with Llama 3
Best practices in deploying an LLM for a chatbot involve a balance of low latency, good reading speed and optimal GPU use to reduce costs.
Such a service needs to deliver tokens (the rough equivalent of words to an LLM) at about twice a user's reading speed, or roughly 10 tokens/second per user.
Applying these metrics, a single NVIDIA H200 Tensor Core GPU generated about 3,000 tokens/second, enough to serve about 300 simultaneous users, in an initial test using the version of Llama 3 with 70 billion parameters.
That means a single NVIDIA HGX server with eight H200 GPUs could deliver 24,000 tokens/second, further optimizing costs by supporting more than 2,400 users at the same time.
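The arithmetic behind these figures can be reproduced with a short sketch. It uses only the numbers quoted above (about 3,000 tokens/second per H200, about 10 tokens/second per user, eight GPUs per HGX server). The function and constant names are illustrative, and the server figure assumes throughput scales linearly across the eight GPUs, which is how the 24,000 tokens/second estimate is derived.

```python
def concurrent_users(tokens_per_second: float, per_user_rate: float = 10.0) -> int:
    """Users that can be served at once if each needs per_user_rate tokens/second."""
    return int(tokens_per_second // per_user_rate)


# Figures quoted in the article for Llama 3 70B on a single H200.
H200_TOKENS_PER_SECOND = 3_000
GPUS_PER_HGX_SERVER = 8

single_gpu_users = concurrent_users(H200_TOKENS_PER_SECOND)

# Assumption: throughput scales linearly with the number of GPUs.
server_tokens_per_second = H200_TOKENS_PER_SECOND * GPUS_PER_HGX_SERVER
server_users = concurrent_users(server_tokens_per_second)

print(single_gpu_users)          # 300
print(server_tokens_per_second)  # 24000
print(server_users)              # 2400
```

Real deployments would likely see these numbers shift with prompt length, batch size and latency targets, so the sketch is a capacity estimate rather than a benchmark.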
For edge devices, the version of Llama 3 with eight billion parameters generated up to 40 tokens/second on Jetson AGX Orin and 15 tokens/second on Jetson Orin Nano.
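To put the edge figures in perspective, the sketch below estimates how long a single reply would take to generate at each quoted rate. The 120-token reply length is a hypothetical value chosen for illustration, not a figure from the tests, and the Jetson AGX Orin rate is the quoted upper bound ("up to 40 tokens/second").

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady rate of tokens_per_second."""
    return num_tokens / tokens_per_second


# Rates quoted in the article for Llama 3 8B.
EDGE_RATES = {
    "Jetson AGX Orin": 40.0,   # quoted as "up to" 40 tokens/second
    "Jetson Orin Nano": 15.0,
}

REPLY_TOKENS = 120  # hypothetical reply length, for illustration only

for device, rate in EDGE_RATES.items():
    print(f"{device}: {seconds_to_generate(REPLY_TOKENS, rate):.1f} s")
# Jetson AGX Orin: 3.0 s
# Jetson Orin Nano: 8.0 s
```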

Advancing Community Models
An active open-source contributor, NVIDIA is committed to optimizing community software that helps users address their toughest challenges. Open-source models also promote AI transparency and let users broadly share work on AI safety and resilience.
Learn more about NVIDIA's AI inference platform, including how NIM, TensorRT-LLM and Triton use state-of-the-art techniques such as low-rank adaptation to accelerate the latest LLMs.