Wide Open: NVIDIA Accelerates Inference on Meta Llama 3
18/04/2024
The open model combined with NVIDIA accelerated computing equips developers, researchers and businesses to innovate responsibly across a wide variety of applications.
Trained on NVIDIA AI Meta engineers trained Llama 3 on computer clusters packing 24,576 NVIDIA H100 Tensor Core GPUs, linked with RoCE and NVIDIA Quantum-2 InfiniBand networks.
To further advance the state of the art in generative AI, Meta recently described plans to scale its infrastructure to 350,000 H100 GPUs.
Putting Llama 3 to Work Versions of Llama 3, accelerated on NVIDIA GPUs, are available today for use in the cloud, data center, edge and PC.
From a browser, developers can try Llama 3 at ai.nvidia.com. It's packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.
Businesses can fine-tune Llama 3 with their data using NVIDIA NeMo, an open-source framework for LLMs that's part of the secure, supported NVIDIA AI Enterprise platform. Custom models can be optimized for inference with NVIDIA TensorRT-LLM and deployed with NVIDIA Triton Inference Server.
Taking Llama 3 to Devices and PCs Llama 3 also runs on NVIDIA Jetson Orin for robotics and edge computing devices, creating interactive agents like those in the Jetson AI Lab.
What's more, NVIDIA RTX and GeForce RTX GPUs for workstations and PCs speed inference on Llama 3. These systems give developers a target of more than 100 million NVIDIA-accelerated systems worldwide.
Get Optimal Performance with Llama 3 Best practices in deploying an LLM for a chatbot involves a balance of low latency, good reading speed and optimal GPU use to reduce costs.
Such a service needs to deliver tokens - the rough equivalent of words to an LLM - at about twice a user's reading speed which is about 10 tokens/second.
Applying these metrics, a single NVIDIA H200 Tensor Core GPU generated about 3,000 tokens/second - enough to serve about 300 simultaneous users - in an initial test using the version of Llama 3 with 70 billion parameters.
That means a single NVIDIA HGX server with eight H200 GPUs could deliver 24,000 tokens/second, further optimizing costs by supporting more than 2,400 users at the same time.
For edge devices, the version of Llama 3 with eight billion parameters generated up to 40 tokens/second on Jetson AGX Orin and 15 tokens/second on Jetson Orin Nano.
Advancing Community Models An active open-source contributor, NVIDIA is committed to optimizing community software that helps users address their toughest challenges. Open-source models also promote AI transparency and let users broadly share work on AI safety and resilience.
Learn more about how NVIDIA's AI inference platform, including how NIM, TensorRT-LLM and Triton use state-of-the-art techniques such as low-rank adaptation to accelerate the latest LLMs.
LINK: | https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/... |
See more stories from nvidia |
More from Nvidia
21/05/2024
New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers
NVIDIA today announced at Microsoft Build new AI performance optimizations and i...
21/05/2024
NVIDIA Expands Collaboration With Microsoft to Help Developers Build, Deploy AI Applications Faster
If optimized AI workflows are like a perfectly tuned orchestra - where each comp...
21/05/2024
A Superbloom of Updates in the May Studio Driver Gives Fresh Life to Content Creation
Editor's note: This post is part of our In the NVIDIA Studio series, which c...
20/05/2024
Every Company to Be an Intelligence Manufacturer,' Declares NVIDIA CEO Jensen Huang at Dell Technologies World
AI heralds a new era of innovation for every business in every industry, NVIDIA ...
16/05/2024
Fight for Honor in Men of War II' on GFN Thursday
Whether looking for new adventures, epic storylines or games to play with a friend, GeForce NOW members are covered. Start off with the much-anticipated sequel...
15/05/2024
NVIDIA, Teradyne and Siemens Gather in the City of Robotics' to Discuss Autonomous Machines and AI
Senior executives from NVIDIA, Siemens and Teradyne Robotics gathered this week ...
15/05/2024
Fire It Up: Mozilla Firefox Adds Support for AI-Powered NVIDIA RTX Video
Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, ...
15/05/2024
How Basecamp Research Helps Catalog Earth's Biodiversity
Basecamp Research is on a mission to capture the vastness of life on Earth at an unprecedented scale. Phil Lorenz, CTO at Basecamp Research, discusses using AI ...
15/05/2024
Needle-Moving AI Research Trains Surgical Robots in Simulation
A collaboration between NVIDIA and academic researchers is prepping robots for surgery. ORBIT-Surgical - developed by researchers from the University of Toront...
14/05/2024
Gemma, Meet NIM: NVIDIA Teams Up With Google DeepMind to Drive Large Language Model Innovation
Large language models that power generative AI are seeing intense innovation - m...
13/05/2024
Drug Discovery, STAT! NVIDIA, Recursion Speed Pharma R&D With AI Supercomputer
Described as the largest system in the pharmaceutical industry, BioHive-2 at the Salt Lake City headquarters of Recursion debuts today at No. 35, up more than 1...
13/05/2024
Drug Discovery, STAT! NVIDIA, Recursion Speed Pharma R&D With AI Supercomputer
Described as the largest system in the pharmaceutical industry, BioHive-2 at the...
12/05/2024
Dial It In: Data Centers Need New Metric for Energy Efficiency
Data centers need an upgraded dashboard to guide their journey to greater energy efficiency, one that shows progress running real-world applications. The formu...
12/05/2024
Generating Science: NVIDIA AI Accelerates HPC Research
Generative AI is taking root at national and corporate labs, accelerating high-performance computing for business and science. Researchers at Sandia National L...
12/05/2024
NVIDIA Blackwell Platform Pushes the Boundaries of Scientific Computing
Quantum computing. Drug discovery. Fusion energy. Scientific computing and physics-based simulations are poised to make giant steps across domains that benefit ...
09/05/2024
Through the Wormhole: Media.Monks' Vision for Enhancing Media and Marketing With AI
Meet Media.Monks' Wormhole, an alien-like, conversational robot with a quirk...
09/05/2024
Honkai: Star Rail' Blasts Off on GeForce NOW
Gear up, Trailblazers - Honkai: Star Rail lands on GeForce NOW this week, along with an in-game reward for members to celebrate the title's launch in the cl...
08/05/2024
Get On the Train' NVIDIA CEO Says at ServiceNow's Knowledge 2024
Now's the time to hop aboard AI, NVIDIA founder and CEO Jensen Huang declared Wednesday as ServiceNow unveiled a demo of futuristic AI avatars together with...
08/05/2024
‘Get On the Train,’ NVIDIA CEO Says at ServiceNow's Knowledge 2024
Now's the time to hop aboard AI, NVIDIA founder and CEO Jensen Huang declare...
08/05/2024
NVIDIA CEO Jensen Huang to Deliver Keynote Ahead of COMPUTEX 2024
Amid an AI revolution sweeping through trillion-dollar industries worldwide, NVIDIA founder and CEO Jensen Huang will deliver a keynote address ahead of COMPUTE...
08/05/2024
AI Decoded: New DaVinci Resolve Tools Bring RTX-Accelerated Renaissance to Editors
AI tools accelerated by NVIDIA RTX have made it easier than ever to edit and wor...
07/05/2024
NVIDIA DGX SuperPOD to Power US Government Generative AI
In support of President Biden's executive order on AI, the U.S. government will use an NVIDIA DGX SuperPOD to produce generative AI advances in climate scie...
06/05/2024
NVIDIA and Alphabet's Intrinsic Put Next-Gen Robotics Within Grasp
Intrinsic, a software and AI robotics company at Alphabet, has integrated NVIDIA AI and Isaac platform technologies to advance the complex field of autonomous r...
06/05/2024
A Mighty Meeting: Generative AI, Cybersecurity Connect at RSA
Cybersecurity experts at the RSA Conference this week will be on the hunt for ways to secure their operations in the era of generative AI. They'll find man...
02/05/2024
GeForce NOW Delivers 24 A-May-zing Games This Month
GeForce NOW brings 24 new games for members this month. Ninja Theory's highly anticipated Senua's Saga: Hellblade II will be coming to the cloud soon -...
02/05/2024
NVIDIA AI Microservices for Drug Discovery, Digital Health Now Integrated With AWS
Harnessing optimized AI models for healthcare is easier than ever as NVIDIA NIM,...
01/05/2024
Explainable AI: Insights from Arthur's Adam Wenchel
Arthur.ai enhances the performance of AI systems across various metrics like accuracy, explainability and fairness. In this episode of the NVIDIA AI Podcast, re...
01/05/2024
AI Takes a Bow: Interactive GLaDOS Robot Among 9 Winners in Hackster.io Challenge
YouTube robotics influencer Dave Niewinski has developed robots for everything f...
01/05/2024
Say It Again: ChatRTX Adds New AI Models, Features in Latest Update
Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, ...
29/04/2024
SEA.AI Navigates the Future With AI at the Helm
Talk about commitment. When startup SEA.AI, an NVIDIA Metropolis partner, set out to create a system that would use AI to scan the seas to enhance maritime safe...
25/04/2024
AI Drives Future of Transportation at Asia's Largest Automotive Show
The latest trends and technologies in the automotive industry are in the spotlight at the Beijing International Automotive Exhibition, aka Auto China, which ope...
25/04/2024
Into the Omniverse: Unlocking the Future of Manufacturing With OpenUSD on Siemens Teamcenter X
Editor's note: This post is part of Into the Omniverse, a series focused on ...
25/04/2024
Blast From the Past: Stream StarCraft' and Diablo' on GeForce NOW
Support for Battle.net on GeForce NOW expands this GFN Thursday, as titles from the iconic StarCraft and Diablo series come to the cloud. StarCraft Remastered,...
24/04/2024
Rays Up: Decoding AI-Powered DLSS 3.5 Ray Reconstruction
Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, ...
24/04/2024
Forecasting the Future: AI2's Christopher Bretherton Discusses Using Machine Learning for Climate Modeling
Can machine learning help predict extreme weather events and climate change? Chr...
24/04/2024
NVIDIA to Acquire GPU Orchestration Software Provider Run:ai
To help customers make more efficient use of their AI computing resources, NVIDIA today announced it has entered into a definitive agreement to acquire Run:ai, ...
24/04/2024
How Virtual Factories Are Making Industrial Digitalization a Reality
To address the shift to electric vehicles, increased semiconductor demand, manufacturing onshoring, and ambitions for greater sustainability, manufacturers are ...
23/04/2024
Small and Mighty: NVIDIA Accelerates Microsoft's Open Phi-3 Mini Language Models
NVIDIA announced today its acceleration of Microsoft's new Phi-3 Mini open l...
22/04/2024
Climate Tech Startups Integrate NVIDIA AI for Sustainability Applications
Whether they're monitoring miniscule insects or delivering insights from satellites in space, NVIDIA-accelerated startups are making every day Earth Day. S...
18/04/2024
Wide Open: NVIDIA Accelerates Inference on Meta Llama 3
NVIDIA today announced optimizations across all its platforms to accelerate Meta Llama 3, the latest generation of the large language model (LLM). The open mod...
18/04/2024
Up to No Good: No Rest for the Wicked' Early Access Launches on GeForce NOW
It's time to get a little wicked. Members can now stream No Rest for the Wicked from the cloud. It leads six new games joining the GeForce NOW library of m...
18/04/2024
NVIDIA Honors Partners of the Year in Europe, Middle East, Africa
NVIDIA today recognized 18 partners in Europe, the Middle East and Africa for their achievements and commitment to driving AI adoption. The recipients were hon...
17/04/2024
Seeing Beyond: Living Optics CEO Robin Wang on Democratizing Hyperspectral Imaging
Step into the realm of the unseen with Robin Wang, CEO of Living Optics. The sta...
17/04/2024
Moving Pictures: Transform Images Into 3D Scenes With NVIDIA Instant NeRF
Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, ...
16/04/2024
New NVIDIA RTX A400 and A1000 GPUs Enhance AI-Powered Design and Productivity Workflows
AI integration across design and productivity applications is becoming the new s...
16/04/2024
To Cut a Long Story Short: Video Editors Benefit From DaVinci Resolve's New AI Features Powered by RTX
Editor's note: This post is part of our In the NVIDIA Studio series, which c...
15/04/2024
AI Is Tech's Greatest Contribution to Social Elevation,' NVIDIA CEO Tells Oregon State Students
AI promises to bring the full benefits of the digital revolution to billions acr...
10/04/2024
The Building Blocks of AI: Decoding the Role and Significance of Foundation Models
Editor's note: This post is part of the AI Decoded series, which demystifies...
10/04/2024
Combating Corruption With Data: Cleanlab and Berkeley Research Group on Using AI-Powered Investigative Analytics
Talk about scrubbing data. Curtis Northcutt, cofounder and CEO of Cleanlab, and ...
09/04/2024
NVIDIA Joins $110 Million Partnership to Help Universities Teach AI Skills
The Biden Administration has announced a new $110 million AI partnership between Japan and the United States that includes an initiative to fund research throug...