AI, Go Fetch! New NVIDIA NeMo Retriever Microservices Boost LLM Accuracy and Throughput
23/07/2024
To help developers efficiently fetch the best proprietary data to generate knowledgeable responses for their AI applications, NVIDIA today announced four new NVIDIA NeMo Retriever NIM inference microservices.
Combined with NVIDIA NIM inference microservices for the Llama 3.1 model collection, also announced today, NeMo Retriever NIM microservices enable enterprises to scale to agentic AI workflows - where AI applications operate accurately with minimal intervention or supervision - while delivering the highest-accuracy retrieval-augmented generation, or RAG.
NeMo Retriever allows organizations to seamlessly connect custom models to diverse business data and deliver highly accurate responses for AI applications using RAG. In essence, the production-ready microservices provide the accurate information retrieval that accurate AI applications depend on.
For example, NeMo Retriever can boost model accuracy and throughput for developers creating AI agents and customer service chatbots, analyzing security vulnerabilities or extracting insights from complex supply chain information.
NIM inference microservices enable high-performance, easy-to-use, enterprise-grade inferencing. And with NeMo Retriever NIM microservices, developers can benefit from all of this - superpowered by their data.
These new NeMo Retriever embedding and reranking NIM microservices are now generally available:
NV-EmbedQA-E5-v5, a popular community base embedding model optimized for text question-answering retrieval
NV-EmbedQA-Mistral7B-v2, a popular multilingual community base model fine-tuned for text embedding for high-accuracy question answering
Snowflake-Arctic-Embed-L, an optimized community model, and
NV-RerankQA-Mistral4B-v3, a popular community base model fine-tuned for text reranking for high-accuracy question answering.
They join the collection of NIM microservices easily accessible through the NVIDIA API catalog.
Embedding and Reranking Models
NeMo Retriever NIM microservices comprise two model types - embedding and reranking - with open and commercial offerings that ensure transparency and reliability.
Example RAG pipeline using NVIDIA NIM microservices for Llama 3.1 and NeMo Retriever embedding and reranking NIM microservices for a customer service AI chatbot application.
An embedding model transforms diverse data - such as text, images, charts and video - into numerical vectors, stored in a vector database, while capturing their meaning and nuance. Embedding models are faster and less computationally expensive than traditional large language models, or LLMs.
A reranking model ingests data and a query, then scores the data according to its relevance to the query. Such models offer significant accuracy improvements, but they are more computationally complex and slower than embedding models.
NeMo Retriever provides the best of both worlds. By casting a wide net of data to be retrieved with an embedding NIM, then using a reranking NIM to trim the results for relevancy, developers tapping NeMo Retriever can build a pipeline that ensures the most helpful, accurate results for their enterprise.
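The two-stage idea described above (a cheap, wide retrieval pass followed by a slower, more precise reranking pass) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `embed`, `retrieve` and `rerank` are hypothetical, the bag-of-words "embedding" and the phrase-matching "reranker" are toy stand-ins for real models, and none of it reflects the actual NeMo Retriever NIM API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int) -> list[str]:
    """Stage 1: cast a wide net. Rank all docs by embedding similarity."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank_score(query: str, doc: str) -> int:
    """Toy stand-in for a reranking model: looks at the query and passage
    together and counts query word pairs that appear adjacent in the passage."""
    q = query.lower().split()
    d = doc.lower().split()
    doc_pairs = set(zip(d, d[1:]))
    return sum(1 for pair in zip(q, q[1:]) if pair in doc_pairs)

def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    """Stage 2: trim the candidate list, keeping the most relevant passages."""
    ranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return ranked[:top_n]

# Hypothetical customer-service knowledge base.
DOCS = [
    "to reset my password open account settings and choose security",
    "my team password policy says reset quarterly",
    "shipping times for international orders",
    "reset the router",
]

if __name__ == "__main__":
    query = "reset my password"
    candidates = retrieve(query, DOCS, k=3)   # wide, cheap pass
    best = rerank(query, candidates, top_n=1)  # narrow, precise pass
    print(candidates)
    print(best)
```

In this toy data, the first stage ranks the password-policy passage highest because it is short and shares all three query words, while the second stage promotes the passage that actually contains the phrase "reset my password". Real embedding and reranking models operate on learned representations rather than word counts, but the division of labor is the same.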
With NeMo Retriever, developers get access to state-of-the-art open, commercial models for building text Q&A retrieval pipelines that provide the highest accuracy. When compared with alternative models, NeMo Retriever NIM microservices provided 30% fewer inaccurate answers for enterprise question answering.
Comparison of NeMo Retriever embedding NIM and embedding plus reranking NIM microservices performance versus lexical search and an alternative embedder.
Top Use Cases
From RAG and AI agent solutions to data-driven analytics and more, NeMo Retriever powers a wide range of AI applications.
The microservices can be used to build intelligent chatbots that provide accurate, context-aware responses. They can help analyze vast amounts of data to identify security vulnerabilities. They can assist in extracting insights from complex supply chain information. And they can boost AI-enabled retail shopping advisors that offer natural, personalized shopping experiences, among other tasks.
NVIDIA AI workflows for these use cases provide an easy, supported starting point for developing generative AI-powered technologies.
Dozens of NVIDIA data platform partners are working with NeMo Retriever NIM microservices to boost their AI models' accuracy and throughput.
DataStax has integrated NeMo Retriever embedding NIM microservices in its Astra DB and Hyper-Converged platforms, enabling the company to bring accurate, generative AI-enhanced RAG capabilities to customers with faster time to market.
Cohesity will integrate NVIDIA NeMo Retriever microservices with its AI product, Cohesity Gaia, to help customers put their data to work to power insightful, transformative generative AI applications through RAG.
Kinetica will use NVIDIA NeMo Retriever to develop LLM agents that can interact with complex networks in natural language to respond more quickly to outages or breaches - turning insights into immediate action.
NetApp is collaborating with NVIDIA to connect NeMo Retriever microservices to exabytes of data on its intelligent data infrastructure. Every NetApp ONTAP customer will be able to seamlessly talk to their data to access proprietary business insights without having to compromise the security or privacy of their data.
NVIDIA global system integrator partners including Accenture, Deloitte, Infosys, LTTS, Tata Consultancy Services, Tech Mahindra and Wipro, as well as service delivery partners Data Monsters, EXLService (Ireland) Limited, Latentview, Quantiphi, Slalom, SoftServe and Tredence, are developing services to help enterprises add NeMo Retriever NIM microservices into their AI pipelines.