
Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.
Large language models are driving some of the most exciting developments in AI with their ability to quickly understand, summarize and generate text-based content.
These capabilities power a variety of use cases, including productivity tools, digital assistants, non-playable characters in video games and more. But they're not a one-size-fits-all solution, and developers often must fine-tune LLMs to fit the needs of their applications.
The NVIDIA RTX AI Toolkit makes it easy to fine-tune and deploy AI models on RTX AI PCs and workstations through a technique called low-rank adaptation, or LoRA. A new update, available today, enables support for using multiple LoRA adapters simultaneously within the NVIDIA TensorRT-LLM AI acceleration library, improving the performance of fine-tuned models by up to 6x.
Fine-Tuned for Performance
LLMs must be carefully customized to achieve higher performance and meet growing user demands.
These foundational models are trained on huge amounts of data but often lack the context needed for a developer's specific use case. For example, a generic LLM can generate video game dialogue, but it will likely miss the nuance and subtlety needed to write in the style of a woodland elf with a dark past and a barely concealed disdain for authority.
To achieve more tailored outputs, developers can fine-tune the model with information related to the app's use case.
Take the example of developing an app to generate in-game dialogue using an LLM. Fine-tuning starts from the weights of a pretrained model, which already encode general knowledge such as what a character might say in the game. To get the dialogue in the right style, a developer can tune the model on a smaller dataset of examples, such as dialogue written in a spookier or more villainous tone.
In some cases, developers may want to run several of these fine-tuned variants at the same time. For example, they may want to generate marketing copy written in different voices for various content channels. At the same time, they may want to summarize a document and make stylistic suggestions, as well as draft a video game scene description and an imagery prompt for a text-to-image generator.
It's not practical to run multiple models simultaneously, as they won't all fit in GPU memory at the same time. Even if they did, their inference time would be impacted by memory bandwidth - how fast data can be read from memory into GPUs.
Lo(RA) and Behold
A popular way to address these issues is to use fine-tuning techniques such as low-rank adaptation. A simple way to think of a LoRA adapter is as a patch file containing the customizations from the fine-tuning process.
Once trained, customized LoRA adapters can integrate seamlessly with the foundation model during inference, adding minimal overhead. Developers can attach the adapters to a single model to serve multiple use cases. This keeps the memory footprint low while still providing the additional details needed for each specific use case.
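To make the "patch file" idea concrete, here is a minimal sketch of a single LoRA-adapted layer in plain Python. It is an illustration under stated assumptions, not the RTX AI Toolkit or TensorRT-LLM implementation: the names `matvec` and `lora_forward`, the tiny 8x8 layer and the rank of 2 are all made up for the example. The frozen base weight matrix `W` is left untouched, and the adapter contributes a low-rank correction `B (A x)` that is added to the base output.

```python
import random

def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """Return W x + scale * B (A x). The base weights W are never modified."""
    base = matvec(W, x)
    # Rank-r correction, computed without ever forming the full matrix B A.
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

# Hypothetical sizes chosen only for illustration.
random.seed(0)
d_out, d_in, r = 8, 8, 2

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base layer
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # adapter "down" matrix
B = [[0.0] * r for _ in range(d_out)]  # adapter "up" matrix; zero init makes the adapter a no-op before training
x = [random.gauss(0, 1) for _ in range(d_in)]

full_params = d_out * d_in           # weights in the base layer
adapter_params = r * (d_in + d_out)  # weights in the adapter (A plus B)

print("base layer weights:", full_params)
print("adapter weights:   ", adapter_params)
```

The adapter stores `r * (d_in + d_out)` values instead of `d_out * d_in`, so the saving grows with layer size. For a hypothetical 4096 x 4096 layer with rank 64, that is 524,288 adapter weights against 16,777,216 base weights.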
Architecture overview of supporting multiple clients and use cases with a single foundation model using multi-LoRA capabilities.
In practice, this means that an app can keep just one copy of the base model in memory, alongside many customizations using multiple LoRA adapters.
This process is called multi-LoRA serving. When multiple calls are made to the model, the GPU can process all of the calls in parallel, maximizing the use of its Tensor Cores and minimizing the demands of memory and bandwidth so developers can efficiently use AI models in their workflows. Fine-tuned models using multi-LoRA adapters perform up to 6x faster.
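The serving pattern described above can be sketched as follows. This is a toy illustration in plain Python, not the TensorRT-LLM API: `serve_batch`, the adapter names `story` and `image_prompt`, and the 2x2 weights are hypothetical. One copy of the base weights `W` is shared by every request, and each request names the adapter whose low-rank correction should be added to its output. A real GPU runtime would process the batch in parallel; the loop here only shows which data is shared and which is per request.

```python
def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def serve_batch(W, adapters, requests):
    """Run a batch of (adapter_name, input) requests against one shared base W.

    adapter_name may be None, meaning "use the base model unmodified".
    """
    outputs = []
    for name, x in requests:
        y = matvec(W, x)  # shared base weights, one copy in memory
        if name is not None:
            A, B = adapters[name]            # small per-adapter matrices
            delta = matvec(B, matvec(A, x))  # low-rank correction for this request
            y = [yi + di for yi, di in zip(y, delta)]
        outputs.append(y)
    return outputs

# Hypothetical base layer: a 2x2 identity matrix.
W = [[1, 0],
     [0, 1]]

# Two hypothetical rank-1 adapters, each stored as an (A, B) pair.
adapters = {
    "story":        ([[1, 0]], [[1], [0]]),
    "image_prompt": ([[0, 1]], [[0], [2]]),
}

requests = [
    ("story", [1, 2]),
    ("image_prompt", [1, 2]),
    (None, [1, 2]),
]

results = serve_batch(W, adapters, requests)
print(results)  # [[2, 2], [1, 6], [1, 2]]
```

The same input produces three different outputs depending on the adapter selected, while the base weights are stored only once.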
LLM inference performance on GeForce RTX 4090 Desktop GPU for Llama 3B int4 with LoRA adapters applied at runtime. Input sequence length is 43 tokens and output sequence length is 100 tokens. LoRA adapter max rank is 64.
In the example of the in-game dialogue application described earlier, the app's scope could be expanded, using multi-LoRA serving, to generate both story elements and illustrations, driven by a single prompt.
The user could input a basic story idea, and the LLM would flesh out the concept, expanding on the idea to provide a detailed foundation. The application could then use the same model, enhanced with two distinct LoRA adapters, to refine the story and generate corresponding imagery. One LoRA adapter generates a Stable Diffusion prompt to create visuals using a locally deployed Stable Diffusion XL model. Meanwhile, the other LoRA adapter, fine-tuned for story writing, could craft a well-structured and engaging narrative.
In this case, the same model is used for both inference passes, ensuring that the space required for the process doesn't significantly increase. The second pass, which involves both text and image generation, is performed using batched inference, making the process exceptionally fast and efficient on NVIDIA GPUs. This allows users to rapidly iterate through different versions of their stories, refining the narrative and the illustrations with ease.
This process is outlined in more detail in a recent technical blog.
LLMs are becoming one of the most important components of modern AI. As adoption and integration grow, demand for powerful, fast LLMs with application-specific customizations will only increase. The multi-LoRA support added today to the RTX AI Toolkit gives developers a powerful new way to accelerate these capabilities.