How to Accelerate Larger LLMs Locally on RTX With LM Studio

23/10/2024

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.

Large language models (LLMs) are reshaping productivity. They're capable of drafting documents, summarizing web pages and, having been trained on vast quantities of data, accurately answering questions about nearly any topic.

LLMs are at the core of many emerging use cases in generative AI, including digital assistants, conversational avatars and customer service agents.

Many of the latest LLMs can run locally on PCs or workstations. This is useful for a variety of reasons: users can keep conversations and content private on-device, use AI without the internet, or simply take advantage of the powerful NVIDIA GeForce RTX GPUs in their system. Other models, because of their size and complexity, don't fit into the local GPU's video memory (VRAM) and require hardware in large data centers.

However, it's possible to accelerate part of a prompt on a data-center-class model locally on RTX-powered PCs using a technique called GPU offloading. This allows users to benefit from GPU acceleration without being as limited by GPU memory constraints.

Size and Quality vs. Performance

There's a tradeoff between the model size and the quality of responses and the performance. In general, larger models deliver higher-quality responses, but run more slowly. With smaller models, performance goes up while quality goes down.

This tradeoff isn't always straightforward. There are cases where performance might be more important than quality. Some users may prioritize accuracy for use cases like content generation, since it can run in the background. A conversational assistant, meanwhile, needs to be fast while also providing accurate responses.

The most accurate LLMs, designed to run in the data center, are tens of gigabytes in size, and may not fit in a GPU's memory. This would traditionally prevent the application from taking advantage of GPU acceleration.

However, GPU offloading runs part of the LLM on the GPU and part on the CPU. This allows users to take maximum advantage of GPU acceleration regardless of model size.

Optimize AI Acceleration With GPU Offloading and LM Studio

LM Studio is an application that lets users download and host LLMs on their desktop or laptop computer, with an easy-to-use interface that allows for extensive customization in how those models operate. LM Studio is built on top of llama.cpp, so it's fully optimized for use with GeForce RTX and NVIDIA RTX GPUs.

LM Studio and GPU offloading take advantage of GPU acceleration to boost the performance of a locally hosted LLM, even if the model can't be fully loaded into VRAM.

With GPU offloading, LM Studio divides the model into smaller chunks, or subgraphs, which represent layers of the model architecture. Subgraphs aren't permanently fixed on the GPU, but loaded and unloaded as needed. With LM Studio's GPU offloading slider, users can decide how many of these layers are processed by the GPU.
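As a rough illustration of what the slider controls, the sketch below maps a slider position to a count of layers handled by the GPU and a count left to the CPU. The function name `split_layers` and the 40-layer model are hypothetical, chosen only for this example; this is not LM Studio's actual implementation.

```python
def split_layers(total_layers: int, gpu_fraction: float) -> tuple[int, int]:
    """Return (gpu_layers, cpu_layers) for a slider position between 0.0 and 1.0.

    Hypothetical helper for illustration only; real runtimes decide the
    split using their own rules.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0.0 and 1.0")
    gpu_layers = round(total_layers * gpu_fraction)
    return gpu_layers, total_layers - gpu_layers


# Assumed 40-layer model with the slider at 25%, 50% and 100%.
for fraction in (0.25, 0.5, 1.0):
    gpu, cpu = split_layers(40, fraction)
    print(f"{fraction:.0%} offload: {gpu} layers on GPU, {cpu} on CPU")
```

Moving the slider to 0 keeps every layer on the CPU, while 100% corresponds to full GPU acceleration, which is only practical when the whole model fits in VRAM.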

LM Studio's interface makes it easy to decide how much of an LLM should be loaded to the GPU. For example, imagine using this GPU offloading technique with a large model like Gemma 2 27B. 27B refers to the number of parameters in the model, informing an estimate as to how much memory is required to run the model.

With 4-bit quantization, a technique for reducing the size of an LLM without significantly reducing accuracy, each parameter takes up a half byte of memory. This means that the model should require about 13.5 billion bytes, or 13.5GB, plus some overhead, which generally ranges from 1-5GB.
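The arithmetic above can be reproduced in a few lines. This is a minimal sketch that assumes 1GB equals one billion bytes, as the text does, and takes the 1-5GB overhead range from the paragraph above; the function name `estimate_model_memory_gb` is made up for this example.

```python
def estimate_model_memory_gb(params_billions, bits_per_param=4, overhead_gb=(1.0, 5.0)):
    """Estimate memory for a quantized model, in GB (1GB = 1e9 bytes).

    Returns (weights_gb, low_total_gb, high_total_gb), where the totals add
    the low and high ends of the assumed overhead range.
    """
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    weights_gb = params_billions * bits_per_param / 8
    low_overhead, high_overhead = overhead_gb
    return weights_gb, weights_gb + low_overhead, weights_gb + high_overhead


weights, low, high = estimate_model_memory_gb(27)
print(f"Weights: {weights}GB, total with overhead: {low}-{high}GB")
```

For a 27B-parameter model at 4 bits per parameter, the weights come to 13.5GB and the estimated total falls between 14.5GB and 18.5GB.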

Accelerating this model entirely on the GPU requires 19GB of VRAM, available on the GeForce RTX 4090 desktop GPU. With GPU offloading, the model can run on a system with a lower-end GPU and still benefit from acceleration.
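To get a feel for how much of such a model a smaller GPU could hold, the sketch below estimates how many layers fit in a given amount of VRAM. It rests on several simplifying assumptions that are not from the original post: layers are treated as equal in size, the layer count of 46 is illustrative, and 1GB of VRAM is reserved for other uses. The helper `max_gpu_layers` is hypothetical.

```python
import math


def max_gpu_layers(total_gb, total_layers, vram_gb, reserved_gb=1.0):
    """Estimate how many equally sized layers fit in the available VRAM.

    Assumptions (illustrative only): every layer uses total_gb / total_layers,
    and reserved_gb of VRAM is kept free for the display and other buffers.
    """
    per_layer_gb = total_gb / total_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(total_layers, math.floor(usable_gb / per_layer_gb))


# 19GB model (from the text), assumed 46 layers, on 8GB and 24GB GPUs.
for vram in (8, 24):
    layers = max_gpu_layers(19, 46, vram)
    print(f"{vram}GB VRAM: about {layers} of 46 layers on the GPU")
```

Under these assumptions an 8GB GPU holds roughly a third of the layers, while a 24GB GPU holds all of them. Real numbers depend on the model, the context length and the runtime.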

The table above shows how to run several popular models of increasing size across a range of GeForce RTX and NVIDIA RTX GPUs. The maximum level of GPU offload is indicated for each combination. Note that even with GPU offloading, users still need enough system RAM to fit the whole model.

In LM Studio, it's possible to assess the performance impact of different levels of GPU offloading, compared with CPU only. The table below shows the results of running the same query across different offloading levels on a GeForce RTX 4090 desktop GPU.

Depending on the percent of the model offloaded to GPU, users see increasing throughput performance compared with running on CPUs alone. For the Gemma 2 27B model, performance goes from an anemic 2.1 tokens per second to increasingly usable speeds the more the GPU is used. This enables users to benefit from the performance of larger models that they otherwise would've been unable to run.

On this particular model, even users with an 8GB GPU can enjoy a meaningful speedup versus running only on CPUs. Of course, an 8GB GPU can always run a smaller model that fits entirely in GPU memory and get full GPU acceleration.

Achieving Optimal Balance

LM Studio's GPU offloading feature is a powerful tool for unlocking the full potential of LLMs designed for the data center, like Gemma 2 27B, locally on RTX AI PCs. It makes larger, more complex models accessible across the entire lineup of PCs powered by GeForce RTX and NVIDIA RTX GPUs.

Download LM Studio to try GPU offloading on larger models, or experiment with a variety of RTX-accelerated LLMs running locally on RTX AI PCs and workstations.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what's new and what's next by subscribing to the AI Decoded newsletter.

LINK:	https://blogs.nvidia.com/blog/ai-decoded-lm-studio/...