
Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.
Large language models (LLMs) are reshaping productivity. They're capable of drafting documents, summarizing web pages and, having been trained on vast quantities of data, accurately answering questions about nearly any topic.
LLMs are at the core of many emerging use cases in generative AI, including digital assistants, conversational avatars and customer service agents.
Many of the latest LLMs can run locally on PCs or workstations. This is useful for a variety of reasons: users can keep conversations and content private on-device, use AI without the internet, or simply take advantage of the powerful NVIDIA GeForce RTX GPUs in their system. Other models, because of their size and complexity, don't fit into the local GPU's video memory (VRAM) and require hardware in large data centers.
However, it's possible to accelerate part of a data-center-class model locally on RTX-powered PCs using a technique called GPU offloading. This lets users benefit from GPU acceleration without being as limited by GPU memory constraints.
Size and Quality vs. Performance
There's a tradeoff between model size, response quality and performance. In general, larger models deliver higher-quality responses but run more slowly. With smaller models, performance goes up while quality goes down.
This tradeoff isn't always straightforward. There are cases where performance might be more important than quality. Some users may prioritize accuracy for use cases like content generation, since it can run in the background. A conversational assistant, meanwhile, needs to be fast while also providing accurate responses.
The most accurate LLMs, designed to run in the data center, are tens of gigabytes in size, and may not fit in a GPU's memory. This would traditionally prevent the application from taking advantage of GPU acceleration.
However, GPU offloading runs part of the LLM on the GPU and part on the CPU. This allows users to take maximum advantage of GPU acceleration regardless of model size.
Optimize AI Acceleration With GPU Offloading and LM Studio
LM Studio is an application that lets users download and host LLMs on their desktop or laptop computer, with an easy-to-use interface that allows for extensive customization in how those models operate. LM Studio is built on top of llama.cpp, so it's fully optimized for use with GeForce RTX and NVIDIA RTX GPUs.
LM Studio and GPU offloading take advantage of GPU acceleration to boost the performance of a locally hosted LLM, even if the model can't be fully loaded into VRAM.
With GPU offloading, LM Studio divides the model into smaller chunks, or subgraphs, which represent layers of the model architecture. Subgraphs aren't permanently fixed on the GPU, but loaded and unloaded as needed. With LM Studio's GPU offloading slider, users can decide how many of these layers are processed by the GPU.
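To make the slider concrete, here is a minimal Python sketch of splitting a model's layers between GPU and CPU. `place_layers`, `LayerPlacement` and the 40-layer model are hypothetical illustrations, not LM Studio's or llama.cpp's API. For simplicity the sketch gives the GPU the first layers; a real runtime may choose a different order, and it also handles loading and unloading subgraphs, which this omits.

```python
from dataclasses import dataclass


@dataclass
class LayerPlacement:
    """Which layer indices run on the GPU and which stay on the CPU."""
    gpu_layers: list[int]
    cpu_layers: list[int]

    @property
    def offload_fraction(self) -> float:
        total = len(self.gpu_layers) + len(self.cpu_layers)
        return len(self.gpu_layers) / total


def place_layers(n_layers: int, n_offload: int) -> LayerPlacement:
    """Split n_layers between GPU and CPU (hypothetical helper).

    n_offload mirrors the offloading slider: the number of layers the GPU
    processes. Out-of-range values are clamped to [0, n_layers]. For
    illustration only, the GPU gets the first n_offload layers.
    """
    if n_layers <= 0:
        raise ValueError("n_layers must be positive")
    n_offload = max(0, min(n_offload, n_layers))
    layers = list(range(n_layers))
    return LayerPlacement(gpu_layers=layers[:n_offload],
                          cpu_layers=layers[n_offload:])


if __name__ == "__main__":
    # Hypothetical 40-layer model with the slider set to 10 layers.
    plan = place_layers(n_layers=40, n_offload=10)
    print(len(plan.gpu_layers), len(plan.cpu_layers), plan.offload_fraction)
```

Moving the slider to its maximum puts every layer on the GPU, and moving it to zero is equivalent to running on the CPU alone.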
LM Studio's interface makes it easy to decide how much of an LLM should be loaded to the GPU. For example, imagine using this GPU offloading technique with a large model like Gemma 2 27B. "27B" refers to the number of parameters in the model (27 billion), which gives an estimate of how much memory is required to run it.
With 4-bit quantization, a technique for reducing the size of an LLM without significantly reducing accuracy, each parameter takes up half a byte of memory. This means the model should require about 13.5 billion bytes, or 13.5GB, plus some overhead, which generally ranges from 1 to 5GB.
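The arithmetic above can be written as a small Python helper. `estimate_model_memory_gb` is a hypothetical name, and the estimate uses decimal gigabytes (1GB = 10^9 bytes) to match the figures in the text; it covers only weights plus a flat overhead, so treat it as a rough sizing aid.

```python
def estimate_model_memory_gb(n_params: float, bits_per_param: int,
                             overhead_gb: float = 0.0) -> float:
    """Rough memory estimate: quantized weights plus a flat overhead.

    Uses decimal gigabytes (1 GB = 1e9 bytes), matching the article's math.
    """
    bytes_per_param = bits_per_param / 8      # 4-bit -> 0.5 byte per parameter
    weight_bytes = n_params * bytes_per_param
    return weight_bytes / 1e9 + overhead_gb


if __name__ == "__main__":
    params = 27e9  # Gemma 2 27B
    weights = estimate_model_memory_gb(params, bits_per_param=4)
    low = estimate_model_memory_gb(params, 4, overhead_gb=1)
    high = estimate_model_memory_gb(params, 4, overhead_gb=5)
    print(f"weights: {weights:.1f} GB, with overhead: {low:.1f}-{high:.1f} GB")
```

With the 1 to 5GB overhead range, the total lands between 14.5GB and 18.5GB, in line with the roughly 19GB needed to run the model entirely on the GPU.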
Accelerating this model entirely on the GPU requires about 19GB of VRAM, which the 24GB GeForce RTX 4090 desktop GPU provides. With GPU offloading, the model can run on a system with a lower-end GPU and still benefit from acceleration.
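A rough way to pick a starting point for the offloading slider on a smaller GPU is to divide the usable VRAM by the size of one layer. The Python sketch below does that under loudly hypothetical simplifications: the weights are spread evenly across a made-up 40 layers, and 5.5GB of VRAM is reserved for overhead so that the full model needs about 19GB, matching the figure above. It also checks system RAM, since offloading still requires the whole model to fit there. Function names are illustrative, not part of LM Studio.

```python
import math


def max_offload_layers(weights_gb: float, n_layers: int, vram_gb: float,
                       reserved_gb: float) -> int:
    """Estimate how many layers fit in vram_gb (hypothetical helper).

    Assumes the weights are spread evenly across n_layers and that
    reserved_gb of VRAM is set aside for overhead. Real layer sizes and
    overheads vary, so the result is only a starting point.
    """
    if n_layers <= 0:
        raise ValueError("n_layers must be positive")
    per_layer_gb = weights_gb / n_layers
    usable_gb = max(0.0, vram_gb - reserved_gb)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


def fits_in_system_ram(model_gb: float, ram_gb: float) -> bool:
    # Even with GPU offloading, the whole model must fit in system RAM.
    return ram_gb >= model_gb


if __name__ == "__main__":
    weights_gb = 13.5   # Gemma 2 27B at 4 bits per parameter
    n_layers = 40       # hypothetical layer count
    reserved_gb = 5.5   # hypothetical overhead: 13.5 + 5.5 = ~19GB in total
    for vram_gb in (8, 12, 16, 24):
        n = max_offload_layers(weights_gb, n_layers, vram_gb, reserved_gb)
        print(f"{vram_gb}GB VRAM: offload about {n}/{n_layers} layers")
    print("fits in 32GB RAM:", fits_in_system_ram(weights_gb + reserved_gb, 32))
```

Under these assumptions only the 24GB card holds every layer, while an 8GB card can still take a handful of layers off the CPU.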
The table above shows how to run several popular models of increasing size across a range of GeForce RTX and NVIDIA RTX GPUs, with the maximum level of GPU offload indicated for each combination. Note that even with GPU offloading, users still need enough system RAM to fit the whole model.
In LM Studio, it's possible to assess the performance impact of different levels of GPU offloading compared with running on the CPU only. The table below shows the results of running the same query at different offloading levels on a GeForce RTX 4090 desktop GPU.
The more of the model that is offloaded to the GPU, the higher the throughput compared with running on the CPU alone. For the Gemma 2 27B model, performance goes from an anemic 2.1 tokens per second to increasingly usable speeds the more the GPU is used. This enables users to benefit from the performance of larger models that they otherwise would've been unable to run. On this particular model, even users with an 8GB GPU can enjoy a meaningful speedup versus running only on the CPU. Of course, an 8GB GPU can always run a smaller model that fits entirely in GPU memory and get full GPU acceleration.
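To see why 2.1 tokens per second feels slow, it helps to convert throughput into waiting time. The Python sketch below assumes a steady generation rate and uses a hypothetical 300-token answer; the helper names are illustrative. `speedup` compares any offloaded rate you measure in LM Studio against that baseline; no measured offloaded rates are included here.

```python
def seconds_for_response(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens, assuming a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


def speedup(baseline_tps: float, offloaded_tps: float) -> float:
    """How many times faster an offloaded run is than the baseline."""
    return offloaded_tps / baseline_tps


if __name__ == "__main__":
    baseline_tps = 2.1  # starting rate quoted for Gemma 2 27B
    answer_tokens = 300  # hypothetical response length
    secs = seconds_for_response(answer_tokens, baseline_tps)
    print(f"{answer_tokens} tokens at {baseline_tps} tok/s: {secs:.0f} s")
```

At that rate, a 300-token answer takes well over two minutes, which is why even partial offloading makes a noticeable difference for interactive use.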
Achieving Optimal Balance
LM Studio's GPU offloading feature is a powerful tool for unlocking the full potential of LLMs designed for the data center, like Gemma 2 27B, locally on RTX AI PCs. It makes larger, more complex models accessible across the entire lineup of PCs powered by GeForce RTX and NVIDIA RTX GPUs.
Download LM Studio to try GPU offloading on larger models, or experiment with a variety of RTX-accelerated LLMs running locally on RTX AI PCs and workstations.
Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what's new and what's next by subscribing to the AI Decoded newsletter.