
As of today, NVIDIA supports the general availability of Gemma 3n on NVIDIA RTX and Jetson. Gemma 3n, previewed by Google DeepMind at Google I/O last month, includes two new models optimized for multimodal on-device deployment.
Gemma 3n now includes audio in addition to the text and vision capabilities introduced in version 3.5. Each component integrates trusted research models: Universal Speech Model for audio, MobileNet v4 for vision, and MatFormer for text.
The biggest usage advancement is an innovation called Per-Layer Embeddings, which significantly reduces the RAM needed for parameters. The Gemma 3n E4B model has a raw parameter count of 8B but can operate with a dynamic memory footprint comparable to that of a 4B model. This lets developers use a higher-quality model in a resource-constrained environment.
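To make the memory claim concrete, here is a back-of-the-envelope sketch rather than a measurement. It assumes that weights dominate the footprint and that every parameter is stored at the same width. The helper name `weight_memory_gb` and the bytes-per-parameter values are illustrative assumptions, and the 4B figure is simply the "comparable to a 4B model" description above taken at face value.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


RAW_PARAMS_E4B = 8e9        # raw parameter count stated in the text
EFFECTIVE_PARAMS_E4B = 4e9  # "comparable to a 4B model" (approximation)

# Assumed storage widths; real deployments may mix precisions.
for label, width in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    raw = weight_memory_gb(RAW_PARAMS_E4B, width)
    eff = weight_memory_gb(EFFECTIVE_PARAMS_E4B, width)
    print(f"{label}: all 8B resident ~{raw:.1f} GB, 4B-equivalent ~{eff:.1f} GB")
```

Under these assumptions the footprint halves at every precision, which is the practical meaning of running an 8B-parameter model in a 4B-class memory budget.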
| Model name | Raw parameters | Input context length | Output context length | Size on disk |
|---|---|---|---|---|
| E2B | 5B | 32K | 32K minus request input | 1.55 GB |
| E4B | 8B | 32K | 32K minus request input | 2.82 GB |
Table 1: Gemma 3n model components for both the E2B and E4B models
Powering robotics and edge AI with Jetson
The Gemma family of models works well on NVIDIA Jetson devices, which are geared toward powering edge applications such as next-generation robotics. The lightweight architecture and, now, dynamic memory usage fit resource-constrained environments.
Jetson developers can participate in the Gemma 3n Impact Challenge hosted on Kaggle. The aim is to use this technology to create meaningful, positive change in the world in areas such as accessibility, education, healthcare, environmental sustainability, and crisis response. Several cash prizes, starting at $10,000, are available both for overall placement and for submissions that use technologies suited to on-device deployment, such as Jetson.
To get started, check out the live text and image demo from the Gemma 3 Developer Day in April and the GitHub repository for deploying Gemma locally using Ollama.
NVIDIA RTX for Windows developers and AI enthusiasts
With NVIDIA RTX AI PCs, developers can easily deploy Gemma 3n models using Ollama. AI enthusiasts can use Gemma 3n models with RTX acceleration in their favorite apps, such as AnythingLLM and LM Studio.
Developers can deploy Gemma 3n locally to both RTX and Jetson devices with a few simple instructions using the Ollama CLI:
1. Download and install Ollama for Windows.
2. Open a terminal window and run the following commands:
    ollama pull gemma3n:e4b
    ollama run gemma3n:e4b "Summarize Shakespeare's Hamlet"
NVIDIA collaborates with Ollama to provide performance optimizations for NVIDIA RTX GPUs, accelerating the latest models like Gemma 3n. For this model, Ollama leverages the Ollama engine in the backend, which builds upon the GGML library. Learn more about NVIDIA's contributions to the GGML library for maximum performance on NVIDIA RTX GPUs.
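Once the model has been pulled with the CLI, the same prompt can also be sent programmatically. The following is a minimal sketch that assumes an Ollama server is running locally on its default port (11434) and uses Ollama's `/api/generate` REST endpoint through the Python standard library. The helper names `build_request` and `generate` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma3n:e4b") -> urllib.request.Request:
    """Build a non-streaming generate request for the given model and prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3n:e4b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Summarize Shakespeare's Hamlet"))
```

Setting `"stream": False` asks the server for a single JSON object instead of a stream of partial responses, which keeps the parsing to one `json.loads` call.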
Customize Gemma for your data with the open NVIDIA NeMo Framework
Developers can use the Gemma 3n models from Hugging Face with the open source NVIDIA NeMo Framework. It provides a comprehensive framework for post-training Gemma models to achieve higher accuracy, specifically through fine-tuning with enterprise-specific data. The workflow within NeMo is designed to be end-to-end, covering data preparation, efficient fine-tuning, and model evaluation.
Figure 1. NeMo Framework provides end-to-end support for large language models and multimodal models.
The workflow includes:
Data curation (NeMo Curator): Curator prepares high-quality datasets for either pretraining or fine-tuning by offering tools to extract, filter, and deduplicate large volumes of structured and unstructured data. It ensures the quality of the input data for the model.
Fine-tuning (NeMo): Once the data is curated, NeMo enables efficient fine-tuning of Gemma models. It supports various techniques to optimize this process, including LoRA (Low-Rank Adaptation), PEFT (Parameter-Efficient Fine-Tuning), and full parameter tuning for comprehensive customization.
Model evaluation (NeMo Evaluator): After fine-tuning, NeMo Evaluator is used
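To illustrate why LoRA, named in the fine-tuning step above, counts as parameter-efficient: it freezes a weight matrix W of shape (d_out x d_in) and trains only two low-rank factors, B (d_out x r) and A (r x d_in), whose product is added to W. The sketch below only counts parameters for one layer. It is generic arithmetic, not NeMo's API, and the layer shape and rank are hypothetical values chosen for illustration.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Hypothetical layer shape and rank, not taken from any Gemma 3n configuration.
d_in, d_out, rank = 2048, 2048, 8
full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters ({lora / full:.2%} of full)")
```

For these illustrative sizes, the trainable parameter count drops to under one percent of the full matrix, which is the trade-off that separates LoRA from full parameter tuning.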