
Developers of generative AI typically face a tradeoff between model size and accuracy. But a new language model released by NVIDIA delivers the best of both, providing state-of-the-art accuracy in a compact form factor.
Mistral-NeMo-Minitron 8B - a miniaturized version of the open Mistral NeMo 12B model released by Mistral AI and NVIDIA last month - is small enough to run on an NVIDIA RTX-powered workstation while still excelling across multiple benchmarks for AI-powered chatbots, virtual assistants, content generators and educational tools. Minitron models are distilled by NVIDIA using NVIDIA NeMo, an end-to-end platform for developing custom generative AI.
We combined two different AI optimization methods - pruning to shrink Mistral NeMo's 12 billion parameters into 8 billion, and distillation to improve accuracy, said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. By doing so, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model at lower computational cost.
Unlike their larger counterparts, small language models can run in real time on workstations and laptops. This makes it easier for organizations with limited resources to deploy generative AI capabilities across their infrastructure while optimizing for cost, operational efficiency and energy use. Running language models locally on edge devices also delivers security benefits, since data doesn't need to be passed to a server from an edge device.
Developers can get started with Mistral-NeMo-Minitron 8B packaged as an NVIDIA NIM microservice with a standard application programming interface (API) - or they can download the model from Hugging Face. A downloadable NVIDIA NIM, which can be deployed on any GPU-accelerated system in minutes, will be available soon.
State-of-the-Art for 8 Billion Parameters For a model of its size, Mistral-NeMo-Minitron 8B leads on nine popular benchmarks for language models. These benchmarks cover a variety of tasks including language understanding, common sense reasoning, mathematical reasoning, summarization, coding and ability to generate truthful answers.
Packaged as an NVIDIA NIM microservice, the model is optimized for low latency, which means faster responses for users, and high throughput, which corresponds to higher computational efficiency in production.
In some cases, developers may want an even smaller version of the model to run on a smartphone or an embedded device like a robot. To do so, they can download the 8-billion-parameter model and, using NVIDIA AI Foundry, prune and distill it into a smaller, optimized neural network customized for enterprise-specific applications.
The AI Foundry platform and service offers developers a full-stack solution for creating a customized foundation model packaged as a NIM microservice. It includes popular foundation models, the NVIDIA NeMo platform and dedicated capacity on NVIDIA DGX Cloud. Developers using NVIDIA AI Foundry can also access NVIDIA AI Enterprise, a software platform that provides security, stability and support for production deployments.
Since the original Mistral-NeMo-Minitron 8B model starts with a baseline of state-of-the-art accuracy, versions downsized using AI Foundry would still offer users high accuracy with a fraction of the training data and compute infrastructure.
Harnessing the Perks of Pruning and Distillation To achieve high accuracy with a smaller model, the team used a process that combines pruning and distillation. Pruning downsizes a neural network by removing model weights that contribute the least to accuracy. During distillation, the team retrained this pruned model on a small dataset to significantly boost accuracy, which had decreased through the pruning process.
The end result is a smaller, more efficient model with the predictive accuracy of its larger counterpart.
This technique means that a fraction of the original dataset is required to train each additional model within a family of related models, saving up to 40x the compute cost when pruning and distilling a larger model compared to training a smaller model from scratch.
Read the NVIDIA Technical Blog and a technical report for details.
NVIDIA also announced this week Nemotron-Mini-4B-Instruct, another small language model optimized for low memory usage and faster response times on NVIDIA GeForce RTX AI PCs and laptops. The model is available as an NVIDIA NIM microservice for cloud and on-device deployment and is part of NVIDIA ACE, a suite of digital human technologies that provide speech, intelligence and animation powered by generative AI.
Experience both models as NIM microservices from a browser or an API at ai.nvidia.com.
See notice regarding software product information.
Most recent headlines
30/12/2025
As the College Football Playoff Enters the Quarterfinals, ESPN Blows Out Its Meg...
30/12/2025
SVG's Best of 2025: Original ArticlesTake a look back at all our coverage of big-time productions, game-changing technologies, and state-of-the-art new faci...
30/12/2025
MELBOURNE, Fla., Dec. 30, 2025 - L3Harris Technologies (NYSE: LHX) will release its fourth quarter 2025 financial results before the market opens on Thursday, J...
30/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
30/12/2025
It marked the first civilian operational authorization for a HAPS flight in Europe, led by Space42's subsidiary, Mira Aerospace
The flight demonstrated HAP...
29/12/2025
San Francisco 49ers Strike Gold With Halftime Laser SpectacularStunning display caps $200 million renovation of Levi's Stadium techBy Dan Daley, Audio Edito...
29/12/2025
The Cup's Around the Corner: An Inside Look at Broadcast Preparations for th...
29/12/2025
SVG's Best of 2025: Longform VideoWatch the standout keynote conversations, deep dives, and panel discussions from the year for free on SVG PLAY!By Brandon ...
29/12/2025
From crisper Lossless audio and immersive music videos in beta to new Audiobooks+ plans, custom transitions between tracks, and in-app Messages, we keep levelin...
29/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
27/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
26/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
26/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
25/12/2025
Holiday lights are twinkling, hot cocoa's on the stove and gamers are settling in for a well-earned break.
Whether staying in or heading on a winter getawa...
24/12/2025
What is AI good for? Posted by MTI Film on December 24, 2025
What is AI good for?
What is AI good for?
It's been three years since ChatGPT first cap...
24/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
24/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
24/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
24/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
24/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
24/12/2025
Back to All News
The Boyfriend' Season 2 Unveils Heartwarming Trailer, Key...
24/12/2025
Back to All News
Love, Fights, and Everything in Between: Badly in Love' Returns for Season 2
Entertainment
24 December 2025
GlobalJapan
Link copied t...
24/12/2025
Scripps Research study links sleep variability with sleep apnea and hypertension How consumers' digital activity trackers could enable personalized health s...
23/12/2025
How guilas Cibae as Dominican Winter League Games Are Locally Produced for Glob...
23/12/2025
BitFire's Jim Akimchuk on Supplying Scalability and Customization in the Clo...
23/12/2025
CAMB.AI Enables European Athletics to Offer Multi-Language SupportPlan is to eventually offer translation into all languages spoken in EuropeBy Ken Kerschbaumer...
23/12/2025
Analysis: As sports media values trend negative, scarcity and quality are king By Callum McCarthy, Editor-at-Large
Monday, December 22, 2025 - 14:08
Print ...
23/12/2025
ESPN, Disney, and NBA Return to the Animated Altcast Fray With Second Edition of...
23/12/2025
End the Year on a High Note and Donate to the Sports Broadcasting Fund Today!By Ken Kerschbaumer, Editorial Director
Tuesday, December 23, 2025 - 12:25 pm
P...
23/12/2025
The year is winding down, the weather outside is frightful, and it's the perfect time to escape into a story that warms the heart. For listeners looking for...
23/12/2025
A Zeus motor is hot fire tested at L3Harris' Camden, Arkansas, solid rocket ...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Lightware will exhibit several major product innovations at ISE 2026, including the new USB-C BOOSTER-V1, Google Meet. integration for various Taurus UCX models...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
23/12/2025
Taking the Stage at Carnegie Hall-On a Global Scale Boston Conservatory Orchestra students reflect on their epic concert marking the 80th session of the UN Gene...
23/12/2025
Back to All News
Netflix's The Great Flood and Culinary Class Wars 2 Top Gl...
23/12/2025
Back to All News
Stranger Things By the Numbers: How the Global Phenomenon Shap...
23/12/2025
Experience the power of WO Automation for Radio's newest service, the System Effectiveness Review. Designed to help you achieve more, a System Effectiveness...
23/12/2025
23 Dec 2025
VEON's Beeline Kazakhstan and Rakuten Symphony Collaborate to A...
23/12/2025
Back to All News
How Steamy Can It Get? Single's Inferno' Season 5 Pre...
23/12/2025
Back to All News
33 Million Global Viewers on Netflix Watched Jake Paul vs. Ant...
23/12/2025
New technique lights up where drugs go in the body, cell by cell Scripps Research scientists developed a technique that maps drug binding in individual cells th...
22/12/2025
SVG New Sponsor Spotlight: Presidio's Neerav Shah on the Role of Its Captiva...