
Developers of generative AI typically face a tradeoff between model size and accuracy. But a new language model released by NVIDIA delivers the best of both, providing state-of-the-art accuracy in a compact form factor.
Mistral-NeMo-Minitron 8B - a miniaturized version of the open Mistral NeMo 12B model released by Mistral AI and NVIDIA last month - is small enough to run on an NVIDIA RTX-powered workstation while still excelling across multiple benchmarks for AI-powered chatbots, virtual assistants, content generators and educational tools. Minitron models are distilled by NVIDIA using NVIDIA NeMo, an end-to-end platform for developing custom generative AI.
"We combined two different AI optimization methods - pruning to shrink Mistral NeMo's 12 billion parameters into 8 billion, and distillation to improve accuracy," said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. "By doing so, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model at lower computational cost."
Unlike their larger counterparts, small language models can run in real time on workstations and laptops. This makes it easier for organizations with limited resources to deploy generative AI capabilities across their infrastructure while optimizing for cost, operational efficiency and energy use. Running language models locally on edge devices also delivers security benefits, since data doesn't need to be passed to a server from an edge device.
Developers can get started with Mistral-NeMo-Minitron 8B packaged as an NVIDIA NIM microservice with a standard application programming interface (API) - or they can download the model from Hugging Face. A downloadable NVIDIA NIM, which can be deployed on any GPU-accelerated system in minutes, will be available soon.
State-of-the-Art for 8 Billion Parameters
For a model of its size, Mistral-NeMo-Minitron 8B leads on nine popular benchmarks for language models. These benchmarks cover a variety of tasks including language understanding, common sense reasoning, mathematical reasoning, summarization, coding and the ability to generate truthful answers.
Packaged as an NVIDIA NIM microservice, the model is optimized for low latency, which means faster responses for users, and high throughput, which corresponds to higher computational efficiency in production.
In some cases, developers may want an even smaller version of the model to run on a smartphone or an embedded device like a robot. To do so, they can download the 8-billion-parameter model and, using NVIDIA AI Foundry, prune and distill it into a smaller, optimized neural network customized for enterprise-specific applications.
The AI Foundry platform and service offers developers a full-stack solution for creating a customized foundation model packaged as a NIM microservice. It includes popular foundation models, the NVIDIA NeMo platform and dedicated capacity on NVIDIA DGX Cloud. Developers using NVIDIA AI Foundry can also access NVIDIA AI Enterprise, a software platform that provides security, stability and support for production deployments.
Since the original Mistral-NeMo-Minitron 8B model starts with a baseline of state-of-the-art accuracy, versions downsized using AI Foundry would still offer users high accuracy with a fraction of the training data and compute infrastructure.
Harnessing the Perks of Pruning and Distillation
To achieve high accuracy with a smaller model, the team used a process that combines pruning and distillation. Pruning downsizes a neural network by removing model weights that contribute the least to accuracy. During distillation, the team retrained this pruned model on a small dataset to significantly boost accuracy, which had decreased through the pruning process.
The end result is a smaller, more efficient model with the predictive accuracy of its larger counterpart.
With this technique, each additional model within a family of related models needs only a fraction of the original dataset for training. Pruning and distilling a larger model can cut compute cost by up to 40x compared with training a smaller model from scratch.
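To make the two ideas concrete, here is a minimal toy sketch in plain Python. It is not the procedure NVIDIA used: it assumes simple magnitude-based pruning (zeroing the smallest weights) as a stand-in for "removing weights that contribute the least," and a KL-divergence loss between teacher and student outputs as one common way to set up distillation. The function names, the sample weights and the 8/12 keep fraction are illustrative assumptions only.

```python
import math


def magnitude_prune(weights, keep_fraction):
    """Zero out the smallest-magnitude weights (assumed importance proxy)."""
    n_keep = max(1, round(len(weights) * keep_fraction))
    # Rank weight indices from largest to smallest absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student): how far the student's outputs are from the teacher's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy layer: keep about 8/12 of the weights, echoing 12B -> 8B.
toy_weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = magnitude_prune(toy_weights, keep_fraction=8 / 12)

# During distillation, training would adjust the pruned model to shrink this loss.
loss = distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
```

In this sketch the pruned model plays the student and the original model plays the teacher. The loss is zero when their outputs match and grows as they diverge, which is why a relatively small retraining run can recover accuracy lost to pruning.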
Read the NVIDIA Technical Blog and a technical report for details.
This week, NVIDIA also announced Nemotron-Mini-4B-Instruct, another small language model optimized for low memory usage and faster response times on NVIDIA GeForce RTX AI PCs and laptops. The model is available as an NVIDIA NIM microservice for cloud and on-device deployment and is part of NVIDIA ACE, a suite of digital human technologies that provide speech, intelligence and animation powered by generative AI.
Experience both models as NIM microservices from a browser or an API at ai.nvidia.com.
See notice regarding software product information.