
Developers of generative AI typically face a tradeoff between model size and accuracy. But a new language model released by NVIDIA delivers the best of both, providing state-of-the-art accuracy in a compact form factor.
Mistral-NeMo-Minitron 8B, a miniaturized version of the open Mistral NeMo 12B model released by Mistral AI and NVIDIA last month, is small enough to run on an NVIDIA RTX-powered workstation while still excelling across multiple benchmarks for AI-powered chatbots, virtual assistants, content generators and educational tools. Minitron models are distilled by NVIDIA using NVIDIA NeMo, an end-to-end platform for developing custom generative AI.
"We combined two different AI optimization methods: pruning to shrink Mistral NeMo's 12 billion parameters into 8 billion, and distillation to improve accuracy," said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. "By doing so, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model at lower computational cost."
Unlike their larger counterparts, small language models can run in real time on workstations and laptops. This makes it easier for organizations with limited resources to deploy generative AI capabilities across their infrastructure while optimizing for cost, operational efficiency and energy use. Running language models locally on edge devices also delivers security benefits, since data doesn't need to be passed to a server from an edge device.
Developers can get started with Mistral-NeMo-Minitron 8B packaged as an NVIDIA NIM microservice with a standard application programming interface (API), or they can download the model from Hugging Face. A downloadable NVIDIA NIM, which can be deployed on any GPU-accelerated system in minutes, will be available soon.
State-of-the-Art for 8 Billion Parameters
For a model of its size, Mistral-NeMo-Minitron 8B leads on nine popular benchmarks for language models. These benchmarks cover a variety of tasks including language understanding, common sense reasoning, mathematical reasoning, summarization, coding and the ability to generate truthful answers.
Packaged as an NVIDIA NIM microservice, the model is optimized for low latency, which means faster responses for users, and high throughput, which corresponds to higher computational efficiency in production.
In some cases, developers may want an even smaller version of the model to run on a smartphone or an embedded device like a robot. To do so, they can download the 8-billion-parameter model and, using NVIDIA AI Foundry, prune and distill it into a smaller, optimized neural network customized for enterprise-specific applications.
The AI Foundry platform and service offers developers a full-stack solution for creating a customized foundation model packaged as a NIM microservice. It includes popular foundation models, the NVIDIA NeMo platform and dedicated capacity on NVIDIA DGX Cloud. Developers using NVIDIA AI Foundry can also access NVIDIA AI Enterprise, a software platform that provides security, stability and support for production deployments.
Since the original Mistral-NeMo-Minitron 8B model starts with a baseline of state-of-the-art accuracy, versions downsized using AI Foundry would still offer users high accuracy with a fraction of the training data and compute infrastructure.
Harnessing the Perks of Pruning and Distillation
To achieve high accuracy with a smaller model, the team used a process that combines pruning and distillation. Pruning downsizes a neural network by removing model weights that contribute the least to accuracy. During distillation, the team retrained this pruned model on a small dataset to significantly boost accuracy, which had decreased through the pruning process.
The end result is a smaller, more efficient model with the predictive accuracy of its larger counterpart.
With this technique, only a fraction of the original dataset is required to train each additional model within a family of related models. Pruning and distilling a larger model can cut compute cost by up to 40x compared to training a smaller model from scratch.
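The two steps can be illustrated with a toy sketch. It is not NVIDIA's pipeline: the function names are hypothetical, weight magnitude is assumed here as a simple stand-in for "contributes the least to accuracy," and the distillation objective is assumed to be a temperature-softened KL divergence between the larger (teacher) model's outputs and the pruned (student) model's outputs, which is one common formulation.

```python
import math


def magnitude_prune(weights, keep_fraction):
    """Zero out the smallest-magnitude weights, keeping about keep_fraction of them.

    Magnitude is an assumed proxy for importance, used only for illustration.
    """
    keep = max(1, round(len(weights) * keep_fraction))
    # Rank weight indices from largest to smallest absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions; 0 when they match."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy layer with six weights, pruned in the same 12 -> 8 ratio as the model.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, keep_fraction=8 / 12)
print(pruned)

# The student is penalized in proportion to how far its outputs drift from the teacher's.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In a real training loop, this loss would be minimized over the small retraining dataset so that the pruned model recovers the accuracy lost in pruning.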
Read the NVIDIA Technical Blog and a technical report for details.
This week, NVIDIA also announced Nemotron-Mini-4B-Instruct, another small language model optimized for low memory usage and faster response times on NVIDIA GeForce RTX AI PCs and laptops. The model is available as an NVIDIA NIM microservice for cloud and on-device deployment and is part of NVIDIA ACE, a suite of digital human technologies that provide speech, intelligence and animation powered by generative AI.
Experience both models as NIM microservices from a browser or an API at ai.nvidia.com.
See notice regarding software product information.