
Generative AI-powered laptops and PCs are unlocking advancements in gaming, content creation, productivity and development. Today, over 600 Windows apps and games are already running AI locally on more than 100 million GeForce RTX AI PCs worldwide, delivering fast, reliable and low-latency performance.
At Microsoft Ignite, NVIDIA and Microsoft announced tools to help Windows developers quickly build and optimize AI-powered apps on RTX AI PCs, making local AI more accessible. These new tools enable application and game developers to harness powerful RTX GPUs to accelerate complex AI workflows for applications such as AI agents, app assistants and digital humans.
RTX AI PCs Power Digital Humans With Multimodal Small Language Models Meet James, an interactive digital human knowledgeable about NVIDIA and its products. James uses a collection of NVIDIA NIM microservices, NVIDIA ACE and ElevenLabs digital human technologies to provide natural and immersive responses. NVIDIA ACE is a suite of digital human technologies that brings life to agents, assistants and avatars. To achieve a higher level of understanding so that they can respond with greater context-awareness, digital humans must be able to visually perceive the world like humans do.
Enhancing digital human interactions with greater realism demands technology that enables perception and understanding of their surroundings with greater nuance. To achieve this, NVIDIA developed multimodal small language models that can process both text and imagery, excel in role-playing and are optimized for rapid response times.
The NVIDIA Nemovision-4B-Instruct model, soon to be available, uses the latest NVIDIA VILA and NVIDIA NeMo framework for distilling, pruning and quantizing to become small enough to perform on RTX GPUs with the accuracy developers need.
The model enables digital humans to understand visual imagery in the real world and on the screen to deliver relevant responses. Multimodality serves as the foundation for agentic workflows and offers a sneak peek into a future where digital humans can reason and take action with minimal assistance from a user.
NVIDIA is also introducing the Mistral NeMo Minitron 128k Instruct family, a suite of large-context small language models designed for optimized, efficient digital human interactions, coming soon. Available in 8B-, 4B- and 2B-parameter versions, these models offer flexible options for balancing speed, memory usage and accuracy on RTX AI PCs. They can handle large datasets in a single pass, eliminating the need for data segmentation and reassembly. Built in the GGUF format, these models enhance efficiency on low-power devices and support compatibility with multiple programming languages.
Turbocharge Gen AI With NVIDIA TensorRT Model Optimizer for Windows When bringing models to PC environments, developers face the challenge of limited memory and compute resources for running AI locally. And they want to make models available to as many people as possible, with minimal accuracy loss.
Today, NVIDIA announced updates to NVIDIA TensorRT Model Optimizer (ModelOpt) to offer Windows developers an improved way to optimize models for ONNX Runtime deployment.
With the latest updates, TensorRT ModelOpt enables models to be optimized into an ONNX checkpoint for deploying the model within ONNX runtime environments - using GPU execution providers such as CUDA, TensorRT and DirectML.
TensorRT-ModelOpt includes advanced quantization algorithms, such as INT4-Activation Aware Weight Quantization. Compared to other tools such as Olive, the new method reduces the memory footprint of the model and improves throughput performance on RTX GPUs.
During deployment, the models can have up to 2.6x reduced memory footprint compared to FP16 models. This results in faster throughput, with minimal accuracy degradation, allowing them to run on a wider range of PCs.
Learn more about how developers on Microsoft systems, from Windows RTX AI PCs to NVIDIA Blackwell-powered Azure servers, are transforming how users interact with AI on a daily basis.
More from Nvidia
13/09/2025
Celtic languages - including Cornish, Irish, Scottish Gaelic and Welsh - are the U.K.'s oldest living languages. To empower their speakers, the UK-LLM sover...
10/09/2025
GeForce NOW Blackwell RTX 5080-class SuperPODs are now rolling out, unlocking a new level of ultra high-performance, cinematic cloud gaming.
GeForce NOW Ultima...
09/09/2025
Inference has emerged as the new frontier of complexity in AI. Modern models are...
09/09/2025
As large language models (LLMs) grow larger, they get smarter, with open models from leading developers now featuring hundreds of billions of parameters. At the...
09/09/2025
At this week's AI Infrastructure Summit in Silicon Valley, NVIDIA's VP o...
09/09/2025
Inference performance is critical, as it directly influences the economics of an AI factory. The higher the throughput of AI factory infrastructure, the more to...
09/09/2025
At this week's IAA Mobility conference in Munich, NVIDIA Vice President of A...
09/09/2025
ComfyUI - an open-source, node-based graphical interface for running and buildin...
04/09/2025
NVIDIA today announced new AI education support for K-12 programs at a White House event to celebrate public-private partnerships that advance artificial intell...
04/09/2025
Editor's note: This post is part of the AI On blog series, which explores the latest techniques and real-world applications of agentic AI, chatbots and copi...
04/09/2025
NVIDIA Blackwell RTX is coming to the cloud on Wednesday, Sept. 10 - an upgrade ...
03/09/2025
3D artists are constantly prototyping.
In traditional workflows, modelers must build placeholder, low-fidelity assets to populate 3D scenes, tinkering and adju...
02/09/2025
For more than a century, meteorologists have chased storms with chalkboards, equ...
28/08/2025
Brace yourself, COGs - the Locusts aren't the only thing rising up. The Coal...
28/08/2025
Last week at Gamescom, NVIDIA announced the winners of the NVIDIA and ModDB RTX ...
27/08/2025
AI models are advancing at a rapid rate and scale.
But what might they lack that (most) humans don't? Common sense: an understanding, developed through rea...
25/08/2025
Robots around the world are about to get a lot smarter as physical AI developers...
25/08/2025
As autonomous vehicle systems rapidly grow in complexity, equipped with reasonin...
22/08/2025
As the latest member of the NVIDIA Blackwell architecture family, the NVIDIA Blackwell Ultra GPU builds on core innovations to accelerate training and AI reason...
22/08/2025
AI reasoning, inference and networking will be top of mind for attendees of next...
21/08/2025
Japan is once again building a landmark high-performance computing system - not ...
21/08/2025
From AI assistants doing deep research to autonomous vehicles making split-second navigation decisions, AI adoption is exploding across industries.
Behind ever...
21/08/2025
Across the globe, AI factories are rising - massive new data centers built not to serve up web pages or email, but to train and deploy intelligence itself. Inte...
21/08/2025
Get a glimpse into the future of gaming.
The NVIDIA Blackwell RTX architecture is coming to GeForce NOW in September, marking the service's biggest upgrade...
20/08/2025
Editor's note: This blog is a part of Into the Omniverse, a series focused o...
18/08/2025
With over 175 games now supporting NVIDIA DLSS 4 - a suite of advanced, AI-power...
18/08/2025
At Gamescom, NVIDIA is releasing its first major update to Project G Assist - an...
15/08/2025
Of around 7,000 languages in the world, a tiny fraction are supported by AI lang...
14/08/2025
NVIDIA is partnering with the U.S. National Science Foundation (NSF) to create a...
14/08/2025
Warhammer 40,000: Dawn of War - Definitive Edition is marching onto GeForce NOW,...
13/08/2025
Black Forest Labs' FLUX.1 Kontext [dev] image editing model is now available as an NVIDIA NIM microservice.
FLUX.1 models allow users to edit existing imag...
11/08/2025
Using NVIDIA digital twin technologies, Amazon Devices & Services is powering bi...
11/08/2025
Packing the power of the NVIDIA Blackwell architecture in compact, energy-effici...
11/08/2025
Physical AI is becoming the foundation of smart cities, facilities and industria...
07/08/2025
This GFN Thursday brings an offer members can't refuse - 2K's highly ant...
05/08/2025
Two new open-weight AI reasoning models from OpenAI released today bring cutting...
05/08/2025
In collaboration with OpenAI, NVIDIA has optimized the company's new open-so...
05/08/2025
NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA D...
05/08/2025
NVIDIA GPUs are at the heart of modern computing. They're used across industries - from healthcare and finance to scientific research, autonomous systems an...
31/07/2025
August brings new levels of gaming excitement on GeForce NOW, with 2,300 titles now available to stream in the cloud.
Grab a controller and get ready for epic ...
31/07/2025
Interest in generative AI is continuing to grow, as new models include more capabilities. With the latest advancements, even enthusiasts without a developer bac...
29/07/2025
FourCastNet3 (FCN3) is the latest AI global weather forecasting system from NVID...
28/07/2025
The electrical grid is designed to support loads that are relatively steady, such as lighting, household appliances, and industrial machines that operate at con...
24/07/2025
For media company Black Mixture, AI isn't just a tool - it's an entire p...
24/07/2025
Sharpen the blade and brace for a journey steeped in myth and mystery. WUCHANG: Fallen Feathers has launched in the cloud.
Ride in style with skateboarding leg...
23/07/2025
In today's fast-evolving digital landscape, marketing teams face increasing ...
22/07/2025
Editor's note: This post is part of the AI On blog series, which explores th...
17/07/2025
Listen up citizens, the law is back and patrolling the cloud. Nacon's RoboCop Rogue City - Unfinished Business launches today in the cloud, bringing justice...
15/07/2025
Submissions for NVIDIA's Plug and Play: Project G-Assist Plug-In Hackathon a...