
NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.
More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.
NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.
NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.
Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.
"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."
At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.
Forget Fine-Tuning: JeDi Simplifies Custom Image Generation
Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.
Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.
JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.
JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.
https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4
New Foundation Model Perfects the Pose
NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.
The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.
NeRFDeformer Transforms 3D Scenes With a Single Snapshot
A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.
Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
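To make the idea of an RGB-D image concrete, here is a minimal sketch of how a color photo and its depth map can be combined into colored 3D points. It is not the NeRFDeformer method, only an illustration of what a single RGB-D snapshot contains. It assumes a simple pinhole camera, and the function name `backproject_rgbd` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]        # (R, G, B) for one pixel
Point = Tuple[float, float, float]  # (X, Y, Z) in camera coordinates


def backproject_rgbd(
    rgb: List[List[Color]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point, Color]]:
    """Turn an RGB-D image into colored 3D points (assumed pinhole camera).

    rgb and depth are row-major grids of the same size. depth holds the
    distance along the camera axis for each pixel; values <= 0 are treated
    as missing measurements and skipped.
    """
    points = []
    for v, (rgb_row, depth_row) in enumerate(zip(rgb, depth)):
        for u, (color, z) in enumerate(zip(rgb_row, depth_row)):
            if z <= 0:
                continue  # no valid depth for this pixel
            # Invert the pinhole projection u = fx * X / Z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), color))
    return points


# Tiny 2x2 example with made-up values.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (9, 9, 9)]]
depth = [[2.0, 2.0], [2.0, 0.0]]  # last pixel has no depth reading
cloud = backproject_rgbd(rgb, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each entry in `cloud` pairs a 3D position with the color seen at that pixel, which is the kind of per-pixel geometry a depth map adds to an ordinary photo.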
VILA Visual Language Model Gets the Picture
A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.
The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.
VILA can understand memes and reason based on multiple images or video frames. The VILA model fa