
NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.
More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.
NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.
NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.
Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.
"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."
At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.
Forget Fine-Tuning: JeDi Simplifies Custom Image Generation
Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.
Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.
JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.
JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.
https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4
New Foundation Model Perfects the Pose
NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.
The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.
NeRFDeformer Transforms 3D Scenes With a Single Snapshot
A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.
Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
VILA Visual Language Model Gets the Picture
A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.
The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.
VILA can understand memes and reason based on multiple images or video frames. The VILA model fa