
NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.
More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.
NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.
NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.
Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.
"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."
At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.
Forget Fine-Tuning: JeDi Simplifies Custom Image Generation
Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.
Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.
JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.
JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.
https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4
New Foundation Model Perfects the Pose
NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.
The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.
NeRFDeformer Transforms 3D Scenes With a Single Snapshot
A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.
Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
VILA Visual Language Model Gets the Picture
A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.
The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.
VILA can understand memes and reason based on multiple images or video frames. The VILA model fa