
NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.
More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.
NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.
NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.
Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.
"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."
At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.
Forget Fine-Tuning: JeDi Simplifies Custom Image Generation
Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.
Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.
JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.
JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.
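The article does not describe how JeDi and RAG are connected. As a minimal sketch, assuming every catalog image already has a precomputed embedding vector (the embeddings, file names and the `retrieve_references` helper below are hypothetical), the retrieval half of such a pipeline could rank catalog images by cosine similarity to a query embedding and hand the closest matches to the personalization model as reference images:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_references(query_embedding, catalog, k=2):
    """Return the names of the k catalog items most similar to the query.

    `catalog` maps a hypothetical image name to a precomputed embedding.
    The returned names stand in for the reference images that a
    personalization model such as JeDi would be conditioned on.
    """
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy 3-dimensional embeddings; real image embeddings are much larger.
catalog = {
    "red_toy_car.png": [0.9, 0.1, 0.0],
    "blue_toy_car.png": [0.8, 0.3, 0.1],
    "plush_mouse.png": [0.0, 0.2, 0.9],
}
print(retrieve_references([1.0, 0.2, 0.0], catalog, k=2))
# prints ['red_toy_car.png', 'blue_toy_car.png']
```

Keeping retrieval separate from generation means the catalog can be updated without retraining anything, which is the main appeal of pairing a fine-tuning-free method with RAG.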
New Foundation Model Perfects the Pose
NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.
The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
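An object's pose is its 3D rotation plus its 3D translation relative to the camera, six degrees of freedom in total. The article does not describe FoundationPose's internals, so the sketch below only illustrates what a tracker's per-frame output represents, assuming each pose is stored as a 3x3 rotation matrix and a translation vector (all function names are hypothetical):

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(rotation, translation, point):
    """Map a point from the object's own frame into the camera frame: R p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

def relative_rotation_angle(r_a, r_b):
    """Angle in radians of the rotation taking orientation A to orientation B.

    Uses trace(R_a^T R_b) = 1 + 2 cos(angle); the clamp guards acos
    against tiny floating-point overshoot.
    """
    trace = sum(r_a[k][i] * r_b[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

# Two hypothetical tracked poses: between frames the object turns 90 degrees
# about z and shifts 10 cm along x, staying 0.5 m in front of the camera.
r1, t1 = rotation_z(0.0), [0.0, 0.0, 0.5]
r2, t2 = rotation_z(math.pi / 2), [0.1, 0.0, 0.5]
corner = [0.05, 0.0, 0.0]  # a point on the object's 3D model
print([round(v, 3) for v in apply_pose(r2, t2, corner)])  # prints [0.1, 0.05, 0.5]
print(round(relative_rotation_angle(r1, r2), 4))  # prints 1.5708
```

Because a pose maps every point of the object's model into the camera frame, a tracker that outputs one pose per frame is enough for a robot to know where any part of the object is, or for an AR app to anchor an overlay to it.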
FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.
NeRFDeformer Transforms 3D Scenes With a Single Snapshot
A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.
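In the standard NeRF formulation, a pixel is rendered by sampling the learned density and color at points along the camera ray and alpha-compositing them from near to far. The sketch below shows only that compositing step, with made-up sample values; the article does not say which NeRF variant NeRFDeformer builds on, so treat it as background rather than as that method:

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one camera ray (standard NeRF quadrature).

    densities: volume density sigma_i at each sample, ordered near to far
    colors:    RGB tuple c_i emitted at each sample
    deltas:    distance delta_i between sample i and the next sample
    Returns the rendered RGB color of the pixel.
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha  # same as multiplying by exp(-sigma * delta)
    return rgb

# Hypothetical ray: empty space (density 0) followed by a dense red surface.
pixel = composite_ray(
    densities=[0.0, 50.0],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    deltas=[0.1, 0.1],
)
print([round(c, 4) for c in pixel])  # prints [0.9933, 0.0, 0.0]
```

Since the scene exists only as densities and colors queried inside the network, there is no explicit mesh to move around, which is why changing a scene has traditionally meant describing the transformation by hand or retraining.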
Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
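Each pixel of an RGB-D image carries a depth value alongside its color, so the frame can be lifted into 3D points. The sketch below assumes a standard pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`); it illustrates only how a depth map encodes geometry, not how NeRFDeformer uses that geometry to deform an existing NeRF, which the article does not describe:

```python
def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map into 3D camera-frame points using a pinhole model.

    depth:  2D list indexed as depth[v][u], in meters; 0 marks a missing reading
    fx, fy: focal lengths in pixels
    cx, cy: principal point (optical center) in pixels
    Returns a list of (x, y, z) tuples, one per pixel with valid depth.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels where the sensor returned no depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# A tiny 2x2 depth map with one missing reading, and made-up intrinsics.
depth_map = [[1.0, 2.0], [0.0, 1.0]]
cloud = backproject(depth_map, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
print(len(cloud))  # prints 3
```

Pairing each point with the color of its pixel in the normal photo yields a colored 3D snapshot of the scene as it looks now.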
VILA Visual Language Model Gets the Picture
A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.
The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.
VILA can understand memes and reason based on multiple images or video frames. The VILA model fa