
NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.
More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.
NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.
NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.
Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.
"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."
At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.
Forget Fine-Tuning: JeDi Simplifies Custom Image Generation
Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.
Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.
JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.
JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.
https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4
New Foundation Model Perfects the Pose
NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.
The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.
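For readers unfamiliar with the term, an object's pose is commonly written as a rotation plus a translation that maps points from the object's own coordinate frame into the camera's frame. The sketch below illustrates that general idea only. It is not FoundationPose's code or API, and the function names and numeric values are hypothetical.

```python
import math

def rotation_z(theta):
    """Return a 3x3 rotation matrix for a turn of theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(rotation, translation, point):
    """Map a point from object coordinates to camera coordinates: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Hypothetical pose: object turned 90 degrees about z, 2 m in front of the camera.
R = rotation_z(math.pi / 2)
t = (0.0, 0.0, 2.0)
corner = (0.1, 0.0, 0.0)  # a point on the object, in metres
corner_in_camera = apply_pose(R, t, corner)
```

Tracking an object across a video then amounts to estimating a new rotation and translation for each frame.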
NeRFDeformer Transforms 3D Scenes With a Single Snapshot
A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.
Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
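To make the RGB-D idea concrete, the sketch below pairs each color pixel with a depth value and then converts one pixel into a 3D point using the standard pinhole camera model. It is a generic illustration under assumed values, not part of NeRFDeformer. The function names, the toy 2x2 image and the camera intrinsics (fx, fy, cx, cy) are all made up for the example.

```python
def make_rgbd(rgb, depth):
    """Pair each RGB pixel with its depth value, giving (r, g, b, d) per pixel."""
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("color image and depth map must have the same size")
    return [
        [(r, g, b, d) for (r, g, b), d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]

def backproject(u, v, d, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth d to a 3D camera-frame point."""
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)

# Toy 2x2 photo and depth map (metres); the intrinsics below are assumed values.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
depth = [[1.0, 1.5], [2.0, 2.5]]
rgbd = make_rgbd(rgb, depth)
point = backproject(u=1, v=0, d=rgbd[0][1][3], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Because every pixel carries its own distance from the camera, a single RGB-D image describes a partial 3D snapshot of the scene rather than just its appearance.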
VILA Visual Language Model Gets the Picture
A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.
The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.
VILA can understand memes and reason based on multiple images or video frames. The VILA model fa