
Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

17/06/2024

NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.

More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.

NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.

NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.

Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.

"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."

At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

Forget Fine-Tuning: JeDi Simplifies Custom Image Generation

Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind. They may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.

Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.

JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.

JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.
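The retrieval half of such a pipeline can be pictured as a nearest-neighbor lookup. The prompt and each catalog item are embedded as vectors, and the closest items are selected. Their images are then handed to the personalization model as reference images. The sketch below is illustrative only and assumes precomputed embeddings. The toy vectors, product ids, file paths and the `retrieve_references` helper are all hypothetical. It does not show JeDi itself or any NVIDIA API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_references(query_vec, catalog, k=2):
    """Return (id, image_path) for the k catalog items most similar to the query.

    `catalog` maps a hypothetical product id to (embedding, image_path).
    The returned image paths would be passed to a personalization model
    as reference images (that step is not shown here).
    """
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine(query_vec, item[1][0]),
        reverse=True,
    )
    return [(pid, path) for pid, (_, path) in ranked[:k]]

# Toy 3-dimensional embeddings standing in for a real encoder's output.
catalog = {
    "mug-red": ([0.9, 0.1, 0.0], "catalog/mug_red.png"),
    "mug-blue": ([0.8, 0.0, 0.3], "catalog/mug_blue.png"),
    "sneaker": ([0.0, 1.0, 0.1], "catalog/sneaker.png"),
}
query = [1.0, 0.0, 0.1]  # e.g. an embedding of "a coffee mug on a desk"
print(retrieve_references(query, catalog, k=2))
```

In a production system, the linear scan would typically be replaced by a vector index. The ranking logic stays the same.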


New Foundation Model Perfects the Pose

NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.

The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
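An object's pose is its position plus its orientation, so tracking "how that object moves and rotates in 3D" means estimating a rigid transform for every frame. The sketch below shows one common way to represent this, as a 4x4 homogeneous matrix. It also shows how the motion between two frames falls out of two per-frame poses. This is a minimal illustration, not FoundationPose's internals. It assumes a rotation about a single axis to stay short, and the pose values and helper names are hypothetical.

```python
import numpy as np

def pose_matrix(yaw_rad, translation):
    """Build a 4x4 homogeneous transform: rotation about the z axis plus a translation.

    A full 6-DoF pose uses an arbitrary 3D rotation; one yaw angle keeps
    this example short.
    """
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def transform_points(T, points):
    """Map Nx3 points through the 4x4 transform T (object frame to camera frame for a pose)."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (T @ homogeneous.T).T[:, :3]

# Hypothetical per-frame poses, as a tracker might output them.
pose_t0 = pose_matrix(0.0, [0.0, 0.0, 1.0])
pose_t1 = pose_matrix(np.pi / 2, [0.1, 0.0, 1.0])

# Rigid motion of the object between the two frames, expressed in the camera frame.
delta = pose_t1 @ np.linalg.inv(pose_t0)

corner = np.array([[0.05, 0.0, 0.0]])  # one point on the object's surface
p0 = transform_points(pose_t0, corner)  # approximately (0.05, 0.0, 1.0)
p1 = transform_points(pose_t1, corner)  # approximately (0.1, 0.05, 1.0)
print(p0, p1)
```

The same `delta` maps every point on the object from its frame-0 position to its frame-1 position. This rigid-body assumption is what lets a tracker summarize an object's motion with one transform per frame.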

FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.

NeRFDeformer Transforms 3D Scenes With a Single Snapshot

A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.

Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
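To make the idea of an RGB-D image concrete, the sketch below shows how a depth map, paired with a camera's intrinsic parameters, turns each pixel into a 3D point. It uses the standard pinhole camera model. Attaching the color photo to those points then gives a colored point cloud of the scene. This is a generic illustration, not NeRFDeformer's method. It assumes depth is measured along the camera's optical axis (as many RGB-D sensors report it), and the intrinsics and tiny 2x2 images are hypothetical.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (meters along the optical axis) into HxWx3 camera-frame points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, where
    (u, v) are pixel column/row indices and (fx, fy, cx, cy) are intrinsics.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # each has shape (h, w)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Hypothetical 2x2 depth map and intrinsics; a real RGB-D camera supplies its own.
depth = np.array([[1.0, 1.0],
                  [2.0, 2.0]])
points = backproject(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)

# Pair each 3D point with its pixel color from the RGB half of the image.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
colored_cloud = np.concatenate([points.reshape(-1, 3), rgb.reshape(-1, 3)], axis=1)
print(colored_cloud.shape)  # (4, 6): x, y, z, r, g, b per pixel
```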

VILA Visual Language Model Gets the Picture

A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.

The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.

VILA can understand memes and reason based on multiple images or video frames. The VILA model fa...
LINK: https://blogs.nvidia.com/blog/visual-generative-ai-cvpr-research/...