
Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

17/06/2024

NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.

More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.

NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.

NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.

Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.

"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."

At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

Forget Fine-Tuning: JeDi Simplifies Custom Image Generation

Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.

Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.

JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.

JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.

https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4

New Foundation Model Perfects the Pose

NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.

The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.

FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.

NeRFDeformer Transforms 3D Scenes With a Single Snapshot

A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.

Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
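To make the idea of an RGB-D image concrete, here is a minimal Python sketch of how a color photo plus a per-pixel depth map can be turned into colored 3D points. It is not NeRFDeformer's method, only an illustration of the data an RGB-D image carries. The function name `backproject_rgbd` and the pinhole camera parameters `fx`, `fy`, `cx`, `cy` are assumptions made for this example and do not come from the paper.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, int, int, int]


def backproject_rgbd(
    rgb: List[List[Color]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Convert an RGB-D image into colored 3D points (camera coordinates).

    Assumes a simple pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. These are hypothetical parameters chosen
    for illustration. Pixels with non-positive depth are treated as missing.
    """
    points: List[Point] = []
    for v, depth_row in enumerate(depth):      # v: pixel row
        for u, z in enumerate(depth_row):      # u: pixel column
            if z <= 0:
                continue                       # no depth reading here
            x = (u - cx) * z / fx              # horizontal offset scaled by depth
            y = (v - cy) * z / fy              # vertical offset scaled by depth
            r, g, b = rgb[v][u]
            points.append((x, y, z, r, g, b))
    return points


# Tiny 2x2 example: one pixel has no depth reading and is skipped.
rgb_image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]
depth_map = [[1.0, 2.0],
             [0.0, 4.0]]
cloud = backproject_rgbd(rgb_image, depth_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Each output tuple holds a 3D position followed by the pixel's color, which is the kind of geometric cue a single RGB-D snapshot adds over an ordinary photo.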

VILA Visual Language Model Gets the Picture

A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.

The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.

VILA can understand memes and reason based on multiple images or video frames. The VILA model fa
LINK: https://blogs.nvidia.com/blog/visual-generative-ai-cvpr-research/...