
Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

17/06/2024

NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.

More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.

NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.

NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.

Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.

"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."

At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

Forget Fine-Tuning: JeDi Simplifies Custom Image Generation

Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.

Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.

JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.

JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.

https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4

New Foundation Model Perfects the Pose

NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.

The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.

FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.

NeRFDeformer Transforms 3D Scenes With a Single Snapshot

A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.

Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.

VILA Visual Language Model Gets the Picture

A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.

The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.

VILA can understand memes and reason based on multiple images or video frames. The VILA model fa
LINK: https://blogs.nvidia.com/blog/visual-generative-ai-cvpr-research/...