Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

17/06/2024

NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.

More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.

NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.

NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.

Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.

"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."

At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

Forget Fine-Tuning: JeDi Simplifies Custom Image Generation

Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.

Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.

JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.

JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.

https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4

New Foundation Model Perfects the Pose

NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.

The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.

FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.

NeRFDeformer Transforms 3D Scenes With a Single Snapshot

A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.

Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
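
To make the idea of an RGB-D image concrete, here is a minimal Python sketch of how a color image and its depth map can be combined into colored 3D points. It is not the NeRFDeformer method. It assumes a standard pinhole camera model, and the function name `backproject_rgbd`, the intrinsics `fx`, `fy`, `cx`, `cy` and the toy pixel values are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float]


def backproject_rgbd(
    rgb: List[List[Color]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point, Color]]:
    """Turn an RGB-D image into colored 3D points in camera coordinates.

    Assumes a pinhole camera: (fx, fy) are focal lengths in pixels and
    (cx, cy) is the principal point. These names are illustrative only.
    """
    points = []
    for v, (color_row, depth_row) in enumerate(zip(rgb, depth)):
        for u, (color, z) in enumerate(zip(color_row, depth_row)):
            if z <= 0:  # treat non-positive depth as "no measurement"
                continue
            x = (u - cx) * z / fx  # horizontal offset scaled by depth
            y = (v - cy) * z / fy  # vertical offset scaled by depth
            points.append(((x, y, z), color))
    return points


# Toy 2x2 RGB-D image; every value here is made up for illustration.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
depth = [[1.0, 2.0], [0.0, 4.0]]
cloud = backproject_rgbd(rgb, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Each pixel with a valid depth reading becomes one 3D point that carries its color, which is why a single RGB-D snapshot holds enough geometric information to describe how a scene currently looks from one viewpoint.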

VILA Visual Language Model Gets the Picture

A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.

The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.

VILA can understand memes and reason based on multiple images or video frames. The VILA model fa
LINK: https://blogs.nvidia.com/blog/visual-generative-ai-cvpr-research/...