Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

17/06/2024

NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.

More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers - one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles - are finalists for CVPR's Best Paper Awards.

NVIDIA is also the winner of the CVPR Autonomous Grand Challenge's End-to-End Driving at Scale track - a significant milestone that demonstrates the company's use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR's Innovation Award.

NVIDIA's research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.

Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.

"Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement," said Jan Kautz, vice president of learning and perception research at NVIDIA. "At CVPR, NVIDIA Research is sharing how we're pushing the boundaries of what's possible - from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars."

At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

Forget Fine-Tuning: JeDi Simplifies Custom Image Generation

Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind - they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.

Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning - where a user trains the model on a custom dataset - but the process can be time-consuming and inaccessible for general users.

JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.

JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand's product catalog.

https://blogs.nvidia.com/wp-content/uploads/2024/06/JeDi-cow-sculpture.mp4

New Foundation Model Perfects the Pose

NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.

The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.
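To make "how that object moves and rotates in 3D" concrete, an object pose is commonly written as a rotation plus a translation, and tracking amounts to estimating one such pose per video frame. The sketch below is only an illustration of that idea and is not FoundationPose's code or API: the helper names, the per-frame pose values and the choice of a rotation about the z-axis are all assumptions made for the example.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a turn of angle_rad about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a point from object coordinates to camera coordinates: R @ p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical poses for two video frames (values invented for illustration):
# the object starts 1 unit in front of the camera, then turns 90 degrees
# and shifts 0.5 units along x.
poses = [
    (rotation_z(0.0), (0.0, 0.0, 1.0)),
    (rotation_z(math.pi / 2), (0.5, 0.0, 1.0)),
]

# Follow one corner of the object through both frames.
corner = (1.0, 0.0, 0.0)
track = [apply_pose(R, t, corner) for R, t in poses]
```

Here `track` holds the corner's position in each frame. A pose estimator's job is the inverse problem: recovering the rotation and translation from the images.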

FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.

NeRFDeformer Transforms 3D Scenes With a Single Snapshot

A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed - or remake the NeRF entirely.

Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.
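As a rough illustration of what an RGB-D image contains, the sketch below pairs each color pixel with a depth value and then lifts the pixels into colored 3D points. It is not part of NeRFDeformer: the function names, the tiny 2x2 image, the pinhole camera model and the intrinsics (`fx`, `fy`, `cx`, `cy`) are assumptions chosen for the example, and depth is treated as distance along the camera's viewing axis.

```python
def make_rgbd(rgb, depth):
    """Pair each RGB pixel with its depth value to form an RGB-D image."""
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("RGB image and depth map must have the same size")
    return [[(*color, z) for color, z in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth)]


def backproject(rgbd, fx, fy, cx, cy):
    """Lift RGB-D pixels to colored 3D points using a pinhole camera model."""
    points = []
    for v, row in enumerate(rgbd):
        for u, (r, g, b, z) in enumerate(row):
            if z <= 0:
                continue  # no valid depth reading for this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), (r, g, b)))
    return points


# A 2x2 "photo" and its matching depth map (made-up values).
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
depth = [[1.0, 2.0],
         [0.0, 4.0]]

rgbd = make_rgbd(rgb, depth)
points = backproject(rgbd, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The pixel with depth `0.0` is skipped, so `points` holds three colored 3D points. This per-pixel geometry is what a single RGB-D snapshot adds over an ordinary photo.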

VILA Visual Language Model Gets the Picture

A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.

The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA's unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.

VILA can understand memes and reason based on multiple images or video frames. The VILA model fa
LINK: https://blogs.nvidia.com/blog/visual-generative-ai-cvpr-research/...