Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning

30/06/2026

Editor's note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Vision AI agents are becoming a practical way to automatically turn video data from the physical world into operational intelligence in factories, cities, warehouses and transportation systems.

That shift is accelerating as more AI workloads move closer to where data is generated. Gartner projects that more than two-thirds of enterprise-managed data will be created and processed outside the data center or cloud by 2028, and that over two-thirds of all enterprises globally will deploy edge AI by 2029, up from 10% in 2025 (1).

But more edge data doesn't automatically create more intelligence. As much as 90% of existing edge data goes unprocessed, according to the same Gartner report.

Turning that data into useful action requires vision AI agents that can understand video, adapt to real-world conditions and connect insights to operational workflows. These agents often run near cameras, machines and sensors, where models must meet latency, power, cost and connectivity requirements while adapting to site-specific conditions.

To build those agents, developers need repeatable ways to generate training data, fine-tune models and deploy agentic video applications across edge and cloud environments.

NVIDIA Metropolis agent skills and blueprints give developers reusable workflows to build, operate and optimize vision AI agents across that lifecycle.

For the simulation and synthetic data side of that work, Universal Scene Description, or OpenUSD, provides a common framework for describing, composing and reusing 3D worlds. Built on OpenUSD, NVIDIA Omniverse libraries help teams build simulation, synthetic data generation and digital twin workflows that model real-world environments and expand scenario coverage across conditions such as lighting, weather, traffic patterns, camera angles, occlusion and rare events.
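
As a rough illustration of what expanding scenario coverage means in practice, the sketch below enumerates combinations of the conditions named above and draws a reproducible subset to render. It uses only the Python standard library. The axis names, their values and the helper functions `scenario_grid` and `sample_scenarios` are hypothetical placeholders, not part of OpenUSD or any NVIDIA Omniverse API.

```python
import itertools
import random

# Hypothetical condition axes, named after conditions listed in the text.
# The specific values are placeholders, not settings from any NVIDIA tool.
CONDITIONS = {
    "lighting": ["day", "dusk", "night"],
    "weather": ["clear", "rain", "fog"],
    "camera_angle_deg": [0, 30, 60],
    "occlusion": [0.0, 0.25, 0.5],
}


def scenario_grid(conditions):
    """Return every combination of condition values as a list of dicts."""
    keys = list(conditions)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*conditions.values())
    ]


def sample_scenarios(conditions, n, seed=0):
    """Draw up to n distinct scenarios from the full grid, reproducibly."""
    grid = scenario_grid(conditions)
    rng = random.Random(seed)
    return rng.sample(grid, min(n, len(grid)))


scenarios = scenario_grid(CONDITIONS)    # full coverage: 3 * 3 * 3 * 3 combinations
subset = sample_scenarios(CONDITIONS, 5)  # a small batch to render first

print(len(scenarios), "scenarios in the full grid")
for scenario in subset:
    print(scenario)
```

Each resulting dictionary could then drive one variation of a simulated scene. A full grid grows multiplicatively with every added axis, which is why a seeded sample is often a more practical starting point than rendering everything.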

Where Vision AI Agent Projects Can Get Stuck

As organizations move toward autonomous vision agents, three challenges often come up:

Accuracy Plateaus With Data Gaps: Vision AI agents need to spot rare defects, abnormal events and changing environments. In manufacturing, for example, an inspection model may perform well on common scratches or dents but struggle with a new hairline crack not represented in the training data.

Lack of Fine-Tuning Expertise: Once teams identify a performance gap, improving the model is rarely a simple handoff. Fine-tuning requires labeled datasets, training configuration, experiment tracking, evaluation and a decision about whether the retrained model actually improves results for the target use case. Many organizations building vision AI agents don't have large in-house machine learning teams to manage that process quickly, especially across many sites, products or camera views.

Complex, Time-Consuming Agent Assembly Workflows: Deploying a vision AI agent requires more than running inference. Developers have to stitch together video pipelines, AI models, metadata, embeddings, indexing, search, alerts, reporting and system integrations. Customizing that workflow for a specific environment adds significant time and requires specialized expertise. Without OpenUSD's shared scene description layer, teams often rebuild 3D environments from scratch each time conditions or deployment sites change.

A Full-Lifecycle Approach to Vision AI Agents

NVIDIA agent skills and blueprints, used alongside NVIDIA Omniverse for OpenUSD-based simulation and synthetic data generation and NVIDIA Metropolis for model development and video AI deployment, give developers reusable starting points for key parts of those workflows:

The Defect Image Generation skill helps create synthetic defect data.

The Video Data Augmentation skill helps expand scenario coverage.

NVIDIA TAO skills enable model fine-tuning.

NVIDIA video search and summarization (VSS) skills help turn video understanding into deployable workflows for alerts, reporting, stream management and more.

Instead of rebuilding every step from scratch, developers can use these reusable workflows to generate data, improve models and deploy vision AI agents faster.

Visual Inspection: Generating the Data That Production Lines Don't Have

In manufacturing, the more successful a factory is at preventing defects, the harder it becomes to collect enough defect examples to train the next inspection model.

Roboflow is integrating the NVIDIA Defect Image Generation skill and NVIDIA Cosmos world foundation models into its vision AI platform to generate synthetic defect images for customers like Corning when real training data is scarce, enabling near-perfect detection performance while significantly reducing the need for daily manual image review.

In a benchmark conducted with Corning's optical fiber manufacturing engineering team, a model trained on just eight real defect images - augmented with synthetic data generated by the NVIDIA Defect Image Generation skill - reached an average precision of 95% and perfect recall on the most challenging defect class. This performance surpassed a baseline model trained solely on real data, effectively compressing a multi-quarter inspection project into just a few days.
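
For readers less familiar with the two metrics cited here, the following is a minimal sketch of how average precision and recall can be computed for a single defect class. It assumes one common, non-interpolated definition of average precision. The source does not state which variant the benchmark used, and the scores and labels below are made-up toy values, not Corning or Roboflow data.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of the precision at each rank where a
    true defect appears, after sorting detections by descending score."""
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives


def recall_at_threshold(scores, labels, threshold):
    """Fraction of true defects whose score is at or above the threshold."""
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    found = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
    return found / total_positives


# Toy example (hypothetical values): 1 marks a real defect, 0 a false alarm.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 1, 0]

ap = average_precision(toy_scores, toy_labels)
recall = recall_at_threshold(toy_scores, toy_labels, threshold=0.6)
print(f"average precision = {ap:.3f}, recall = {recall:.2f}")
```

In this toy case every real defect scores at or above the 0.6 threshold, so recall is perfect even though one false alarm ranks above two of the defects and pulls average precision below 1.0. That is the sense in which a model can pair perfect recall with an average precision short of 100%.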

Smart Cities: From Video Analytics to Autonomous Operations

Large-scale city operations show why vision AI agents need connected workflows, not just inference.

Linker Vision is building smart city AI systems with the NVIDIA Metropolis Blueprint for VSS to accelerate the deployment of video reasoning agents across city infrastructure. In this workflow, VSS skills can help package com

LINK:	https://blogs.nvidia.com/blog/vision-ai-agent-skills-omniverse-metropo...