
AI is creating value for everyone - from researchers in drug discovery to quantitative analysts navigating financial market changes.
The faster an AI system can produce tokens, the units of data strung together to form outputs, the greater its impact. That's why AI factories are key, providing the most efficient path from time to first token to time to first value.
AI factories are redefining the economics of modern infrastructure. They produce intelligence by transforming data into valuable outputs - whether tokens, predictions, images, proteins or other forms - at massive scale.
They help enhance three key aspects of the AI journey - data ingestion, model training and high-volume inference. AI factories are being built to generate tokens faster and more accurately, using three critical technology stacks: AI models, accelerated computing infrastructure and enterprise-grade software.
Read on to learn how AI factories are helping enterprises and organizations around the world convert the most valuable digital commodity - data - into revenue potential.
From Inference Economics to Value Creation
Before building an AI factory, it's important to understand the economics of inference - how to balance costs, energy efficiency and an increasing demand for AI.
Throughput refers to the volume of tokens that a model can produce in a given amount of time. Latency is how long the model takes to respond, which is often measured in time to first token - how long it takes before the first output appears - and time per output token, or how fast each additional token comes out. Goodput is a newer metric, measuring how much useful output a system can deliver while hitting key latency targets.
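To make these metrics concrete, here is a minimal Python sketch that computes them from per-request timestamps. The `RequestTrace` schema, the function names and the sample numbers are all hypothetical and chosen only for illustration; real serving stacks may define goodput and its latency targets differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timestamps (in seconds) for one served request. Hypothetical schema."""
    sent_at: float          # when the prompt was submitted
    first_token_at: float   # when the first output token arrived
    done_at: float          # when the last output token arrived
    output_tokens: int      # number of tokens generated

    @property
    def ttft(self) -> float:
        """Time to first token."""
        return self.first_token_at - self.sent_at

    @property
    def tpot(self) -> float:
        """Average time per output token after the first one."""
        if self.output_tokens <= 1:
            return 0.0
        return (self.done_at - self.first_token_at) / (self.output_tokens - 1)


def _window(traces: List[RequestTrace]) -> float:
    """Wall-clock span from the earliest submission to the latest completion."""
    return max(t.done_at for t in traces) - min(t.sent_at for t in traces)


def throughput(traces: List[RequestTrace]) -> float:
    """Output tokens per second across all requests."""
    return sum(t.output_tokens for t in traces) / _window(traces)


def goodput(traces: List[RequestTrace], max_ttft: float, max_tpot: float) -> float:
    """Output tokens per second, counting only requests that met both latency targets."""
    good = [t for t in traces if t.ttft <= max_ttft and t.tpot <= max_tpot]
    return sum(t.output_tokens for t in good) / _window(traces)


# Illustrative, made-up traces: one fast request and one with a slow first token.
traces = [
    RequestTrace(sent_at=0.0, first_token_at=0.25, done_at=2.25, output_tokens=101),
    RequestTrace(sent_at=0.0, first_token_at=1.5, done_at=4.0, output_tokens=51),
]

print("throughput (tokens/s):", throughput(traces))
print("goodput (tokens/s):", goodput(traces, max_ttft=0.5, max_tpot=0.05))
```

In this made-up sample, the second request misses the time-to-first-token target, so its tokens count toward throughput but not toward goodput.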
User experience is key for any software application, and the same goes for AI factories. High throughput means smarter AI, and lower latency ensures timely responses. When both of these measures are balanced properly, AI factories can provide engaging user experiences by quickly delivering helpful outputs.
For example, an AI-powered customer service agent that responds in half a second is far more engaging and valuable than one that responds in five seconds, even if both ultimately generate the same number of tokens in the answer.
Companies can take the opportunity to place competitive prices on their inference output, resulting in more revenue potential per token.
Measuring and visualizing this balance can be difficult - which is where the concept of a Pareto frontier comes in.
AI Factory Output: The Value of Efficient Tokens
The Pareto frontier, represented in the figure below, helps visualize the optimal ways to balance trade-offs between competing goals - like faster responses vs. serving more users simultaneously - when deploying AI at scale.
The vertical axis represents throughput efficiency, measured in tokens per second (TPS), for a given amount of energy used. The higher this number, the more requests an AI factory can handle concurrently.
The horizontal axis represents the TPS for a single user, reflecting how quickly the model streams its answer to that user. The higher the value, the better the expected user experience. Lower latency and faster response times are generally desirable for interactive applications like chatbots and real-time analysis tools.
The Pareto frontier's maximum value - shown as the top value of the curve - represents the best output for given sets of operating configurations. The goal is to find the optimal balance between throughput and user experience for different AI workloads and applications.
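As a rough illustration of how such a frontier can be extracted, the Python sketch below filters a set of operating configurations down to those that no other configuration beats on both axes. The `pareto_frontier` function and the sample numbers are hypothetical; they are not measurements from any real system.

```python
from typing import List, Tuple

# One operating configuration:
# (tokens/s for a single user, tokens/s per unit of energy). Higher is better on both.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the configurations that are not dominated on both axes."""
    # Visit the fastest per-user configurations first; break ties by efficiency.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_efficiency = float("-inf")
    for per_user, efficiency in ordered:
        # Keep a point only if it is more efficient than every point
        # that is at least as fast for a single user.
        if efficiency > best_efficiency:
            frontier.append((per_user, efficiency))
            best_efficiency = efficiency
    return frontier


# Made-up configurations, for illustration only.
configs = [(10, 900), (50, 700), (50, 600), (100, 400), (80, 300), (200, 100)]

for per_user, efficiency in pareto_frontier(configs):
    print(f"{per_user} tokens/s per user -> {efficiency} tokens/s per unit of energy")
```

Every point the filter drops is matched or beaten on both axes by a point it keeps, so an operator only needs to choose among the surviving configurations based on the workload.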
The best AI factories use accelerated computing to increase tokens per watt - optimizing AI performance while dramatically increasing energy efficiency across AI factories and applications.
The animation above compares user experience when running on NVIDIA H100 GPUs configured to run at 32 tokens per second per user, versus NVIDIA B300 (Blackwell Ultra) GPUs running at 344 tokens per second per user. At the configured user experience, Blackwell Ultra delivers a more than 10x better experience and almost 5x higher throughput, enabling up to 50x higher revenue potential.
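The arithmetic behind these figures can be checked with a short Python sketch. The per-user speeds come from the comparison above. Treating the revenue figure as the product of the two rounded gains is an assumption made here for illustration; the text does not state how the 50x number was derived.

```python
# Per-user speeds quoted in the comparison above (tokens per second per user).
h100_tps_per_user = 32
b300_tps_per_user = 344

# Ratio of per-user speeds, consistent with "over 10x".
experience_gain = b300_tps_per_user / h100_tps_per_user

# ASSUMPTION: treat the two gains as independent multipliers and round them
# to the 10x and 5x quoted in the text. The source does not give this derivation.
rounded_experience_gain = 10
rounded_throughput_gain = 5
combined_gain = rounded_experience_gain * rounded_throughput_gain

print(f"per-user speedup: {experience_gain:.2f}x")
print(f"combined gain under the multiplication assumption: {combined_gain}x")
```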
How an AI Factory Works in Practice
An AI factory is a system of components that come together to turn data into intelligence. It doesn't necessarily take the form of a high-end, on-premises data center, but could be an AI-dedicated cloud or hybrid model running on accelerated compute infrastructure. Or it could be a telecom infrastructure that can both optimize the network and perform inference at the edge.
Any dedicated accelerated computing infrastructure paired with software turning data into intelligence through AI is, in practice, an AI factory.
The components include accelerated computing, networking, software, storage, systems, and tools and services.
When a person prompts an AI system, the full stack of the AI factory goes to work. The factory tokenizes the prompt, turning data into small units of meaning - like fragments of images, sounds and words.
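As a toy illustration of tokenization, the Python sketch below splits a text prompt into word and punctuation tokens and maps each one to an integer ID. This is a deliberately simplified scheme with hypothetical function names; production models typically rely on learned subword tokenizers, and image or audio inputs are tokenized differently.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word and punctuation tokens (toy scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each token to an integer ID, assigning a new ID on first sight."""
    ids = []
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids


vocab: Dict[str, int] = {}
prompt_tokens = tokenize("What is an AI factory?")
prompt_ids = encode(prompt_tokens, vocab)

print(prompt_tokens)
print(prompt_ids)
```

The integer IDs, rather than the raw text, are what a model consumes and produces.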
Each token is put through a GPU-powered AI model, which performs compute-intensive reasoning to generate the best response. Each GPU performs parallel processing - enabled by high-speed networking and interconnects - to crunch data simultaneously.
An AI factory will run this process for different prompts from users across the globe. This is real-time inference, producing intelligence at industrial scale.
Because AI factories unify the full AI lifecycle, this system is continuously improving: inference is logged, edge cases are flagged for retraining and optimization loops tighten over time - all without manual intervention, an example of goodput in action.
Leading global security technology company Lockheed Martin has built its own AI factory to support diverse uses across its business. Through its