
As AI models evolve and adoption grows, enterprises must perform a delicate balancing act to achieve maximum value.
That's because inference - the process of running data through a model to get an output - offers a different computational challenge than training a model.
Pretraining a model - the process of ingesting data, breaking it down into tokens and finding patterns - is essentially a one-time cost. But in inference, every prompt to a model generates tokens, each of which incurs a cost.
That means that as AI model performance and use increase, so do the number of tokens generated and their associated computational costs. For companies looking to build AI capabilities, the key is generating as many tokens as possible - with maximum speed, accuracy and quality of service - without sending computational costs skyrocketing.
As such, the AI ecosystem has been working to make inference cheaper and more efficient. Inference costs have been trending down for the past year thanks to major leaps in model optimization, leading to increasingly advanced, energy-efficient accelerated computing infrastructure and full-stack solutions.
According to the Stanford University Institute for Human-Centered AI's 2025 AI Index Report, the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between November 2022 and October 2024. At the hardware level, costs have declined by 30% annually, while energy efficiency has improved by 40% each year. Open-weight models are also closing the gap with closed models, reducing the performance difference from 8% to just 1.7% on some benchmarks in a single year. Together, these trends are rapidly lowering the barriers to advanced AI.
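To get a feel for how annual rates like these compound, here is a minimal sketch. It assumes the 30% cost decline and 40% efficiency gain hold constant over a hypothetical three-year horizon, which the report itself does not claim; the function name `compound` is illustrative.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Return the multiplicative factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


# Assumption: the annual rates quoted above stay constant for three years.
YEARS = 3
cost_factor = compound(-0.30, YEARS)       # hardware cost relative to today
efficiency_factor = compound(0.40, YEARS)  # energy efficiency relative to today

print(f"Hardware cost after {YEARS} years: {cost_factor:.3f}x of today's cost")
print(f"Energy efficiency after {YEARS} years: {efficiency_factor:.3f}x of today's level")
```

Under that assumption, hardware cost falls to roughly a third of its starting level while energy efficiency nearly triples.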
As models evolve, generating more demand and more tokens, enterprises need to scale their accelerated computing resources to deliver the next generation of AI reasoning tools or risk rising costs and energy consumption.
What follows is a primer on the economics of inference, so that enterprises can position themselves to achieve efficient, cost-effective and profitable AI solutions at scale.
Key Terminology for the Economics of AI Inference
Knowing key terms of the economics of inference helps set the foundation for understanding its importance.
Tokens are the fundamental unit of data in an AI model. They're derived from data during training as text, images, audio clips and videos. Through a process called tokenization, each piece of data is broken down into smaller constituent units. During training, the model learns the relationships between tokens so it can perform inference and generate an accurate, relevant output.
Throughput refers to the amount of data - typically measured in tokens - that the model can output in a specific amount of time, which itself is a function of the infrastructure running the model. Throughput is often measured in tokens per second, with higher throughput meaning greater return on infrastructure.
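The link between throughput and return on infrastructure can be made concrete with a rough cost-per-token calculation. The sketch below assumes a flat hourly infrastructure price and a steady token rate; the $4 per hour and 2,000 tokens per second figures are hypothetical, not taken from any benchmark.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures, chosen only for illustration.
HOURLY_COST_USD = 4.0
baseline = cost_per_million_tokens(HOURLY_COST_USD, tokens_per_second=2000)
doubled = cost_per_million_tokens(HOURLY_COST_USD, tokens_per_second=4000)

print(f"At 2,000 tokens/s: ${baseline:.3f} per million tokens")
print(f"At 4,000 tokens/s: ${doubled:.3f} per million tokens")
```

On the same hardware budget, doubling throughput halves the cost of each token.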
Latency is a measure of the amount of time between inputting a prompt and the start of the model's response. Lower latency means faster responses. The two main ways of measuring latency are:
Time to First Token: A measurement of the initial processing time required by the model to generate its first output token after a user prompt.
Time per Output Token: The average time between consecutive tokens - or the time it takes to generate a completion token for each user querying the model at the same time. It's also known as inter-token latency or token-to-token latency.
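As a rough illustration, both metrics can be derived from the timestamps of a single streamed response. The `RequestTrace` structure and its field names are hypothetical; a real serving stack would expose this timing data in its own format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timing for one streamed response (hypothetical structure)."""
    sent_at: float            # time the prompt was submitted, in seconds
    token_times: List[float]  # arrival time of each output token, in seconds


def time_to_first_token(trace: RequestTrace) -> float:
    """Delay between submitting the prompt and receiving the first token."""
    return trace.token_times[0] - trace.sent_at


def time_per_output_token(trace: RequestTrace) -> float:
    """Average gap between consecutive output tokens after the first."""
    gaps = len(trace.token_times) - 1
    if gaps < 1:
        return 0.0  # a single-token response has no inter-token gap
    return (trace.token_times[-1] - trace.token_times[0]) / gaps


# Made-up timestamps: first token after 0.5 s, then one token every 0.05 s.
trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.55, 0.6, 0.65, 0.7])
ttft = time_to_first_token(trace)
tpot = time_per_output_token(trace)
print(f"TTFT: {ttft:.3f} s, TPOT: {tpot:.3f} s")
```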
Time to first token and time per output token are helpful benchmarks, but they're just two pieces of a larger equation. Focusing solely on them can still lead to a deterioration of performance or cost.
To account for other interdependencies, IT leaders are starting to measure goodput, which is defined as the throughput achieved by a system while maintaining target time to first token and time per output token levels. This metric allows organizations to evaluate performance in a more holistic manner, ensuring that throughput, latency and cost are aligned to support both operational efficiency and an exceptional user experience.
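One simple way to compute goodput, sketched below, is to count only the tokens from requests that met both latency targets. This is one interpretation of the definition above, and the request records, field names and target values are all hypothetical.

```python
from typing import Dict, List


def throughput(requests: List[Dict[str, float]], window_s: float) -> float:
    """Raw tokens per second across all requests in the measurement window."""
    return sum(r["tokens"] for r in requests) / window_s


def goodput(requests: List[Dict[str, float]], window_s: float,
            max_ttft_s: float, max_tpot_s: float) -> float:
    """Tokens per second counting only requests that met both latency targets."""
    good_tokens = sum(
        r["tokens"]
        for r in requests
        if r["ttft_s"] <= max_ttft_s and r["tpot_s"] <= max_tpot_s
    )
    return good_tokens / window_s


# Hypothetical requests observed over a 10-second window.
requests = [
    {"ttft_s": 0.3, "tpot_s": 0.04, "tokens": 200},  # meets both targets
    {"ttft_s": 0.9, "tpot_s": 0.04, "tokens": 300},  # misses the TTFT target
    {"ttft_s": 0.4, "tpot_s": 0.08, "tokens": 100},  # misses the TPOT target
    {"ttft_s": 0.2, "tpot_s": 0.03, "tokens": 400},  # meets both targets
]

raw = throughput(requests, window_s=10.0)
good = goodput(requests, window_s=10.0, max_ttft_s=0.5, max_tpot_s=0.05)
print(f"Throughput: {raw} tokens/s, goodput: {good} tokens/s")
```

In this made-up window the system delivers 100 tokens per second in total, but only 60 of them per second arrive within the latency targets.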
Energy efficiency is the measure of how effectively an AI system converts power into computational output, expressed as performance per watt. By using accelerated computing platforms, organizations can maximize tokens per watt while minimizing energy consumption.
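Because a watt is one joule per second, tokens per second per watt is the same quantity as tokens per joule. The sketch below works through that conversion using hypothetical figures of 5,000 tokens per second at an average draw of 1,000 watts.

```python
JOULES_PER_KWH = 3.6e6


def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    """Tokens generated per joule, equivalent to tokens/s per watt."""
    return tokens_per_second / avg_power_watts


def kwh_per_million_tokens(tokens_per_second: float, avg_power_watts: float) -> float:
    """Energy in kilowatt-hours needed to generate one million tokens."""
    joules = 1_000_000 / tokens_per_joule(tokens_per_second, avg_power_watts)
    return joules / JOULES_PER_KWH


# Hypothetical system: 5,000 tokens/s at an average draw of 1,000 W.
efficiency = tokens_per_joule(5000, 1000)
energy = kwh_per_million_tokens(5000, 1000)
print(f"{efficiency} tokens per joule, {energy:.4f} kWh per million tokens")
```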
How the Scaling Laws Apply to Inference Cost
The three AI scaling laws are also core to understanding the economics of inference:
Pretraining scaling: The original scaling law that demonstrated that by increasing training dataset size, model parameter count and computational resources, models can achieve predictable improvements in intelligence and accuracy.
Post-training: A process where models are fine-tuned for accuracy and specificity so they can be applied to application development. Techniques like retrieval-augmented generation can be used to return more relevant answers from an enterprise database.
Test-time scaling (aka long thinking or reasoning): A technique by which models allocate additional computational resources during inference to evaluate multiple possible outcomes before arriving at the best answer.
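The cost implication of test-time scaling can be seen in a toy best-of-N sampler, one simple way to "evaluate multiple possible outcomes". This is an illustrative sketch, not how any particular reasoning model works: the generator and scorer below are hypothetical stand-ins for a model call and a quality check.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[int]  # a candidate answer, represented as a list of token ids


def best_of_n(generate: Callable[[], Candidate],
              score: Callable[[Candidate], float],
              n: int) -> Tuple[Candidate, int]:
    """Sample n candidates, keep the highest scoring one, and report tokens spent."""
    candidates = [generate() for _ in range(n)]
    best = max(candidates, key=score)
    tokens_spent = sum(len(c) for c in candidates)
    return best, tokens_spent


# Hypothetical stand-ins: every answer is 8 random tokens, and the score is their sum.
rng = random.Random(0)


def toy_generate() -> Candidate:
    return [rng.randint(0, 9) for _ in range(8)]


def toy_score(candidate: Candidate) -> float:
    return float(sum(candidate))


_, cost_single = best_of_n(toy_generate, toy_score, n=1)
_, cost_four = best_of_n(toy_generate, toy_score, n=4)
print(f"Tokens generated with n=1: {cost_single}, with n=4: {cost_four}")
```

Evaluating four candidates instead of one generates four times as many tokens for a single answer, which is why reasoning workloads put more pressure on inference cost.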
While AI is evolving and post-training and test-time scaling techniques become more sophisticated, pretraining isn't disappearing and remains an important way to scale models. Pretraining will still be needed to support post-training and test-time scaling.
Profitable AI Takes a Full-Stack Approach
In comparison to inference from a model that's only gone through pretraining and post-training, models that harness test-time scaling generate multiple tokens to solve a complex problem. This results in more accurate and relevant model outputs - but