
Under the hood of every AI application are algorithms that churn through data in their own language, one based on a vocabulary of tokens.
Tokens are tiny units of data that come from breaking down bigger chunks of information. AI models process tokens to learn the relationships between them and unlock capabilities including prediction, generation and reasoning. The faster tokens can be processed, the faster models can learn and respond.
AI factories - a new class of data centers designed to accelerate AI workloads - efficiently crunch through tokens, converting them from the language of AI to the currency of AI, which is intelligence.
With AI factories, enterprises can take advantage of the latest full-stack computing solutions to process more tokens at lower computational cost, creating additional value for customers. In one case, integrating software optimizations and adopting the latest-generation NVIDIA GPUs reduced cost per token by 20x compared to unoptimized processes on previous-generation GPUs - delivering 25x more revenue in just four weeks.
By efficiently processing tokens, AI factories are manufacturing intelligence - the most valuable asset in the new industrial revolution powered by AI.
What Is Tokenization?
Whether a transformer AI model is processing text, images, audio clips, videos or another modality, it will translate the data into tokens. This process is known as tokenization.
Efficient tokenization helps reduce the amount of computing power required for training and inference. There are numerous tokenization methods - and tokenizers tailored for specific data types and use cases can require a smaller vocabulary, meaning there are fewer tokens to process.
For large language models (LLMs), short words may be represented with a single token, while longer words may be split into two or more tokens.
The word darkness, for example, would be split into two tokens, dark and ness, with each token bearing a numerical representation, such as 217 and 655. Its opposite, brightness, would similarly be split into bright and ness, with corresponding numerical representations of 491 and 655.
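The darkness/brightness split can be sketched as a greedy longest-match tokenizer, similar in spirit to how WordPiece applies a vocabulary at inference time. The three-entry `VOCAB` reuses the example's illustrative IDs and is an assumption; production tokenizers such as BPE or WordPiece learn much larger vocabularies from data and handle unknown characters instead of raising an error.

```python
# Hypothetical toy vocabulary reusing the illustrative IDs from the example.
VOCAB = {"dark": 217, "bright": 491, "ness": 655}


def tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Split a word into subwords by greedily taking the longest vocabulary match."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            candidate = word[start:end]
            if candidate in vocab:
                pieces.append(candidate)
                start = end
                break
        else:
            # No vocabulary entry matches at this position.
            raise ValueError(f"cannot tokenize {word!r} at position {start}")
    return pieces


def encode(word: str, vocab: dict[str, int]) -> list[int]:
    """Map each subword piece to its numerical ID."""
    return [vocab[piece] for piece in tokenize(word, vocab)]


print(tokenize("darkness", VOCAB), encode("darkness", VOCAB))      # ['dark', 'ness'] [217, 655]
print(tokenize("brightness", VOCAB), encode("brightness", VOCAB))  # ['bright', 'ness'] [491, 655]
```

Both words end in the same ID, 655, which is the shared signal the next paragraph describes.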
In this example, the shared numerical value associated with ness can help the AI model understand that the words may have something in common. In other situations, a tokenizer may assign different numerical representations for the same word depending on its meaning in context.
For example, the word lie could refer to a resting position or to saying something untruthful. During training, the model would learn the distinction between these two meanings and assign them different token numbers.
For visual AI models that process images, video or sensor data, a tokenizer can help map visual inputs like pixels or voxels into a series of discrete tokens.
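One common way to do this is vector quantization: cut the image into small patches and replace each patch with the index of its closest entry in a codebook. This is a sketch under simplifying assumptions. The 2x2 patch size, the four-entry `CODEBOOK` and the sample image are hypothetical, and learned visual tokenizers (for example VQ-VAE-style models) compare learned patch features against a learned codebook rather than raw pixels against hand-picked entries.

```python
# Hypothetical codebook: each entry is a flattened 2x2 grayscale patch (row-major).
CODEBOOK = [
    [0, 0, 0, 0],          # token 0: dark patch
    [255, 255, 255, 255],  # token 1: bright patch
    [0, 255, 0, 255],      # token 2: vertical edge (dark left, bright right)
    [0, 0, 255, 255],      # token 3: horizontal edge (dark top, bright bottom)
]


def patches(image, size=2):
    """Yield flattened size x size patches in row-major order.

    Assumes image height and width are multiples of `size`.
    """
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            yield [image[r][c]
                   for r in range(top, top + size)
                   for c in range(left, left + size)]


def nearest_token(patch, codebook):
    """Return the index of the codebook entry closest to the patch (squared L2 distance)."""
    def dist(entry):
        return sum((p - e) ** 2 for p, e in zip(patch, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def tokenize_image(image, codebook, size=2):
    """Turn a grayscale image into a sequence of discrete token IDs."""
    return [nearest_token(p, codebook) for p in patches(image, size)]


image = [
    [10, 20, 0, 250],
    [5, 15, 10, 240],
    [250, 245, 0, 0],
    [255, 250, 255, 255],
]
print(tokenize_image(image, CODEBOOK))  # [0, 2, 1, 3]
```

The four patches come out as dark, vertical edge, bright and horizontal edge, so a 16-pixel image becomes a sequence of four tokens.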
Models that process audio may turn short clips into spectrograms - visual depictions of sound waves over time that can then be processed as images. Other audio applications may instead focus on capturing the meaning of a sound clip containing speech, and use another kind of tokenizer that captures semantic tokens, which represent language or context data instead of simply acoustic information.
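A minimal sketch of turning a waveform into a spectrogram, assuming a short-time Fourier transform with a Hann window (a common choice, not the only one). The frame size, hop length and test tone are illustrative, and the naive DFT is used only for readability; real pipelines use an FFT library and often a mel-frequency scale.

```python
import cmath
import math


def spectrogram(signal, frame_size=8, hop=4):
    """Magnitude spectrogram: one row per frame, one column per frequency bin.

    Each frame is multiplied by a Hann window, then transformed with a naive
    O(n^2) discrete Fourier transform.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            bins.append(abs(total))
        frames.append(bins)
    return frames


# A pure tone that completes 2 cycles per 8-sample frame peaks in frequency bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = spectrogram(tone)
print(len(spec), "frames x", len(spec[0]), "bins")  # 7 frames x 5 bins
```

The result is a 2D grid of time by frequency, which is what lets it be treated like an image and passed to a visual tokenizer.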
How Are Tokens Used During AI Training?
Training an AI model starts with the tokenization of the training dataset.
Depending on the size of the training data, the number of tokens can run into the billions or trillions - and, per the pretraining scaling law, the more tokens used for training, the better the quality of the AI model.
As an AI model is pretrained, it's tested by being shown a sample set of tokens and asked to predict the next token. Based on whether or not its prediction is correct, the model updates itself to improve its next guess. This process is repeated until the model learns from its mistakes and reaches a target level of accuracy, known as model convergence.
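The predict, check, update loop can be illustrated with a deliberately tiny model. This is a toy under loud assumptions: `train_bigram` is a hypothetical helper that looks only at the previous token and reinforces the correct next token whenever it guesses wrong, whereas real LLMs are neural networks trained by gradient descent on a next-token loss. Only the shape of the loop, including stopping at a target accuracy, carries over.

```python
from collections import defaultdict


def train_bigram(sequences, target_accuracy=0.9, max_epochs=100):
    """Toy mistake-driven next-token learner (hypothetical, for illustration only).

    counts[prev][nxt] scores how strongly `nxt` is expected after `prev`.
    Each epoch predicts the next token at every position, reinforces the true
    next token on every wrong guess, and stops once accuracy reaches the target.
    """
    pairs = [(seq[i], seq[i + 1]) for seq in sequences for i in range(len(seq) - 1)]
    if not pairs:
        raise ValueError("need at least one sequence with two or more tokens")
    counts = defaultdict(lambda: defaultdict(int))

    def predict(prev):
        options = counts[prev]
        return max(options, key=options.get) if options else None

    accuracy = 0.0
    for epoch in range(1, max_epochs + 1):
        correct = 0
        for prev, nxt in pairs:
            if predict(prev) == nxt:
                correct += 1
            else:
                counts[prev][nxt] += 1  # update only on mistakes
        accuracy = correct / len(pairs)
        if accuracy >= target_accuracy:  # toy stand-in for convergence
            return predict, epoch, accuracy
    return predict, max_epochs, accuracy


corpus = [[1, 2, 3, 4], [1, 2, 3, 4]]  # made-up token ID sequences
predict, epochs, accuracy = train_bigram(corpus)
print(epochs, accuracy, predict(3))  # 2 1.0 4
```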
After pretraining, models are further improved by post-training, where they continue to learn on a subset of tokens relevant to the use case where they'll be deployed. These could be tokens with domain-specific information for an application in law, medicine or business - or tokens that help tailor the model to a specific task, like reasoning, chat or translation. The goal is a model that generates the right tokens to deliver a correct response based on a user's query - a skill better known as inference.
How Are Tokens Used During AI Inference and Reasoning?
During inference, an AI model receives a prompt - which, depending on the model, may be text, an image, an audio clip, a video, sensor data or even a gene sequence - and translates it into a series of tokens. The model processes these input tokens, generates its response as tokens and then translates that response into the user's expected format.
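The inference flow (tokenize the prompt, generate tokens, translate them back) can be sketched as a greedy generation loop. The word-level vocabulary, the `toy_model` lookup table and the `generate` helper are hypothetical stand-ins; a real LLM computes a probability distribution over its whole vocabulary at each step and may sample from it rather than follow a fixed table.

```python
from typing import Callable

# Hypothetical word-level vocabulary; real models use learned subword vocabularies.
WORDS = ["<eos>", "tokens", "are", "the", "language", "of", "ai"]
WORD_ID = {word: i for i, word in enumerate(WORDS)}
EOS = WORD_ID["<eos>"]


def encode(text: str) -> list[int]:
    return [WORD_ID[word] for word in text.lower().split()]


def decode(ids: list[int]) -> str:
    return " ".join(WORDS[i] for i in ids if i != EOS)


def generate(prompt: str, next_token: Callable[[list[int]], int], max_new_tokens: int = 10) -> str:
    """Tokenize the prompt, generate output tokens one at a time, then detokenize."""
    tokens = encode(prompt)
    output: list[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(tokens + output)  # the model sees the input plus everything generated so far
        if tok == EOS:
            break
        output.append(tok)
    return decode(output)


# Stand-in "model": a fixed lookup from the last token to the next one.
FOLLOWS = {"tokens": "are", "are": "the", "the": "language",
           "language": "of", "of": "ai", "ai": "<eos>"}


def toy_model(context: list[int]) -> int:
    return WORD_ID[FOLLOWS[WORDS[context[-1]]]]


print(generate("tokens are", toy_model))  # the language of ai
```

Generation ends either when the model emits the end-of-sequence token or when `max_new_tokens` is reached, which is why output length is typically capped in practice.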
Input and output languages can be different, such as in a model that translates English to Japanese, or one that converts text prompts into images.
To understand a complete prompt, AI models must be able to process multiple tokens at once. Many models specify a limit on how many tokens they can process at once, referred to as a context window - and different use cases require different context window sizes.
A model that can process a few thousand tokens at once might be able to process a single high-resolution image or a few pages of text. With a context length of tens of thousands of tokens, another model might be able to summarize a whole novel or an hourlong podcast episode. Some models even provide context lengths of a million or more tokens, allowing users to input massive data sources for the AI to analyze.
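When input exceeds the context window, something has to give. A minimal sketch, assuming the prompt is already a list of token IDs: one hypothetical helper keeps only the most recent tokens while reserving room for the response, and another splits a long document into window-sized chunks that could be processed one at a time. Both function names are illustrative, and neither reflects how any particular model or API handles overflow.

```python
def fit_to_context(prompt_tokens: list[int], context_window: int,
                   reserve_for_output: int = 0) -> list[int]:
    """Keep only the most recent tokens so prompt plus reserved output fits the window.

    Dropping the oldest tokens is one simple policy; applications may instead
    summarize, chunk or reject inputs that are too long.
    """
    budget = context_window - reserve_for_output
    if budget <= 0:
        # Guard also avoids prompt_tokens[-0:], which would return the whole list.
        raise ValueError("context window too small for the requested output budget")
    return prompt_tokens[-budget:]


def chunk(tokens: list[int], size: int) -> list[list[int]]:
    """Split a long token sequence into consecutive pieces of at most `size` tokens."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


history = list(range(10))  # stand-in token IDs
print(fit_to_context(history, context_window=8, reserve_for_output=3))  # [5, 6, 7, 8, 9]
print(chunk(history, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```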
Reasoning AI models, the latest advancement in LLMs, can tackle more complex queries by treating tokens differently than before. Here, in addition to input and output tokens, the model generates a host of reasoning tokens over minutes or hours as it thinks about how to solve a given problem.
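A back-of-the-envelope sketch of why reasoning requests involve more tokens: reasoning tokens are generated one at a time like any other output, so they add to the work per query even when they are not shown to the user. The `TokenUsage` class and the counts below are made up for illustration and are not taken from any specific model or API.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one request; field names are illustrative, not any specific API's."""
    input_tokens: int
    reasoning_tokens: int = 0
    output_tokens: int = 0

    @property
    def generated_tokens(self) -> int:
        # Reasoning tokens are produced by the model just like answer tokens.
        return self.reasoning_tokens + self.output_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.generated_tokens


standard = TokenUsage(input_tokens=500, output_tokens=200)
reasoning = TokenUsage(input_tokens=500, reasoning_tokens=5000, output_tokens=200)
print(standard.total_tokens, reasoning.total_tokens)  # 700 5700
```

With the same prompt and the same length of final answer, the hypothetical reasoning request generates 26 times as many tokens (5,200 versus 200).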
These