
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
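To make the cost-per-token goal above concrete, here is a minimal back-of-the-envelope sketch. The GPU hourly price and the throughput figures are hypothetical placeholders, not NVIDIA or cloud-provider numbers; the point is only that, at a fixed hardware cost, doubling tokens per second halves the cost per million tokens.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD per one million generated tokens for one GPU."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: a GPU billed at $4.00 per hour, before and after
# a software optimization that doubles sustained throughput.
baseline = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=2000)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

In practice throughput also depends on batch size and latency targets, so a real estimate would use measured tokens per second under the service's own latency constraints.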
Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace, and Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE). AWS also offers NVIDIA Triton on its AWS Deep Learning Containers.
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI
Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
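The routing pattern described above can be sketched roughly as follows. This is an illustration only, not Perplexity's implementation: the keyword rules stand in for a small classifier model, and the pool and pod names are hypothetical. It shows a cheap classification step choosing a task-specific pool, with requests spread round-robin across the pods in that pool.

```python
from itertools import cycle

# Hypothetical task pools; in a real deployment each pod would host a
# model served behind an inference server endpoint.
POOLS = {
    "search": ["pod-a", "pod-b"],
    "summarization": ["pod-c"],
    "qa": ["pod-d", "pod-e"],
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: pick a task label for a query."""
    text = query.strip().lower()
    if text.startswith("summarize"):
        return "summarization"
    if text.endswith("?"):
        return "qa"
    return "search"


class Router:
    """Route each query to the next pod in the pool for its task (round-robin)."""

    def __init__(self, pools: dict[str, list[str]]) -> None:
        self._cycles = {task: cycle(pods) for task, pods in pools.items()}

    def route(self, query: str) -> tuple[str, str]:
        task = classify(query)
        return task, next(self._cycles[task])


router = Router(POOLS)
for q in ["Summarize this article", "What is Triton?", "nvidia h100 specs"]:
    print(q, "->", router.route(q))
```

Keeping the classifier small means the routing decision adds little latency compared with the LLM call it dispatches.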
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
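The article does not say which form of model parallelism was used. As one possible illustration, the sketch below shows a column-wise (tensor-parallel) split of a single layer's weight matrix in plain Python. The helper names are hypothetical, and the "shards" here are ordinary lists rather than GPU memory; the point is that each shard can compute its slice of the output independently, and concatenating the slices reproduces the unsplit result.

```python
def matvec(x: list[int], w: list[list[int]]) -> list[int]:
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w: list[list[int]], num_shards: int) -> list[list[list[int]]]:
    """Split w into num_shards equal-width column blocks, one per device."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]


def parallel_matvec(x: list[int], w: list[list[int]], num_shards: int) -> list[int]:
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    partials = [matvec(x, shard) for shard in split_columns(w, num_shards)]
    return [value for part in partials for value in part]


x = [1, 2]
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]

print(matvec(x, w))              # unsplit layer
print(parallel_matvec(x, w, 2))  # same layer split across two shards
```

On real hardware each shard would live on a separate GPU and the concatenation would be a collective communication step, which is where interconnect bandwidth starts to matter.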