
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
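To make the "more tokens at a lower cost" goal concrete, the arithmetic can be sketched as follows. This is a minimal illustration, not a pricing model: the function name, the $4.00 hourly GPU cost and the throughput figures are hypothetical values chosen for the example, not numbers from NVIDIA or any provider.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: the same hardware cost before and after a software
# optimization that doubles sustained throughput.
baseline = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=1000.0)
optimized = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=2000.0)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software optimization shows up so directly in the return on an inference deployment.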
Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI
Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
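The routing pattern described above, in which a small classifier decides which pod should handle a request, might look roughly like the sketch below. Everything here is hypothetical: the pod names, the task-to-model mapping and the keyword rules standing in for a real classifier model are illustrative assumptions, not Perplexity's actual setup or any Triton API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    """A hypothetical GPU pod serving one model."""
    name: str
    model: str


# Hypothetical mapping from task type to the pod that serves it.
PODS = {
    "search": Pod(name="pod-search", model="llama-3-8b"),
    "summarization": Pod(name="pod-summarize", model="llama-3-70b"),
    "qa": Pod(name="pod-qa", model="llama-3-70b"),
}


def classify(query: str) -> str:
    """Toy stand-in for a small classifier model: label a query by task type."""
    text = query.strip().lower()
    if text.startswith(("summarize", "tl;dr")):
        return "summarization"
    if text.endswith("?"):
        return "qa"
    return "search"


def route(query: str) -> Pod:
    """Pick the pod that should serve this query."""
    return PODS[classify(query)]


for q in ("Summarize this article", "What is Triton?", "nvidia h100 specs"):
    pod = route(q)
    print(f"{q!r} -> {pod.name} ({pod.model})")
```

The design point is that the classifier is cheap compared with the models behind it, so lightweight requests can be kept off the largest and most expensive models.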
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
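As an illustration of what splitting a model across GPUs means, the sketch below shows one common form of model parallelism, tensor parallelism, applied to a single layer. The article does not say which form Perplexity uses, so this choice is an assumption. The sketch uses plain Python lists, and each "device" is simulated sequentially on the CPU.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a weight matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split a weight matrix into `parts` contiguous column shards."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    step = cols // parts
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(parts)]


def parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard, then concatenate the partial outputs.

    In a real deployment each shard would live on its own GPU and the
    partial results would be gathered over the interconnect; here the
    shards are simply processed one after another.
    """
    shards = split_columns(w, parts)
    partial_outputs = [matmul(x, shard) for shard in shards]
    return [value for out in partial_outputs for value in out]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)
sharded = parallel_matmul(x, w, parts=2)
print(full)
print(sharded)
```

Each shard holds only part of the layer's weights, so a model too large for one GPU's memory can still be served, and the sharded result matches the single-device result.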