
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens are the units of text, such as words or fragments of words, that a large language model (LLM) processes and generates. Because AI inference services typically charge per million tokens generated, this goal offers the most visible return on AI investment and on energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
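As a rough illustration of the tokens-per-cost goal described above, the sketch below computes serving cost per million tokens from a GPU's hourly price and its token throughput. The function name, the $4.00 hourly price and the throughput figures are illustrative assumptions, not NVIDIA or cloud pricing data.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD for one million generated tokens on one GPU."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical numbers: same GPU price, throughput doubled by software optimization.
baseline = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=1000.0)
optimized = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=2000.0)

print(f"baseline:  ${baseline:.3f} per 1M tokens")
print(f"optimized: ${optimized:.3f} per 1M tokens")
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software-level gains show up directly in serving cost.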
Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or with full-code deployment through the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace, and Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE). AWS also offers NVIDIA Triton on its AWS Deep Learning Containers.
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI
Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
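The routing idea described above can be sketched in a few lines. This is a minimal illustration, not Perplexity's implementation: the keyword rule stands in for a small classifier model, and the pool layout, model labels and pod names are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List, Tuple


@dataclass
class PodPool:
    """A group of GPU pods that all serve the same model (names are hypothetical)."""
    model: str
    pods: List[str]
    _next: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._next = cycle(self.pods)  # simple round-robin over the pool's pods

    def pick_pod(self) -> str:
        return next(self._next)


# Hypothetical mapping from task type to the pool that serves it.
POOLS: Dict[str, PodPool] = {
    "summarization": PodPool("small-model", ["pod-a", "pod-b"]),
    "question_answering": PodPool("large-model", ["pod-c"]),
    "search": PodPool("small-model", ["pod-d"]),
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: a keyword rule, for illustration only."""
    text = query.lower().strip()
    if "summarize" in text or "summary" in text:
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(query: str) -> Tuple[str, str, str]:
    """Return (task, model, pod) for a query."""
    task = classify(query)
    pool = POOLS[task]
    return task, pool.model, pool.pick_pod()


for q in ["Summarize this article", "What is model parallelism?", "h100 gpu specs"]:
    print(q, "->", route(q))
```

The design point is that the cheap classification step runs on every request, so that larger models are reserved for the tasks that need them.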
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
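To make the idea of splitting a model across GPUs concrete, here is a toy sketch of one common form of model parallelism: the output rows of a single layer's weight matrix are sharded across devices, each device computes its slice, and the slices are concatenated. Plain Python lists stand in for GPUs, and all function names are hypothetical. Production systems such as TensorRT-LLM handle this with optimized kernels and inter-GPU communication.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Compute y = W x for a row-major matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def shard_rows(weights: Matrix, num_shards: int) -> List[Matrix]:
    """Split the output rows of W into contiguous shards, one per device."""
    size, rem = divmod(len(weights), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < rem else 0)
        shards.append(weights[start:end])
        start = end
    return shards


def parallel_matvec(weights: Matrix, x: Vector, num_shards: int) -> Vector:
    """Each 'device' computes its slice of the output; slices are then concatenated."""
    partials = [matvec(shard, x) for shard in shard_rows(weights, num_shards)]
    return [value for part in partials for value in part]


W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
print(matvec(W, x))              # single-device result
print(parallel_matvec(W, x, 2))  # same result computed from two shards
```

Each shard holds only part of the weights, so a layer too large for one device's memory can still be served, at the cost of gathering the partial outputs.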