
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
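To make the "more tokens at a lower cost" goal concrete, here is a minimal sketch of how throughput translates into cost per million tokens. The hourly GPU price and the throughput figures are hypothetical values chosen for illustration, not numbers from NVIDIA or any provider.

```python
def cost_per_million_tokens(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    """Return the serving cost in dollars per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical inputs: a $4.00/hour GPU before and after a 2x throughput gain.
baseline = cost_per_million_tokens(gpu_hourly_cost=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_hourly_cost=4.0, tokens_per_second=2000)

print(f"baseline:  ${baseline:.2f} per 1M tokens")
print(f"optimized: ${optimized:.2f} per 1M tokens")
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software optimization shows up directly in the price of a service.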
Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
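The article does not name the communication methods it has in mind. One widely used option is the HTTP/REST interface that Triton Inference Server exposes, which follows the KServe v2 inference protocol (Triton also supports gRPC). The sketch below only builds such a request without sending it. The model name `my_model`, the tensor name `INPUT0` and the host `localhost:8000` are assumptions for illustration.

```python
import json


def build_infer_request(model_name, input_name, values,
                        datatype="FP32", host="localhost:8000"):
    """Build the URL and JSON body for a KServe v2 style inference request.

    Nothing is sent over the network; the caller decides how to POST it.
    """
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,          # hypothetical tensor name
                "shape": [1, len(values)],   # one row of len(values) elements
                "datatype": datatype,
                "data": values,              # flattened, row-major tensor data
            }
        ]
    }
    return url, json.dumps(body)


url, payload = build_infer_request("my_model", "INPUT0", [0.1, 0.2, 0.3])
print(url)
print(payload)
```

In a real deployment the tensor names, shapes and datatypes must match the model's configuration on the server, and the payload would be sent with an HTTP POST to the printed URL.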
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI
Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
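The routing idea described above can be sketched roughly as follows. This is not Perplexity's implementation: the keyword rules stand in for a small classifier model, and the pod names and pool layout are hypothetical.

```python
from itertools import cycle

# Hypothetical pools of GPU pods, one pool per task type.
POD_POOLS = {
    "search": ["pod-small-0", "pod-small-1"],
    "summarization": ["pod-large-0"],
    "question_answering": ["pod-large-1", "pod-large-2"],
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: simple keyword rules."""
    text = query.strip().lower()
    if text.startswith("summarize"):
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


class Router:
    """Send each query to the next pod in the pool for its task (round robin)."""

    def __init__(self, pools):
        self._cycles = {task: cycle(pods) for task, pods in pools.items()}

    def route(self, query: str):
        task = classify(query)
        return task, next(self._cycles[task])


router = Router(POD_POOLS)
for q in ["Summarize this article", "What is model parallelism?", "nvidia triton docs"]:
    print(q, "->", router.route(q))
```

The cheap classification step runs first so that only the queries that need a large model reach the more expensive pods, which is one plausible reading of how such routing keeps the service cost-efficient.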
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
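The article does not say which form of model parallelism was used. As one illustration, the sketch below shows tensor parallelism for a single linear layer: the weight matrix is split by columns into shards, each shard's product is computed independently (on a separate GPU in a real system), and the partial outputs are concatenated. It runs on plain Python lists, and all function names are hypothetical.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, num_shards):
    """Split the columns of w into num_shards equal, contiguous shards."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    step = cols // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]


def parallel_matmul(x, w, num_shards):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, num_shards)
    # In a real deployment each of these products would run on its own GPU.
    partials = [matmul(x, shard) for shard in shards]
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# The sharded result matches the single-device result.
assert parallel_matmul(x, w, 2) == matmul(x, w)
print(parallel_matmul(x, w, 2))
```

Because each shard holds only a slice of the weights, a model too large for one GPU's memory can still be served, at the price of communication to gather the partial outputs.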