Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry coming soon, Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE). Google Cloud provides a one-click deployment option on Google Kubernetes Engine, while AWS offers NVIDIA Triton on its AWS Deep Learning containers.

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accurac

LINK:	https://blogs.nvidia.com/blog/ai-inference-platform/...
	See more stories from nvidia

North America Stories

06/08/2026

Hisense Adds Dolby Vision 2 to Select Models

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

MediaKind to showcase one unified portfolio at IBC2026

At IBC2026, MediaKind will make its first major appearance as a unified global powerhouse in video, showcasing one of the world's most comprehensive video i...

06/08/2026

Big Blue Marble to unveil AI-assisted piracy detection an...

Big Blue Marble (#5.A63) will demonstrate how its integrated technology and operational expertise help media companies scale premium services with less complexi...

06/08/2026

NIH expected to award Scripps Research nearly $4.2 million over 5 years to advance tools for vaccine design

LA JOLLA, CA-Scripps Research has received more than $500,000 in first-year fund...

06/08/2026

Improving vaccine design for Ebola, HIV and more

LA JOLLA, CA-Viruses are masters at invading our cells thanks to specialized proteins that coat their surfaces. When scientists design vaccines, they often crea...

06/08/2026

How a chemical reaction triggers brain inflammation in Alzheimer's disease

LA JOLLA, CA-The brain has its own immune system, which detects threats and mounts a defense. A growing body of evidence has shown that in Alzheimer's disea...

06/08/2026

Jin-Quan Yu elected to the National Academy of Sciences

LA JOLLA, CA-Scripps Research chemist Jin-Quan Yu has been elected to the National Academy of Sciences (NAS), one of the highest honors a scientist can achieve....

06/08/2026

Scripps Research ranks third in 2026 Cure Innovation Index

LA JOLLA, CA-Scripps Research ranked third in the inaugural 2026 Cure Innovation Index recognizing the top-performing institutes and centers across the United S...

06/08/2026

Scripps Research chemist Benjamin Cravatt elected to American Philosophical Society

Benjamin Cravatt, the Gilula Chair of Chemical Biology and a professor of chemis...

06/08/2026

Scripps Research immunologist Dennis Burton elected to American Academy of Arts and Sciences

LA JOLLA, CA-Dennis Burton, professor and the James & Jessie Minor Chair in Immu...

06/08/2026

How changes to proteins can alter drug interactions for new precision therapies

LA JOLLA, CA-Inside every human cell, proteins are constantly being tagged with small chemical modifications after they're produced. Known as post-translati...

06/08/2026

Scripps Research establishes endowed chair honoring renowned structural biologist Ian Wilson

LA JOLLA-Scripps Research has established the Ian Wilson Endowed Chair, a new fa...

06/08/2026

Scripps Research's Skaggs Graduate School awards doctoral degrees to 34th graduating class

Scripps Research's Skaggs Graduate School of Chemical and Biological Science...

06/08/2026

Scripps Research chemist Jin-Quan Yu is named a Fellow of the Royal Society

LA JOLLA, CA-Professor Jin-Quan Yu of Scripps Research has been elected to the Fellowship of the Royal Society, the U.K.'s national academy of sciences and ...

06/08/2026

Experimental HIV vaccine achieves a long-sought goal

LA JOLLA, CA-For years, researchers have been hoping for vaccines that protect people against not just one strain of HIV, but every strain of the quickly mutati...

06/08/2026

Calibr-Skaggs advances CLF065, a regenerative GLP-2 therapy, into two Phase 2 IBD studies

LA JOLLA, CA-The Calibr-Skaggs Institute for Innovative Medicines, the nonprofit...

06/08/2026

Chemists snap together complex 3D molecules from highly reactive radicals'-without losing their shape

LA JOLLA, CA-Building the complex 3D molecules needed for new medicines has alwa...

06/08/2026

A fentanyl countermeasure that adapts to combat future black-market drugs

LA JOLLA, CA-Fentanyl and related variants of the synthetic opioid kill more Americans each year than car accidents and gun violence combined. In too-high doses...

06/08/2026

Two Scripps Research assistant professors named 2026 Baxter Young Investigators

LA JOLLA, CA-What do decoding communication between organs and reimagining the future of genome editing have in common? They're among the scientific questio...

06/08/2026

Calibr-Skaggs awarded $5.1M by NIH to develop long-acting hepatitis B virus therapy

LA JOLLA, CA-Of the 1.2 million people living with HIV in the United States, app...

06/08/2026

Lab studies explain how new cancer drug works as it enters patient testing

LA JOLLA, CA-For some people, cancer immunotherapies are life-changing. These treatments can turn the body's own immune system against a tumor, either elimi...

06/08/2026

Newly identified molecule strengthens the eye's response to damage in retinal disease

LA JOLLA, CA-Many conditions that cause vision loss share a common feature: the ...

06/08/2026

Molecular scissors caught in action: A structural blueprint for RNA therapeutics

LA JOLLA, CA-RNA interference is a natural mechanism for living cells to control whether specific genes are being used or not. Crowned with the 2006 Nobel Priz...

06/08/2026

Immune molecule may drive excessive drinking in alcohol use disorder

LA JOLLA, CA-The drugs that keep rheumatoid arthritis in check may one day help people stop drinking. A new Scripps Research study shows that an anti-inflammato...

06/08/2026

Back in action: Researchers make drug-resistant bacteria vulnerable again

LA JOLLA, CA-Antibiotic resistance is one of the most urgent threats to global health, linked to an estimated 4.7 million deaths worldwide in 2019 alone. As mor...

06/08/2026

Scripps Research scientists demonstrate a faster, cheaper route to making critical drugs using common table sugar

LA JOLLA, CA-Some of the world's best-selling diabetes drugs depend on a che...

06/08/2026

Scripps Research scientists awarded $2M to advance global disease surveillance

LA JOLLA, CA-Detecting infectious disease threats early and responding quickly can dramatically alter the course of an infectious outbreak. Technologies such as...

06/08/2026

Joan Pulupa joins Scripps Research faculty to study the organization of DNA in brain cells and its links to neurodegeneration

LA JOLLA, CA-Molecular biophysicist Joan Pulupa will join Scripps Research in Ja...

06/08/2026

Scripps Research scientists train the immune system to make antibodies against numerous HIV strains

LA JOLLA, CA-HIV is globally so diverse, consisting of hundreds of thousands of ...

06/08/2026

ASG Ups Michele Ferreira to Chief Business Officer

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

WNBC and WNJU Expand New York Giants Deal

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

Utah Scientific to Highlight NBOSS at IBC 2026

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

Disney, TikTok Ink Global Short-Form Content-Sharing Deal

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

Kane Peterson Joins QuickLink's North America Team

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

FCC Returns $881 Million in Unused TV Broadcaster Relocation Funds

Share Copy link Facebook X Linkedin Bluesky Email...

06/08/2026

PlayBox Neo to highlight secure and scalable workflow del...

At this year's SET EXPO, PlayBox Neo will present recent innovations across its PlayBox Neo Suite and integrated range of broadcast media solutions. By show...

06/08/2026

Modern Streaming Solutions Private Limited Partners with...

Live demo at IBC2026 VisualOn Booth, Hall 5, Stand A55 Amsterdam, Netherlands Modern Streaming Solutions Private Limited, a rising force in India's dig...

06/08/2026

How Karukera Studio Built a Media Production Hub with SNS EVO

How Karukera Studio Built a Media Production Hub with SNS EVO Melanie Ciotti August 5, 2026 0 Comments Hero images displays Karukera Studio in Sainte-...

06/08/2026

NHK drama series Rosanjin no Kamado shot on PYXIS 6K

NHK drama series Rosanjin no Kamado shot on PYXIS 6K Brie Clayton August 5, 2026 0 Comments Blackmagic PYXIS 6K and DaVinci Resolve Studio capture the...

06/08/2026

Camera Match: AutoSetup 3d Plugin / script for Photomontages- Cinema 4D

Camera Match: AutoSetup 3d Plugin / script for Photomontages- Cinema 4D Jamie Cardoso August 5, 2026 0 Comments If you do any kind of Architectural Vi...

06/08/2026

Into the Omniverse: How Open World Models Push the Frontier of Physical AI

Editor's note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows u...

06/08/2026

GeForce NOW Shakes Up August With 26 New Games

August is here, bringing 26 new games for GeForce NOW members. Command the seas in World of Warships: Legends and discover what's next in the GeForce NOW ...

05/08/2026

SVG Regional Sports Production Summit 2026: All Sessions Now Available to Watch on SVG PLAY

RSNs, teams, leagues, and streamers explore the present and future of local spor...

05/08/2026

SVG New Sponsor Spotlight: Hitachi Vantara's Lenny Khaitov on Building Resilient Data Infrastructure for Sports Production

As sports-production workflows generate larger volumes of unstructured data and ...

05/08/2026

NESN's Women's Celebration Night Returns With Expanded Focus on the Next Generation

Three aspiring broadcasters will contribute on-camera and inside the production ...

05/08/2026

Union County Paints an Authentic Picture of Recovery and Community

Adam Meeks attends the premiere of his film Union County, an official selection of the 2026 Sundance Film Festival. (Photo by Jemal Countess/Sundance Institut...

05/08/2026

Australia's retirement sector advertising rises more than 10% as millions plan for life after work according to Nielsen

Advertisers invested $53.6 million in the retirement sector in the last 12 month...

05/08/2026

New Nielsen data shows IKEA's New Zealand launch attracted more than 243,000 shoppers in the last month

Nielsen CMI and Ad Intel data reveals who's visiting IKEA, where else they s...

05/08/2026

Nielsen expands Four-Screen Ad Deduplication across CTV in Japan

Media buyers and sellers can now compare deduplicated campaign reach across computer, mobile, connected TV and linear TV across all measurable CTV publishers in...

05/08/2026

Stranger Things,' Bluey,' The Pitt' Top Nielsen's Streaming Charts Through First Half of 2026

Stranger Things' Leads All Streaming Titles with 23.3 Billion Minutes in Fir...

View most recent headlines