
Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens are the units of text, roughly words or word fragments, that a large language model (LLM) system processes and generates. Because AI inference services typically charge for every million tokens generated, cost per token is the most visible measure of the return on AI investments and of the energy used per task.
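The relationship between throughput and cost per million tokens can be sketched with simple arithmetic. The example below is illustrative only: the hourly GPU cost and the tokens-per-second figures are hypothetical values chosen for the calculation, not numbers from NVIDIA or any provider.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Return the serving cost in dollars per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical figures: one GPU that costs $4.00 per hour to run.
baseline = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=1000)
# Assume software optimization doubles throughput on the same hardware.
optimized = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=2000)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software optimization shows up so directly in serving economics.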

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
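As a rough illustration of what serving a model through Triton Inference Server looks like from the client side, the sketch below builds an HTTP inference request in the KServe v2 format that Triton's HTTP endpoint accepts. The server address, the model name `my_model`, the tensor name `INPUT0` and the input values are all hypothetical placeholders; a real deployment would use the names defined in its own model configuration. The request is only constructed here, not sent.

```python
import json
from urllib import request


def build_infer_request(base_url, model_name, input_name, values):
    """Build (but do not send) an HTTP inference request in KServe v2 format."""
    body = {
        "inputs": [
            {
                "name": input_name,          # hypothetical input tensor name
                "shape": [1, len(values)],   # one row containing all values
                "datatype": "FP32",
                "data": values,
            }
        ]
    }
    return request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server and model; replace with real values before sending.
req = build_infer_request("http://localhost:8000", "my_model", "INPUT0", [0.1, 0.2, 0.3])
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. The request shape stays the same regardless of which framework the model was trained in.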

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning Containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
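The routing pattern described above can be sketched in a few lines. This is a toy illustration, not Perplexity's implementation: the keyword-based `classify` function stands in for a small classifier model, and the task types, pod names and task-to-pod mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from task type to the GPU pod pool that serves it.
POD_FOR_TASK = {
    "summarization": "pod-llama-70b",
    "question_answering": "pod-llama-8b",
    "search": "pod-embeddings",
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: picks a task type by keyword."""
    text = query.lower()
    if text.startswith("summarize"):
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(queries):
    """Group queries by the pod that their predicted task type maps to."""
    batches = defaultdict(list)
    for query in queries:
        batches[POD_FOR_TASK[classify(query)]].append(query)
    return dict(batches)


routed = route([
    "Summarize this article about GPUs",
    "What is model parallelism?",
    "nvidia hopper energy efficiency",
])
for pod, batch in routed.items():
    print(pod, batch)
```

The design point is that a cheap classification step runs first, so the larger and more expensive models only see the requests that need them.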

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...