
Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
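To make the per-token economics concrete, the arithmetic can be sketched as follows. The hourly GPU price and throughput figures are hypothetical, chosen only to illustrate the calculation; they are not NVIDIA or cloud-provider numbers.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Return the serving cost in dollars per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical figures: same hardware cost, throughput doubled by software optimization.
baseline = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=2000)
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software optimization shows up directly in the price of a service.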

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
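As an illustration of what serving a model looks like from the client side, here is a minimal sketch that builds an HTTP inference request in the KServe v2 format that Triton Inference Server accepts. The server address, model name and tensor name are hypothetical placeholders, and the request is only constructed, not sent, so the snippet runs without a server.

```python
import json
from urllib import request


def build_infer_request(base_url, model_name, input_name, values):
    """Build (without sending) a v2-protocol inference request for one FP32 row."""
    body = {
        "inputs": [
            {
                "name": input_name,          # must match the model's input tensor name
                "shape": [1, len(values)],   # batch of one row
                "datatype": "FP32",
                "data": [float(v) for v in values],
            }
        ]
    }
    return request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address, model name and tensor name.
req = build_infer_request("http://localhost:8000", "my_model", "INPUT0", [0.1, 0.2, 0.3])
# With a running server, the request could be sent with request.urlopen(req).
```

Because the request shape does not depend on the framework the model was trained in, the same client code can target models exported from different frameworks.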

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
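The routing idea described above can be sketched roughly as follows. This is not Perplexity's implementation: the keyword rules stand in for a trained classifier model, and the task-to-model mapping and pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    task: str
    model: str  # model served by the target pod (hypothetical mapping)
    pod: str    # GPU pod group identifier (hypothetical)


ROUTES = {
    "summarization": Route("summarization", "llama-3-8b", "pod-small"),
    "question_answering": Route("question_answering", "llama-3-70b", "pod-large"),
    "search": Route("search", "llama-3-8b", "pod-small"),
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model; a real system would use a trained model."""
    text = query.strip().lower()
    if text.startswith(("summarize", "tl;dr")):
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(query: str) -> Route:
    """Pick the pod and model that should handle this query."""
    return ROUTES[classify(query)]


print(route("Summarize this article for me"))
print(route("What is AI inference?"))
```

The design point is that a cheap classification step runs first, so that only requests that need a large model are sent to the more expensive GPU pods.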

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
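One common form of model parallelism is tensor parallelism, in which a layer's weight matrix is split across devices. The article does not say which scheme Perplexity uses, so the sketch below is only a toy illustration of the general idea, with each "device" simulated by a plain Python loop iteration.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split matrix w column-wise into `parts` shards, one per simulated device."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must divide evenly across devices")
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def parallel_matmul(x, w, parts):
    """Each shard computes its slice of the output; the slices are concatenated."""
    out = []
    for shard in split_columns(w, parts):
        out.extend(matmul(x, shard))  # on real hardware the shards run concurrently
    return out


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
sharded = parallel_matmul(x, w, parts=2)
unsharded = matmul(x, w)
```

Both paths produce the same output vector; the sharded version only changes where the work happens, which is what lets a model too large for one GPU be served across several.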
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...