
Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
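To make the economics concrete, the following is a minimal sketch of how cost per million tokens follows from throughput and hardware price. The function name and the example figures (a $4.00 hourly GPU price, 2,500 and 5,000 tokens per second) are illustrative assumptions, not numbers from NVIDIA or any provider.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the hardware cost in USD of generating one million tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: same hardware price, throughput doubled by software optimization.
baseline = cost_per_million_tokens(4.00, 2500)
optimized = cost_per_million_tokens(4.00, 5000)

print(f"baseline:  ${baseline:.3f} per million tokens")
print(f"optimized: ${optimized:.3f} per million tokens")
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software optimization shows up so directly in the return on investment.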

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses often struggle to balance the performance and cost of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers keep the flexibility to customize the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
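For Triton Inference Server specifically, models are reachable over an HTTP/REST API that follows the KServe v2 inference protocol. The sketch below only builds the URL and JSON body for such a request and sends nothing. The model name `my_model`, the tensor name `INPUT0`, the FP32 datatype and the `localhost:8000` host are hypothetical placeholders that would have to match a real model's configuration.

```python
import json


def build_infer_request(model_name, input_name, values, host="localhost:8000"):
    """Build the URL and JSON body for a KServe v2 style inference request.

    All arguments are placeholders for illustration; a real deployment
    defines its own model name, tensor names, shapes and datatypes.
    """
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(values)],  # a batch of one row
                "datatype": "FP32",
                "data": [float(v) for v in values],  # flattened, row-major
            }
        ]
    }
    return url, json.dumps(body)


url, payload = build_infer_request("my_model", "INPUT0", [0.1, 0.2, 0.3])
print(url)
print(payload)
```

Because the request format is independent of the framework the model was trained in, the same client code can address models served from different backends.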

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or with full-code deployment through the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning Containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
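The routing idea can be illustrated with a minimal sketch. This is not Perplexity's implementation: the keyword heuristic stands in for a small classifier model, and the pod names, pool layout and round-robin policy are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    model: str


# Hypothetical pod pools keyed by task type.
POOLS = {
    "search": [Pod("pod-a", "llama-3-8b"), Pod("pod-b", "llama-3-8b")],
    "summarization": [Pod("pod-c", "llama-3-70b")],
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: a simple keyword heuristic."""
    return "summarization" if "summarize" in query.lower() else "search"


class Router:
    """Send each query to the pool for its task, rotating through that pool's pods."""

    def __init__(self, pools):
        self._pools = pools
        self._next = {task: 0 for task in pools}

    def route(self, query: str) -> Pod:
        task = classify(query)
        pods = self._pools[task]
        pod = pods[self._next[task] % len(pods)]
        self._next[task] += 1
        return pod


router = Router(POOLS)
for q in ["weather in Paris", "Summarize this article", "best hiking trails"]:
    print(q, "->", router.route(q).name)
```

The design point is that the cheap classification step runs first, so the larger and more expensive models are only used for the tasks that need them.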

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
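As a rough illustration of one form of model parallelism, the sketch below splits a layer's weight matrix by output rows so that each shard could be computed on a separate device, then concatenates the partial results. The article does not say which parallelism scheme Perplexity uses, so this is an assumption; plain Python lists stand in for GPU tensors, and the helper names are hypothetical.

```python
def matvec(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def split_rows(weights, num_devices):
    """Partition the output rows of a weight matrix into contiguous shards."""
    shard = -(-len(weights) // num_devices)  # ceiling division
    return [weights[i:i + shard] for i in range(0, len(weights), shard)]


W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]

shards = split_rows(W, 2)                   # one shard per (simulated) device
partial = [matvec(s, x) for s in shards]    # each device computes its slice
combined = [y for p in partial for y in p]  # gather the slices in order

print(combined == matvec(W, x))
```

Each device holds only part of the weights, which is what lets a model too large for one GPU's memory be served across several.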
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...