Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
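To make the arithmetic behind this goal concrete, here is a minimal sketch of how serving cost per million tokens follows from throughput. The hourly price and tokens-per-second figures are illustrative assumptions, not NVIDIA or cloud-provider numbers, and the function name is hypothetical.

```python
def cost_per_million_tokens(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    """Return the serving cost in dollars per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical figures: the same $4.00/hour GPU before and after a software
# optimization that doubles throughput.
baseline = cost_per_million_tokens(gpu_hourly_cost=4.00, tokens_per_second=1_000)
optimized = cost_per_million_tokens(gpu_hourly_cost=4.00, tokens_per_second=2_000)
print(f"baseline:  ${baseline:.3f} per million tokens")
print(f"optimized: ${optimized:.3f} per million tokens")
```

Under these assumptions, doubling throughput on fixed hardware halves the cost per token, which is why software optimization shows up directly in the price of a service billed per million tokens.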

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
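As a rough illustration of what serving a model through Triton looks like from the client side, the sketch below builds a request for Triton's HTTP endpoint, which follows the KServe v2 inference protocol. The model name `my_model`, the tensor name `INPUT0`, the local URL and the helper functions are all assumptions made for this example; a real deployment defines its own names in the model configuration. The code only constructs the request and does not contact a server.

```python
import json


def build_infer_request(input_name, shape, datatype, data):
    """Build the JSON body for a KServe v2 inference request."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"expected {expected} values, got {len(data)}")
    return {
        "inputs": [
            {
                "name": input_name,      # hypothetical tensor name
                "shape": list(shape),
                "datatype": datatype,    # e.g. "FP32"
                "data": list(data),      # flattened, row-major values
            }
        ]
    }


def infer_url(base_url, model_name):
    """Return the v2 inference endpoint for a model on the given server."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


body = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
print(infer_url("http://localhost:8000", "my_model"))
print(json.dumps(body, indent=2))
```

Because the request format is the same whichever framework produced the model, a client written this way would not need to change when the model behind the endpoint is swapped.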

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton in its AWS Deep Learning Containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
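The routing pattern described here can be sketched in a few lines. This is not Perplexity's implementation: the keyword heuristic stands in for a small classifier model, and the task labels and pool names are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    routes: dict          # task label -> name of the GPU pool serving it
    default_pool: str     # pool used when no route matches the task
    counts: dict = field(default_factory=dict)  # requests sent to each pool

    def classify(self, query: str) -> str:
        """Stand-in for a small classifier model: a simple keyword heuristic."""
        text = query.strip().lower()
        if text.startswith(("summarize", "tl;dr")):
            return "summarization"
        if text.endswith("?"):
            return "question_answering"
        return "search"

    def route(self, query: str) -> str:
        """Pick a pool for the query and record the decision."""
        task = self.classify(query)
        pool = self.routes.get(task, self.default_pool)
        self.counts[pool] = self.counts.get(pool, 0) + 1
        return pool


# Hypothetical pools: cheaper tasks go to a smaller model, harder ones to a larger model.
router = Router(
    routes={"summarization": "llama-8b-pool", "question_answering": "llama-70b-pool"},
    default_pool="search-pool",
)
for q in ["Summarize this article", "What is model parallelism?", "nvidia hopper specs"]:
    print(q, "->", router.route(q))
```

The design choice being illustrated is that a cheap classification step runs first, so that the larger and more expensive models only receive the requests that need them.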

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...