Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
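The relationship between throughput and cost per million tokens can be illustrated with a short calculation. This is a minimal sketch, not NVIDIA's pricing model: the function name, the hourly GPU cost and the throughput figures are all hypothetical values chosen for illustration.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: the same hardware cost, before and after a software
# optimization that doubles sustained throughput.
baseline = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=2000)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

Under these assumed numbers, doubling throughput on the same hardware halves the cost of every million tokens served, which is why software optimization shows up directly in the return on investment.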

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
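The routing idea described above can be sketched in a few lines. This is an illustration only, not Perplexity's implementation: the keyword rules stand in for a small classifier model, and the pod names and the assignment of tasks to model sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str   # hypothetical GPU pod identifier
    model: str  # model assumed to be served by that pod


# Hypothetical mapping from task type to the pod that serves it.
POOLS = {
    "search": Pod("pod-small", "llama-3-8b"),
    "summarization": Pod("pod-large", "llama-3-70b"),
    "question-answering": Pod("pod-large", "llama-3-70b"),
}


def classify(query: str) -> str:
    """Toy stand-in for a small classifier model: keyword rules only."""
    text = query.strip().lower()
    if text.startswith(("summarize", "tl;dr")):
        return "summarization"
    if text.endswith("?"):
        return "question-answering"
    return "search"


def route(query: str) -> Pod:
    """Pick the pod for a query based on its predicted task type."""
    return POOLS[classify(query)]


for q in ("Summarize this article", "What is a token?", "nvidia hopper specs"):
    pod = route(q)
    print(f"{q!r} -> {pod.name} ({pod.model})")
```

The point of the design is that a cheap classification step runs first, so that larger and more expensive models are used only for the requests assumed to need them.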

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
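To make the idea of splitting a model across devices concrete, the sketch below shards one weight matrix by columns and lets each simulated device compute its own slice of the output. The article does not say which form of model parallelism Perplexity uses, so column-wise (tensor) sharding is an assumption here, and plain Python lists stand in for GPU memory.

```python
def matvec(x, w):
    """Row vector x (length n) times matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split the columns of w into `parts` contiguous shards, one per device."""
    cols = len(w[0])
    if cols % parts:
        raise ValueError("column count must be divisible by the number of parts")
    step = cols // parts
    return [[row[i:i + step] for row in w] for i in range(0, cols, step)]


def parallel_matvec(x, w, parts):
    """Each simulated device multiplies x by its shard; results are concatenated."""
    out = []
    for shard in split_columns(w, parts):
        out.extend(matvec(x, shard))
    return out


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

print(matvec(x, w))              # single-device result
print(parallel_matvec(x, w, 2))  # same values, computed from two shards
```

Each shard holds only part of the weights, which is what allows a model too large for one device to be served by several, at the cost of combining the partial results.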
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...