Sony Pixel Power calrec Sony

Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry coming soon, Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE). Google Cloud provides a one-click deployment option on Google Kubernetes Engine, while AWS offers NVIDIA Triton on its AWS Deep Learning containers.

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accurac
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...
See more stories from nvidia

Most recent headlines

05/01/2027

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be demoed at CES 2026

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...

04/08/2026

Dalet Announces Commercial Availability of Dalia, Bringing Media-Aware Agentic AI to Enterprise Productions

Dalet, a leading technology and service provider for media-rich organizations, t...

04/07/2026

Detective Conan: Fallen Angel of the Highway Opens in Dolby Cinemas Across Japan, Presented in Dolby Atmos and Dolby ...

April 7 2026, 19:00 (PDT) Detective Conan: Fallen Angel of the Highway Opens in...

01/06/2026

Dolby Sets the New Standard for Premium Entertainment at CES 2026

January 6 2026, 05:30 (PST) Dolby Sets the New Standard for Premium Entertainment at CES 2026 Throughout the week, Dolby brings to life the latest innovatio...

02/05/2026

Dalet Flex LTS Delivers Smarter Search, Faster Editing, and an AI-Ready Foundation for Modern Media

Dalet, a leading technology and service provider for media-rich organizations, t...

01/05/2026

NBCUniversal's Peacock to Be First Streamer to Integrate Dolby's Full Suite of Premium Picture and Sound Innovations

January 5 2026, 18:30 (PST) NBCUniversal's Peacock to Be First Streamer to ...

20/04/2026

Rohde & Schwarz rolls out its full ARDRONIS counter UAS suite in a demonstration van at Counter UAS Technology Europe 2026

Rohde & Schwarz rolls out its full ARDRONIS counter UAS suite in a demonstration...

20/04/2026

Protecting America's Shores: L3Harris Keeps the Coast Guard Mission-Ready

L3Harris delivers integrated communications, navigation and C4ISR capabilities that empower the U.S. Coast Guard to protect Americas maritime interests and resp...

20/04/2026

Google Cloud Embraces the Rise of Agentic Production

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Creators Go All in on AI, Niche Content

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

NBC Sports' Jon Miller: Broadcast Is Having a Moment'

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Beyond the Lift and Shift': Cloud Migration's New Mandate

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Virtual Production Finds Its Footing

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Corporate Creators: All Companies Are Media Companies Now

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

IABM Rebrands as the International Association of MediaTech

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

CBS Detroit Debuts New AR/VR Technology-Driven Studio

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Fox Sports Taps Appear X Platform for Remote Production

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

CueScript and Lighting Design Group Expand Customer Oppor...

CueScript and Lighting Design Group Expand Customer Opportunities Through New Partnership Find both companies at 2026 NAB Show in CueScript Booth # C 4720 ...

20/04/2026

Layercake Deepens Bitmovin Integration to Power End-to-En...

[Sydney, NSW, 20 April 2026] - Layercake, the company behind the intelligent media orchestration platform Streamcake, today announced the formalisation of its i...

20/04/2026

FOX Sports selects Appear X Platform for next-generation...

Deployment spans FOX Sports' REMI infrastructure, IP production for a major global soccer event, and its Jewel Events production systems Appear, a global l...

20/04/2026

Pro Sound Effects Launches the Industry's First and Only Native Sound Effects Integration for Avid Media Composer at NAB 2026

Pro Sound Effects Launches the Industry's First and Only Native Sound Effect...

20/04/2026

SBE Elevates Fred Willard to SBE Fellow

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Blackmagic Design Announces Blackmagic Camera for iOS 3.3 Update

Blackmagic Design Announces Blackmagic Camera for iOS 3.3 Update Brie Clayton April 20, 2026 0 Comments New update adds camera control and monitoring ...

20/04/2026

Maxon Announces Free Tools and Mobile Expansion of ZBrush and Cinema 4D

Maxon Announces Free Tools and Mobile Expansion of ZBrush and Cinema 4D Brie Clayton April 20, 2026 0 Comments Cinema 4D brings professional 3D workfl...

20/04/2026

Vizrt AI Keyer kills the green screen and creates virtual scenes in any environment

Vizrt AI Keyer kills the green screen and creates virtual scenes in any environm...

20/04/2026

Register now - Market & Audience Department Ask Me Anything (AMA) Session

Register now - Market & Audience Department Ask Me Anything (AMA) Session 11 February 2026 Screen Australia Head of Market & Audience Rakel Tansley Talking to...

20/04/2026

Screen Australia appoints Tanya Phegan as Narrative Content Head of Development

Screen Australia appoints Tanya Phegan as Narrative Content Head of Development 17 March 2026 Tanya Phegan Screen Australia has today announced the appointmen...

20/04/2026

Applications Open for Skip Ahead 11

Applications Open for Skip Ahead 11 19 March 2026 Past Skip Ahead recipients (L-R): Macfarlane Bros, Rainbow Bop, Lyanna Kea. Screen Australia and YouTube Aus...

20/04/2026

Screen Australia empowers the next games generation, including new creatives from neighbouring disciplines

Screen Australia empowers the next games generation, including new creatives fro...

20/04/2026

Official Co-production Ask Me Anything (AMA) Session

Official Co-production Ask Me Anything (AMA) Session 26 March 2026 Image (L-R): Mix Tape, Michele McDonald, Flower & Flour. Interested in international Co-pro...

20/04/2026

Screen Australia announces Narrative Content funding for 91 projects, including four short films paired with industry mentors

Screen Australia announces Narrative Content funding for 91 projects, including ...

20/04/2026

Production Infrastructure and Capacity Analysis (PICA) pinpoints four key workforce challenges in the Australian screen industry

Production Infrastructure and Capacity Analysis (PICA) pinpoints four key workfo...

20/04/2026

Australians in Film and Screen Australia Announce the 2026 Participants in the Talent Gateway and Global Producers Program

Australians in Film and Screen Australia Announce the 2026 Participants in the T...

20/04/2026

Screen Australia relaunches website with new tools and improved user experience

Screen Australia relaunches website with new tools and improved user experience 16 April 2026 Screen Australia relaunches website Screen Australia has relaun...

20/04/2026

Ikegami Announces VFE-P07D Monocular OLED Viewfinder

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

EVS Launches Choreon Robotic Control Solution

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Ross Video Showcases End-To-End Production Ecosystem at 2026 Nab Show

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Heidi Steffen to Become President of TitanTV

Share Copy link Facebook X Linkedin Bluesky Email...

20/04/2026

Ready for the Next Step?

Thomas Valter Director of Product Management, RTW March 26, 2026 For users of the classic TouchMonitor, one obvious question is: what does the TMxCore offer...

20/04/2026

Fox Corporation Executives to Discuss Third Quarter Fiscal 2026 Financial Results Via Webcast

Fox Corporation Executives to Discuss Third Quarter Fiscal 2026 Financial Result...

20/04/2026

NVIDIA and Partners Showcase the Future of AI-Driven Manufacturing at Hannover Messe 2026

Manufacturing is at an inflection point. Across every major industrial economy, ...

20/04/2026

Autonomous AI at Scale: Adobe Agents Unlock Breakthrough Creative Intelligence With NVIDIA and WPP

AI agents are transforming how work gets done across all industries, acceleratin...

19/04/2026

NAB Show 2026 Is Here! Follow All of our Live Coverage!

Blackmagic Design has announced the ATEM 4 M/E Constellation IP and ATEM 4 M/E Constellation IP Plus, two SMPTE 2110-native live production switchers. The ATEM ...

19/04/2026

Live From NAB 2026: Grass Valley CEO Jon Wilson on AMPPs Explosive Growth, Hybrid Workflows, and Whats New at the Show

Grass Valley is finding the right balance between its hardware heritage with an ...

19/04/2026

Live From NAB 2026: Oracles Kip Schauer on Why OCI Is Doubling Down on Media, Sports, and Broadcast

Oracle's strategy rests on the foundational strengths of Oracle Cloud Infras...

19/04/2026

Live From NAB 2026: Program Productions Jess Kowatch on Whats New with ProCrewz and the Impact of AI on Crewing

Program Productions, the live sports production industry's leading crewer, i...

19/04/2026

Live From NAB 2026: Aggrekos Joe Scionti on Powering the Super Bowl, PGA Championship, and the Road to the FIFA World Cup

At the 2026 NAB Show in Las Vegas, SVG sat down with Joe Scionti, Account Manage...

19/04/2026

NAB 2026: Evertz to highlight evertz.io XChange for live event management and market switching

Evertz (Booth N817) is set to present new services within its evertz.io platform...