Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
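The arithmetic behind that goal can be sketched in a few lines. The GPU hourly price and throughput figures below are illustrative assumptions, not numbers from NVIDIA or any provider; the point is only that, at a fixed hourly cost, doubling throughput halves the cost of every million tokens.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the cost in USD of generating one million tokens on one GPU."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: a GPU rented at 4.00 USD per hour,
# before and after a software optimization that doubles throughput.
baseline = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=2000)

print(f"baseline:  {baseline:.2f} USD per million tokens")
print(f"optimized: {optimized:.2f} USD per million tokens")
```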

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
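As a rough illustration of what serving a model through Triton looks like from the client side: Triton's HTTP endpoint follows the KServe v2 inference protocol, in which a request is a JSON body posted to `/v2/models/<model>/infer`. The sketch below only builds such a request with the Python standard library and does not send it. The model name, tensor name, shape and values are hypothetical and would have to match the configuration of a real deployed model.

```python
import json


def build_infer_request(input_name, shape, datatype, data):
    """Build a KServe v2 style inference request body with a single input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }


MODEL_NAME = "my_model"  # hypothetical model name
url_path = f"/v2/models/{MODEL_NAME}/infer"

# Hypothetical input: one row of four 32-bit floats named INPUT0.
payload = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
body = json.dumps(payload)

print(url_path)
print(body)
```

A real client would POST `body` to the server's HTTP port with a `Content-Type: application/json` header and read the output tensors from the JSON response.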

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
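The routing idea can be sketched as follows. This is not Perplexity's implementation: the keyword rules stand in for a small classifier model, and the pod names and the mapping from task to model size are hypothetical. It shows only the pattern of classifying a request cheaply first and then dispatching it to the pod that hosts a suitable model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str   # hypothetical pod identifier
    model: str  # model served by that pod


# Hypothetical mapping from task type to the GPU pod that serves it.
ROUTES = {
    "search": Pod("pod-a", "llama-3-8b"),
    "summarization": Pod("pod-b", "llama-3-70b"),
    "question-answering": Pod("pod-c", "llama-3-70b"),
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: simple keyword rules only."""
    text = query.strip().lower()
    if text.startswith("summarize") or "tl;dr" in text:
        return "summarization"
    if text.endswith("?"):
        return "question-answering"
    return "search"


def route(query: str) -> Pod:
    """Pick the pod that should handle this query."""
    return ROUTES[classify(query)]


for q in ["Summarize this article", "What is Triton?", "best gpu for inference"]:
    pod = route(q)
    print(f"{q!r} -> {pod.name} ({pod.model})")
```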

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
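The article does not say which form of model parallelism was used. As one common form, tensor parallelism splits a layer's weight matrix column-wise so that each device computes a slice of the output. The sketch below imitates that in pure Python on the CPU, with each "shard" standing in for one GPU; the function names and the tiny matrix are illustrative assumptions.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split w column-wise into `parts` contiguous shards of equal width."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    step = cols // parts
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(parts)]


def parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, parts)
    # In a real system each shard would live on its own GPU and run concurrently.
    partial_outputs = [matmul(x, shard) for shard in shards]
    return [value for out in partial_outputs for value in out]


x = [1.0, 2.0]
w = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]

full = matmul(x, w)
sharded = parallel_matmul(x, w, parts=2)
print(full == sharded)
```

Each shard holds only part of the weights, which is what lets a model too large for one GPU's memory be served across several.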
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...