Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens are the units of text, such as words or word fragments, that a large language model (LLM) reads and generates. Because AI inference services typically charge for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
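To make the per-token economics concrete, here is a minimal sketch of the arithmetic. The hourly GPU price and the throughput figures are hypothetical placeholders chosen for illustration, not NVIDIA or cloud-provider numbers, and the function name is an assumption of this example.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD of generating one million tokens.

    Assumes a single GPU billed by the hour and a steady generation rate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures: a 4 USD/hour GPU at two different throughputs.
    baseline = cost_per_million_tokens(4.0, 1000.0)
    optimized = cost_per_million_tokens(4.0, 2000.0)
    print(f"baseline:  {baseline:.2f} USD per million tokens")
    print(f"optimized: {optimized:.2f} USD per million tokens")
```

Under these assumptions, doubling throughput on the same hardware halves the cost per million tokens, which is why software optimization that raises tokens per second translates directly into lower serving cost.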

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
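The routing idea can be sketched as follows. This is an illustrative sketch only, not Perplexity's implementation: the keyword rule stands in for a small classifier model, and the pod names, the routing table and the pairing of tasks with model sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str  # model that should serve the request
    pod: str    # hypothetical GPU pod hosting that model


# Hypothetical routing table: cheaper tasks go to the smaller model.
ROUTES = {
    "search": Route(model="llama-3-8b", pod="pod-small"),
    "summarization": Route(model="llama-3-8b", pod="pod-small"),
    "question-answering": Route(model="llama-3-70b", pod="pod-large"),
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: label the task with simple rules."""
    text = query.strip().lower()
    if text.startswith("summarize"):
        return "summarization"
    if text.endswith("?"):
        return "question-answering"
    return "search"


def route(query: str) -> Route:
    """Pick the model and pod for a query based on its predicted task."""
    return ROUTES[classify(query)]


if __name__ == "__main__":
    for q in ["Summarize this article", "Why is the sky blue?", "best gpu 2025"]:
        print(q, "->", route(q))
```

The design point is that the classifier is much cheaper to run than the models it routes to, so deciding up front which requests need the largest model keeps expensive GPU capacity for the queries that benefit from it.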

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
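As a rough illustration of what splitting a model across GPUs means, the sketch below partitions one layer's weight matrix by rows across several simulated devices. The article does not say which form of model parallelism Perplexity uses. This example shows one common variant, sharding a single layer's weights, and runs in plain Python with no GPUs. All function names are assumptions of this example.

```python
def matvec(weights, x):
    """Compute y = W x, where W is a list of rows and x is a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def split_rows(weights, num_shards):
    """Partition the output rows of W into contiguous shards, one per device."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    size = -(-len(weights) // num_shards)  # ceiling division
    return [weights[i:i + size] for i in range(0, len(weights), size)]


def parallel_matvec(weights, x, num_shards):
    """Simulate a sharded layer: each shard computes its slice of the output."""
    shards = split_rows(weights, num_shards)
    # On real hardware each partial result would be computed on its own GPU.
    partials = [matvec(shard, x) for shard in shards]
    # Concatenating the slices plays the role of the gather step between devices.
    return [value for part in partials for value in part]


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [1, 1]
    print(matvec(W, x))              # single-device result
    print(parallel_matvec(W, x, 2))  # same result from two shards
```

Each simulated device holds only part of the weights, which is what lets a model too large for one GPU's memory be served across several. The gather step is the communication cost that real deployments have to keep small.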
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...