Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
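To make the arithmetic behind this goal concrete, here is a minimal sketch of how cost per million tokens follows from hardware cost and throughput. The function name and every figure in it are hypothetical, chosen only for illustration; they are not NVIDIA or customer numbers.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Return the serving cost, in dollars, of generating one million tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical figures: a $4.00/hour GPU at two throughput levels.
baseline = cost_per_million_tokens(gpu_cost_per_hour=4.00, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_cost_per_hour=4.00, tokens_per_second=3000)
print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

Under these assumed numbers, tripling throughput on the same hardware cuts the cost per token to a third, which is why throughput gains show up directly in the price of a service.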

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
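As one illustration of such a communication method, NVIDIA Triton Inference Server exposes HTTP/REST and gRPC endpoints that follow the KServe v2 inference protocol. The sketch below only builds the URL and JSON body for a single-input HTTP request; it does not send anything. The host, model name, tensor name and values are hypothetical placeholders, and a real model defines its own input names, shapes and datatypes.

```python
import json


def build_infer_request(host: str, model_name: str, input_name: str,
                        values: list[float]) -> tuple[str, str]:
    """Return the (url, json_body) pair for a single-input inference request."""
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(values)],  # one row holding all values
                "datatype": "FP32",
                "data": values,
            }
        ]
    }
    return url, json.dumps(body)


# Hypothetical endpoint, model and tensor names.
url, body = build_infer_request("localhost:8000", "my_model", "INPUT0", [0.1, 0.2, 0.3])
print(url)
print(body)
```

The resulting body could then be sent as an HTTP POST with any client; keeping the request construction separate from the transport makes it straightforward to test without a running server.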

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
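The routing idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not Perplexity's implementation: the keyword-based `classify` function stands in for a small classifier model, and the pod names and the mapping of tasks to models are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    model: str


# Hypothetical mapping from task type to the pod that serves it.
PODS = {
    "search": Pod("pod-a", "llama-3-8b"),
    "summarization": Pod("pod-b", "llama-3-70b"),
    "question-answering": Pod("pod-c", "llama-3-70b"),
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model: guess the task from keywords."""
    text = query.lower().strip()
    if text.startswith(("summarize", "summarise", "tl;dr")):
        return "summarization"
    if text.endswith("?"):
        return "question-answering"
    return "search"


def route(query: str) -> Pod:
    """Pick the pod that should handle this query."""
    return PODS[classify(query)]


for q in ["Summarize this article about GPUs",
          "What is model parallelism?",
          "nvidia h100 specs"]:
    pod = route(q)
    print(f"{q!r} -> {pod.name} ({pod.model})")
```

The design point is that the cheap classification step runs on every request, so the larger models are only invoked for the tasks that need them.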

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...