
Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
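To make the per-token economics concrete, here is a minimal sketch of the arithmetic. The hourly GPU price and the sustained throughput are hypothetical inputs chosen for illustration; neither figure comes from NVIDIA or from this article.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD for one million generated tokens.

    Both arguments are illustrative assumptions: a flat hourly price for
    the hardware and a sustained aggregate generation rate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical numbers: a $4.00/hour GPU sustaining 2,000 tokens/second.
baseline = cost_per_million_tokens(4.0, 2000.0)
# Doubling throughput on the same hardware halves the cost per token.
optimized = cost_per_million_tokens(4.0, 4000.0)

print(round(baseline, 4))   # 0.5556
print(round(optimized, 4))  # 0.2778
```

The sketch shows why throughput gains from software optimization translate directly into lower cost: the hardware price is fixed per hour, so every extra token generated in that hour lowers the cost of all of them.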

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
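Once a Triton endpoint is running, clients typically talk to it over HTTP using the KServe v2 inference protocol that Triton implements. The sketch below only builds such a request and does not send it. The base URL, model name and input tensor name are hypothetical, and a managed cloud endpoint may wrap or rewrite the path shown here.

```python
import json


def build_infer_request(base_url: str, model_name: str, input_name: str, values: list) -> tuple:
    """Build the URL and JSON body for a v2-protocol inference call.

    All arguments are placeholders for illustration; the real tensor
    name, shape and datatype depend on the deployed model's configuration.
    """
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(values)],   # one batch row of len(values) features
                "datatype": "FP32",
                "data": [float(v) for v in values],
            }
        ]
    }
    return url, json.dumps(body)


# Hypothetical endpoint and model, for illustration only.
url, payload = build_infer_request("http://localhost:8000/", "example_model", "INPUT0", [1, 2, 3])
print(url)  # http://localhost:8000/v2/models/example_model/infer
```

The returned URL and payload could then be sent with any HTTP client as a POST request with a JSON content type.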

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
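The routing pattern described above, where a small classifier decides which pool of GPUs handles a request, can be sketched roughly as follows. This is not Perplexity's implementation: the keyword rules stand in for a trained classifier model, and the task labels and pod names are hypothetical.

```python
# Hypothetical mapping from task type to the GPU pod that serves it.
POD_FOR_TASK = {
    "search": "pod-search",
    "summarization": "pod-summarize",
    "question_answering": "pod-qa",
}


def classify_task(query: str) -> str:
    """Toy stand-in for a small classifier model: label a query by task type."""
    text = query.strip().lower()
    if text.startswith("summarize") or "tl;dr" in text:
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(query: str) -> tuple:
    """Return the predicted task and the pod that should handle the query."""
    task = classify_task(query)
    return task, POD_FOR_TASK[task]


print(route("Summarize this article about GPUs"))  # ('summarization', 'pod-summarize')
print(route("What is model parallelism?"))         # ('question_answering', 'pod-qa')
print(route("nvidia h100 specs"))                  # ('search', 'pod-search')
```

The point of the design is that the classifier is cheap to run, so larger and more expensive models are invoked only for requests that need them.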

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy...
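One common form of model parallelism splits a layer's weight matrix by columns so that each device computes part of the output. The toy sketch below illustrates the idea in pure Python, with lists standing in for GPUs; it is an illustration of the general technique, not a description of how Perplexity or TensorRT-LLM partitions models, and all function names are hypothetical.

```python
def matvec(x, weights):
    """Compute x @ weights for a vector x and a row-major matrix weights."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x)))
        for j in range(len(weights[0]))
    ]


def split_columns(weights, num_shards):
    """Split a matrix into contiguous column blocks, one block per shard."""
    d_out = len(weights[0])
    step = -(-d_out // num_shards)  # ceiling division
    return [[row[start:start + step] for row in weights] for start in range(0, d_out, step)]


def parallel_matvec(x, weights, num_shards):
    """Compute x @ weights shard by shard, then concatenate the partial outputs.

    On real hardware each shard would live on its own GPU and run
    concurrently; here the shards are simply processed in a loop.
    """
    output = []
    for shard in split_columns(weights, num_shards):
        output.extend(matvec(x, shard))
    return output


x = [1, 2]
weights = [
    [1, 0, 2, 1],
    [0, 1, 1, 3],
]
print(matvec(x, weights))              # [1, 2, 4, 7]
print(parallel_matvec(x, weights, 2))  # [1, 2, 4, 7]
```

Because each shard holds only a fraction of the weights, a model too large for one device's memory can still be served, at the price of combining partial results across devices.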
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...