Fast, Low-Cost Inference Offers Key to Profitable AI

23/01/2025

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
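The arithmetic behind "more tokens at a lower cost" can be sketched in a few lines. This is a minimal illustration, not a pricing model from the article: the function name, the hourly GPU price and the throughput figures are all hypothetical values chosen to make the calculation easy to follow.

```python
def cost_per_million_tokens(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    """Serving cost per one million generated tokens.

    Both inputs are assumptions for illustration: an hourly price for the
    hardware and a sustained generation throughput.
    """
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical figures: a $3.60/hour accelerator generating 1,000 tokens/s.
baseline = cost_per_million_tokens(gpu_hourly_cost=3.60, tokens_per_second=1_000)

# Doubling throughput on the same hardware (for example through software
# optimization) halves the cost of each million tokens.
optimized = cost_per_million_tokens(gpu_hourly_cost=3.60, tokens_per_second=2_000)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

Because the hardware price is fixed per hour, any gain in tokens per second translates directly into a lower cost per token, which is why throughput optimization shows up so visibly in the return on investment.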

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:

NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.

NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.

NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service

Google Cloud's Vertex AI, Google Kubernetes Engine

Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service

Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
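The routing idea described here, a small classifier deciding which GPU pod should handle a request, can be sketched as follows. This is only an illustration of the pattern: the keyword rules stand in for a real classifier model, and the pod names and the task-to-pod mapping are hypothetical rather than details of Perplexity's deployment.

```python
# Hypothetical mapping from task type to a pool of GPU pods.
POD_FOR_TASK = {
    "search": "pod-small-8b",
    "summarization": "pod-large-70b",
    "question_answering": "pod-large-70b",
}


def classify_task(query: str) -> str:
    """Stand-in for a small classifier model: simple keyword rules only."""
    text = query.strip().lower()
    if text.startswith(("summarize", "summarise", "tl;dr")):
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(query: str) -> str:
    """Pick the pod that should serve this query."""
    return POD_FOR_TASK[classify_task(query)]


for q in ("Summarize this article", "What is a token?", "best gpu for inference"):
    print(f"{q!r} -> {route(q)}")
```

The design point is that the classifier is far cheaper to run than the models it routes to, so spending a little compute up front keeps lightweight requests off the most expensive pods.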

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
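One common form of model parallelism shards a layer's weight matrix so that each device computes part of the output. The sketch below shows that idea in plain Python, with lists standing in for GPU memory. The function names and the row-wise sharding scheme are assumptions for illustration, and nothing here reflects how TensorRT-LLM or Perplexity implement it.

```python
def matvec(weights, x):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def split_rows(weights, num_devices):
    """Partition the output rows into contiguous shards, one per device."""
    shard_size = -(-len(weights) // num_devices)  # ceiling division
    return [weights[i:i + shard_size] for i in range(0, len(weights), shard_size)]


def parallel_matvec(weights, x, num_devices):
    """Compute each shard's partial output, then gather the results in order."""
    shards = split_rows(weights, num_devices)
    partials = [matvec(shard, x) for shard in shards]  # one shard per device
    return [y for part in partials for y in part]


weights = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]

# The sharded computation matches the single-device result.
assert parallel_matvec(weights, x, num_devices=2) == matvec(weights, x)
print(parallel_matvec(weights, x, num_devices=2))
```

Each shard holds only a fraction of the weights, which is what lets a model too large for one GPU's memory be served across several.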
LINK: https://blogs.nvidia.com/blog/ai-inference-platform/...