
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
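To make the "more tokens at a lower cost" goal concrete, the arithmetic can be sketched as below. This is a minimal illustration, not a pricing model from NVIDIA: the function name, the hourly GPU price and the throughput figures are all hypothetical assumptions chosen only to show how throughput feeds into cost per million tokens.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD for one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures for illustration only, not measurements from the article.
baseline = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_hourly_cost_usd=4.0, tokens_per_second=3000)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

At a fixed hardware price, tripling throughput cuts the cost per million tokens to a third, which is why software optimization that raises tokens per second shows up directly in serving cost.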
Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
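The article does not name the communication methods, but Triton Inference Server exposes HTTP/REST and gRPC endpoints based on the KServe v2 inference protocol. As a rough sketch of what an HTTP inference request looks like, the snippet below only builds the URL and JSON body and does not send anything. The host and port, the model name `my_model`, the tensor name `INPUT0` and the helper function itself are hypothetical placeholders.

```python
import json


def build_infer_request(host, model_name, input_name, shape, datatype, data):
    """Build the URL and JSON body for a KServe v2 style inference request."""
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,    # must match an input tensor of the deployed model
                "shape": shape,
                "datatype": datatype,  # e.g. "FP32", "INT64", "BYTES"
                "data": data,          # tensor values, flattened in row-major order
            }
        ]
    }
    return url, json.dumps(body)


# Hypothetical endpoint, model and tensor names for illustration.
url, payload = build_infer_request(
    host="localhost:8000",
    model_name="my_model",
    input_name="INPUT0",
    shape=[1, 4],
    datatype="FP32",
    data=[0.1, 0.2, 0.3, 0.4],
)
print(url)
print(payload)
```

The resulting payload could be sent as an HTTP POST with any client; the response carries an `outputs` list with the same name, shape, datatype and data fields.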
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI
Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
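The routing idea described above can be sketched roughly as follows. This is not Perplexity's implementation: the keyword rules stand in for a small classifier model, and the pool names and the task-to-pool mapping are hypothetical assumptions made only to show the shape of the approach.

```python
# Hypothetical mapping from task type to the GPU pod pool that serves it.
POD_POOLS = {
    "search": "pods-llama3-8b",
    "summarization": "pods-llama3-70b",
    "question_answering": "pods-llama3-70b",
}


def classify_task(query: str) -> str:
    """Toy stand-in for a small classifier model that labels an incoming query."""
    text = query.lower().strip()
    if text.startswith("summarize") or "tl;dr" in text:
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(query: str) -> tuple[str, str]:
    """Return the predicted task and the pod pool that should handle the query."""
    task = classify_task(query)
    return task, POD_POOLS[task]


for q in ["Summarize this article about GPUs",
          "What is model parallelism?",
          "nvidia h100 specs"]:
    print(q, "->", route(q))
```

Because the classifier is small relative to the LLMs behind it, the routing step adds little latency while keeping larger models reserved for the tasks that need them.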
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
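The article does not say which form of model parallelism is used. As one illustrative form, the sketch below shows column-wise tensor parallelism in plain Python: a weight matrix is split into column shards, each shard is multiplied independently (on a separate GPU in a real system), and the partial outputs are concatenated. All function names and values here are hypothetical.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, num_devices):
    """Split w into num_devices equal column shards, one per simulated device."""
    cols = len(w[0])
    if cols % num_devices != 0:
        raise ValueError("column count must be divisible by num_devices")
    step = cols // num_devices
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(num_devices)]


def parallel_matmul(x, w, num_devices):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, num_devices)
    partials = [matmul(x, shard) for shard in shards]  # one shard per device in practice
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)
sharded = parallel_matmul(x, w, num_devices=2)
print(full)
print(sharded)
```

Each simulated device holds only half of the weights yet the combined result matches the single-device product, which is what lets a model too large for one GPU be served across several.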