
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
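As a rough illustration of the "more tokens at a lower cost" goal, the sketch below converts a GPU's hourly price and its sustained token throughput into a cost per million generated tokens. The function name and all numbers are hypothetical and are not taken from NVIDIA or any cloud provider's pricing.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Return the serving cost, in dollars, of generating one million tokens.

    Both arguments are illustrative assumptions, not measured figures.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical scenario: the same hardware at the same hourly price,
# before and after a software optimization that doubles throughput.
baseline = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=2000)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

Under these assumptions, doubling throughput on fixed hardware halves the cost per million tokens, which is why software optimization shows up directly in the price of a token.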
Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton on its AWS Deep Learning Containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
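The article does not name these communication methods. One reasonable reading is the HTTP/REST and gRPC endpoints that NVIDIA Triton Inference Server exposes, which follow the KServe v2 inference protocol. The sketch below only builds the URL and JSON body of such an HTTP request and does not send it. The model name, tensor name, shape and server address are hypothetical placeholders.

```python
import json


def infer_url(base_url: str, model_name: str) -> str:
    """Build the KServe v2 inference URL for a named model."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def build_infer_request(input_name: str, shape: list, datatype: str, data: list) -> dict:
    """Build a KServe v2 inference request body with a single input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,    # must match the model's configured input name
                "shape": shape,        # tensor dimensions
                "datatype": datatype,  # e.g. "FP32", "INT64", "BYTES"
                "data": data,          # tensor contents as a flat list
            }
        ]
    }


# Hypothetical model and tensor names, for illustration only.
url = infer_url("http://localhost:8000", "my_model")
body = build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])

print("POST", url)
print(json.dumps(body, indent=2))
```

Because the request is plain JSON over HTTP, a client written in any language could send it to the server.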
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI
Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
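The routing pattern described above can be sketched as a small classifier that labels each request and a table that maps each label to a pod serving a suitable model. This is an illustrative assumption and not Perplexity's actual implementation: the keyword rules stand in for a real classifier model, and the pod names and the task-to-model mapping are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str   # hypothetical pod identifier
    model: str  # model served by that pod


# Hypothetical routing table: cheaper model for simple lookups,
# larger model for tasks assumed to need more capacity.
ROUTES = {
    "search": Pod("pod-small", "llama-3-8b"),
    "question_answering": Pod("pod-small", "llama-3-8b"),
    "summarization": Pod("pod-large", "llama-3-70b"),
}


def classify(query: str) -> str:
    """Stand-in for a small classifier model, using keyword rules only."""
    text = query.strip().lower()
    if text.startswith("summarize"):
        return "summarization"
    if text.endswith("?"):
        return "question_answering"
    return "search"


def route(query: str) -> Pod:
    """Pick the pod that should serve this query."""
    return ROUTES[classify(query)]


for q in ["Summarize this article", "Who wrote Dune?", "best hiking boots"]:
    pod = route(q)
    print(f"{q!r} -> {pod.name} ({pod.model})")
```

Sending only the requests that need it to the larger model is what keeps the service cost-efficient. The classifier itself is small enough to add little latency.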
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy.
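The article does not say which form of model parallelism was used. As one assumed example, the sketch below shows tensor parallelism on a single linear layer: the weight matrix is split by columns, each shard is multiplied separately (on its own GPU in a real system), and the partial outputs are concatenated. It uses plain Python lists in place of GPU tensors, and all function names are hypothetical.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split matrix w into up to `parts` contiguous column shards."""
    total_cols = len(w[0])
    step = (total_cols + parts - 1) // parts  # ceiling division
    return [[row[start:start + step] for row in w] for start in range(0, total_cols, step)]


def parallel_matmul(x, w, num_devices):
    """Compute x @ w by giving each simulated device one column shard of w."""
    shards = split_columns(w, num_devices)
    # In a real deployment each shard would live on, and be multiplied by, a different GPU.
    partial_outputs = [matmul(x, shard) for shard in shards]
    # Concatenating the partial outputs in shard order restores the full result.
    return [value for part in partial_outputs for value in part]


x = [1, 2]
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]

assert parallel_matmul(x, w, num_devices=2) == matmul(x, w)
print(parallel_matmul(x, w, num_devices=2))
```

Each device holds only a fraction of the weights, which is what lets a model too large for one GPU's memory be served across several.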