
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform - a full stack comprising world-class silicon, systems and software - is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.
NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.
AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.
But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system - and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
Cost-Effective User Throughput Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.
NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure - cloud, data centers, edge or workstations.
NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.
With the framework-agnostic NVIDIA AI inference platform, companies save on productivity, development, and infrastructure and setup costs. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.
Cloud-Based LLM Inference To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
Google Cloud's Vertex AI, Google Kubernetes Engine
Microsoft Azure AI Foundry coming soon, Azure Kubernetes Service
Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine
Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.
For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.
Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE). Google Cloud provides a one-click deployment option on Google Kubernetes Engine, while AWS offers NVIDIA Triton on its AWS Deep Learning containers.
The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.
From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.
Serving 400 Million Search Queries Monthly With Perplexity AI Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.
Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.
Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accurac
Most recent headlines
18/12/2025
SVG Campus Shot Callers: Kurt Sutton, Director of Broadcast Operations, Clemson ...
18/12/2025
Follow the Money Episode 2: Inside the Sports Media Biz with Sam McCleery and St...
18/12/2025
SVG Sit-Down: Google Cloud's Anshul Kapoor on the Future of Generative Prod...
18/12/2025
The 2025 SVG Summit Draws Record Crowd for 20th-Annual Sports-Production Industr...
18/12/2025
SBS's sports schedule sizzles in January with Dakar Rally, Kooyong Classic a...
18/12/2025
Canada's largest indoor arena has transformed its live production capabilities with a full ST 2110 infrastructure and Calrec's compact Argo S console. S...
18/12/2025
During November, streaming's share of TV viewing in Mexico settled at 24.2%, an increase of 0.5 share points from the previous month.
Disclaimer: YUMI TV,...
18/12/2025
November continued the upward trend in television viewership. The significantly colder weather and a rich programming lineup encouraged viewers to spend more ti...
18/12/2025
As viewers turn to sports highlights, recaps and documentary programming, expand...
18/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
18/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
18/12/2025
The HELM, a global expert in cinematic live broadcast and high-end production workflows, has entered a strategic partnership with ARRI, the renowned designer an...
18/12/2025
Cadena Melod a de Colombia (Cadena Melod a), a long-established Colombian radio network, has chosen DHD audio SX2 production consoles for integration into the m...
18/12/2025
Harmonic (NASDAQ: HLIT) today announced that Czech Television (Czech TV), the public broadcaster of the Czech Republic, has teamed up with Harmonic to modernize...
18/12/2025
Broadcast Solutions Group, a leading system integrator and provider of innovative solutions for the broadcast and media industry, has announced the acquisition ...
18/12/2025
Keepit, the SaaS data protection company, announced today that it has been named a Leader in the IDC MarketScape: Worldwide SaaS Data Protection 2025-2026 Vendo...
18/12/2025
Limecraft today announced the release of Limecraft 2025.8, the eighth and final major platform update of the year. This release strengthens daily workflows acro...
18/12/2025
DigitalGlue is very grateful, especially at this time of the year, that its creative.space platform has expanded its footprint within the House of Worship marke...
18/12/2025
TAG Video Systems is proud to share that the company has recently received multiple industry recognitions across the Asia-Pacific region, reflecting its ongoing...
18/12/2025
NDI, the leading video connectivity standard for AV-over-IP, and Zoom, the AI-first collaboration platform, announce a strategic collaboration to integrate the ...
18/12/2025
Leading video software provider, Synamedia, today announced that it is extending its long-standing relationship with YES, the pay-TV subsidiary of the largest I...
18/12/2025
Riedel Communications today announced it provided a fully integrated communications and commentary solution for the 15th National Games of China, supporting 56 ...
18/12/2025
When both the Toledo Walleye and Toledo Mud Hens play at home on the same night, communication between their respective production teams is essential. To stream...
18/12/2025
TMT Insights' new upstream media supply chain platform, Focus, was selected as a winner in the 2025 Media & Entertainment: Best in Market Awards in the TV T...
18/12/2025
Clear-Com is proud to announce its continued role as the official intercom supplier for the Yamaha Grand Plaza Stage at The 2026 NAMM Show, taking place Januar...
18/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
18/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
18/12/2025
Share Share by:
Copy link
Facebook
X
Whatsapp
Pinterest
Flipboard...
18/12/2025
Long-term agreement includes the SES SCORE platform and hybrid distribution worldwide to deliver more than 5,000 hours of golf tournaments annually featuring th...
18/12/2025
Talk formats require careful clock management and system tools to ensure audio content aligns as intended. WO Automation for Radio's Segment Rulesets provid...
18/12/2025
By Toni Coonce, CEO, WideOrbit As 2025 comes to a close, I find myself reflecting on how much WideOrbit has evolved, not only in products and solutions but also...
18/12/2025
18 Dec 2025
VEON Upgraded to Nasdaq Global Select Market, Enhancing Investor Visibility Dubai, December 18, 2025 - VEON Ltd. (Nasdaq: VEON), a global digital o...
18/12/2025
December 18th, 2025
Tribeca X Launches Inaugural Advisory Council, Teases 202...
18/12/2025
December 18th, 2025
As Tribeca Celebrates Its 25th Anniversary, Festival Expa...
18/12/2025
Thursday 18 December 2025
Sky Sports remains the exclusive home of the Masters ...
18/12/2025
Back to All News
Teaser for Can This Love Be Translated' Previews a Heartw...
18/12/2025
Using the additive process of 3D printing, layer after layer gets printed until an object is as close to the final shape needed as possible. Historically, machi...
18/12/2025
In 2025, RT proudly supported 185 arts and cultural events across the island of Ireland, reflecting significant growth since the scheme was re-launched in 2014...
18/12/2025
RT Sports Awards 2025 live on RT One and RT Player at 8:05pm on Saturday 20 December
On Saturday 20 December live on RT One and RT Player at the earlier t...
18/12/2025
RT lyric fm presents a very special Winter Solstice edition of Ambient Orbit, l...
18/12/2025
Top-notch options for AI at the desktops of developers, engineers and designers ...
18/12/2025
At 7.45pm on 1st January 1926, the precursor to RT , then 2RN, delivered the fledgling new Irish state's first public radio transmission. From those first c...
18/12/2025
Step out of the vault and into the future of gaming with Fallout: New Vegas streaming on GeForce NOW, just in time to celebrate the newest season of the hit Ama...
18/12/2025
December 18 2025, 05:30 (PST) The Movie Experience SLO Becomes First U.S. Exhib...
17/12/2025
Investigative journalists across the Western Balkans and T rkiye continue to con...
17/12/2025
Sports Broadcasting Hall of Fame Inducts 10 Industry Icons During Unforgettable ...
17/12/2025
ESPN to Debut MNF Playbook with Next Gen Stats, a New AI-Driven NFL Data-AltCastThe series, powered by Adrenaline TruPlay AI, launches Dec. 22 and runs through ...
17/12/2025
Inaugural Optum Golf Channel Games Debut Under the Lights' in Primetime on ...
17/12/2025
The right playlist is essential on New Year's Eve, building the energy as you get ready and keeping it high as you count down to midnight. This year, Spotif...
17/12/2025
eds3_5_jq(document).ready(function($) { $(#eds_sliderM519).chameleonSlider_2_1({...