Sony Pixel Power calrec Sony

KServe Providers Dish Up NIMble Inference in Clouds and Data Centers


Deploying generative AI in the enterprise is about to get easier than ever.

NVIDIA NIM, a set of generative AI inference microservices, works with KServe, open-source software that automates putting AI models to work at the scale of a cloud computing application.

The combination ensures generative AI can be deployed like any other large enterprise application. It also makes NIM widely available through platforms from dozens of companies, such as Canonical, Nutanix and Red Hat.

The integration of NIM on KServe extends NVIDIA's technologies to the open-source community, ecosystem partners and customers. Through NIM, they can all access the performance, support and security of the NVIDIA AI Enterprise software platform with an API call - the push-button of modern programming.

Serving AI on Kubernetes KServe got its start as part of Kubeflow, a machine learning toolkit based on Kubernetes, the open-source system for deploying and managing software containers that hold all the components of large distributed applications.

As Kubeflow expanded its work on AI inference, what became KServe was born and ultimately evolved into its own open-source project.

Many companies have contributed to and adopted the KServe software that runs today at companies including AWS, Bloomberg, Canonical, Cisco, Hewlett Packard Enterprise, IBM, Red Hat, Zillow and NVIDIA.

Under the Hood With KServe KServe is essentially an extension of Kubernetes that runs AI inference like a powerful cloud application. It uses a standard protocol, runs with optimized performance and supports PyTorch, Scikit-learn, TensorFlow and XGBoost without users needing to know the details of those AI frameworks.

The software is especially useful these days, when new large language models (LLMs) are emerging rapidly.

KServe lets users easily go back and forth from one model to another, testing which one best suits their needs. And when an updated version of a model gets released, a KServe feature called canary rollouts automates the job of carefully validating and gradually deploying it into production.

Another feature, GPU autoscaling, efficiently manages how models are deployed as demand for a service ebbs and flows, so customers and service providers have the best possible experience.

An API Call to Generative AI The goodness of KServe is now available with the ease of NVIDIA NIM.

With NIM, a simple API call takes care of all the complexities. Enterprise IT admins get the metrics they need to ensure their application is running with optimal performance and efficiency, whether it's in their data center or on a remote cloud service - even if they change the AI models they're using.

NIM lets IT professionals become generative AI pros, transforming their company's operations. That's why a host of enterprises such as Foxconn and ServiceNow are deploying NIM microservices.

NIM Rides Dozens of Kubernetes Platforms Thanks to its integration with KServe, users will be able access NIM on dozens of enterprise platforms such as Canonical's Charmed KubeFlow and Charmed Kubernetes, Nutanix GPT-in-a-Box 2.0, Red Hat's OpenShift AI and many others.

Red Hat has been working with NVIDIA to make it easier than ever for enterprises to deploy AI using open source technologies, said KServe contributor Yuan Tang, a principal software engineer at Red Hat. By enhancing KServe and adding support for NIM in Red Hat OpenShift AI, we're able to provide streamlined access to NVIDIA's generative AI platform for Red Hat customers.

Through the integration of NVIDIA NIM inference microservices with Nutanix GPT-in-a-Box 2.0, customers will be able to build scalable, secure, high-performance generative AI applications in a consistent way, from the cloud to the edge, said the vice president of engineering at Nutanix, Debojyoti Dutta, whose team contributes to KServe and Kubeflow.

As a company that also contributes significantly to KServe, we're pleased to offer NIM through Charmed Kubernetes and Charmed Kubeflow, said Andreea Munteanu, MLOps product manager at Canonical. Users will be able to access the full power of generative AI, with the highest performance, efficiency and ease thanks to the combination of our efforts.

Dozens of other software providers can feel the benefits of NIM simply because they include KServe in their offerings.

Serving the Open-Source Community NVIDIA has a long track record on the KServe project. As noted in a recent technical blog, KServe's Open Inference Protocol is used in NVIDIA Triton Inference Server, which helps users run many AI models simultaneously across many GPUs, frameworks and operating modes.

With KServe, NVIDIA focuses on use cases that involve running one AI model at a time across many GPUs.

As part of the NIM integration, NVIDIA plans to be an active contributor to KServe, building on its portfolio of contributions to open-source software that includes Triton and TensorRT-LLM. NVIDIA is also an active member of the Cloud Native Computing Foundation, which supports open-source code for generative AI and other projects.

Try the NIM API on the NVIDIA API Catalog using the Llama 3 8B or Llama 3 70B LLM models today. Hundreds of NVIDIA partners worldwide are using NIM to deploy generative AI.

Watch NVIDIA founder and CEO Jensen Huang's COMPUTEX keynote to get the latest on AI and more.
See more stories from nvidia

More from Nvidia


Mile-High AI: NVIDIA Research to Present Advancements in Simulation and Gen AI at SIGGRAPH

NVIDIA is taking an array of advancements in rendering, simulation and generativ...


Once Human,' Twice the Thrills on GeForce NOW

Unlock new experiences every GFN Thursday. Whether post-apocalyptic survival adventures, narrative-driven games or vast, open worlds, GeForce NOW always has som...


Japan Enhances AI Sovereignty With Advanced ABCI 3.0 Supercomputer

Enhancing Japan's AI sovereignty and strengthening its research and development capabilities, Japan's National Institute of Advanced Industrial Science ...


Mission NIMpossible: Decoding the Microservices That Accelerate Generative AI

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, softwar...


Paige Cofounder Thomas Fuchs' Diagnosis on Improving Cancer Patient Outcomes With AI

Improved cancer diagnostics - and improved patient outcomes - could be among the...


Widescreen Wonder: Las Vegas Sphere Delivers Dazzling Displays

Sphere, a new kind of entertainment medium in Las Vegas, is joining the ranks of legendary circular performance spaces such as the Roman Colosseum and Shakespea...


In It for the Long Haul: Waabi Pioneers Generative AI to Unleash Fully Driverless Autonomous Trucking

Artificial intelligence is transforming the transportation industry, helping dri...


GeForce NOW Beats the Heat With 22 New Games in July

GeForce NOW is bringing 22 new games to members this month. Dive into the four titles available to stream on the cloud gaming service this week to stay cool an...


Decoding How the Generative AI Revolution BeGAN

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, softwa...


How an NVIDIA Engineer Unplugs to Recharge During Free Days

On a weekday afternoon, Ashwini Ashtankar sat on the bank of the Doodhpathri River, in a valley nestled in the Himalayas. Taking a deep breath, she noticed that...


Into the Omniverse: SyncTwin Helps Democratize Industrial Digital Twins With Generative AI, OpenUSD

Editor's note: This post is part of Into the Omniverse, a series focused on ...


GeForce NOW Unleashes High-Stakes Horror With Resident Evil Village'

Get ready to feel some chills, even amid the summer heat. Capcom's award-winning Resident Evil Village brings a touch of horror to the cloud this GFN Thursd...


Cut the Noise: NVIDIA Broadcast Supercharges Livestreaming, Remote Work

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, softwa...


Thinking Outside the Blox: How Roblox Is Using Generative AI to Enhance User Experiences

Roblox is a colorful online platform that aims to reimagine the way that people ...


EvolutionaryScale Debuts With ESM3 Generative AI Model for Protein Design

Generative AI has revolutionized software development with prompt-based code generation - protein design is next. EvolutionaryScale today announced the release...


Why 3D Visualization Holds Key to Future Chip Designs

Multi-die chips, known as three-dimensional integrated circuits, or 3D-ICs, represent a revolutionary step in semiconductor design. The chips are vertically sta...


Crack the Case With Tell Me Why' and As Dusk Falls' on GeForce NOW

Sit back and settle in for some epic storytelling. Tell Me Why and As Dusk Falls - award-winning, narrative-driven games from Xbox Studios - add to the 1,900+ g...


Decoding How NVIDIA AI Workbench Powers App Development

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, softwar...


Light Bulb Moment: NVIDIA CEO Sees Bright Future for AI-Powered Electric Grid

The electric grid and the utilities managing it have an important role to play in the next industrial revolution that's being driven by AI and accelerated c...


NVIDIA Advances Physical AI at CVPR With Largest Indoor Synthetic Dataset

NVIDIA contributed the largest ever indoor synthetic dataset to the Computer Vision and Pattern Recognition (CVPR) conference's annual AI City Challenge - h...


NVIDIA Research Wins CVPR Autonomous Grand Challenge for End-to-End Driving

Making moves to accelerate self-driving car development, NVIDIA was today named an Autonomous Grand Challenge winner at the Computer Vision and Pattern Recognit...


Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

NVIDIA researchers are at the forefront of the rapidly advancing field of visual...


Believe in Something Unconventional, Something Unexplored,' NVIDIA CEO Tells Caltech Grads

NVIDIA founder and CEO Jensen Huang on Friday encouraged Caltech graduates to pu...


The Proudest Refugee': How Veronica Miller Charts Her Own Path at NVIDIA

When she was five years old, Veronica Miller (n e Teklai) and her family left their homeland of Eritrea, in the Horn of Africa, to escape an ongoing war with Et...


NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

NVIDIA today announced Nemotron-4 340B, a family of open models that developers ...


Cloud Ahoy! Treasure Awaits With Sea of Thieves' on GeForce NOW

Set sail for adventure, pirates. Sea of Thieves makes waves in the cloud this week. It's an adventure-filled GFN Thursday with four new games joining the Ge...


Every Company's Data is Their Gold Mine,' NVIDIA CEO Says at Databricks Data + AI Summit

Accelerated computing is transforming data processing and analytics for enterpri...


Scaling to New Heights: NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated...


Nerding About NeRFs: How Neural Radiance Fields Transform 2D Images Into Hyperrealistic 3D Models

Let's talk about NeRFs - no, not the neon-colored foam dart blasters, but ne...


TOPS of the Class: Decoding AI Performance on RTX AI PCs and Workstations

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, softwa...


Why Accelerated Data Processing Is Crucial for AI Innovation in Every Industry

Across industries, AI is supercharging innovation with machine-powered computation. In finance, bankers are using AI to detect fraud more quickly and keep accou...


Here Comes a New Challenger: Street Fighter 6' Joins GeForce NOW

Capcom's latest entry in the iconic Street Fighter series, Street Fighter 6, punches its way into the cloud this GFN Thursday. The game, along with Ubisoft&...


Yotta CEO Sunil Gupta on Supercharging India's Fast-Growing AI Market

India's AI market is expected to be massive. Yotta Data Services is setting its sights on supercharging it. In this episode of NVIDIA's AI Podcast, Suni...


Creativity Accelerated: New RTX-Powered AI Hardware and Software Announced at COMPUTEX

NVIDIA launched NVIDIA Studio at COMPUTEX in 2019. Five years and more than 500 ...


SAP and NVIDIA Create AI for The Most Valuable Language,' CEOs Unveil at Sapphire Orlando

German enterprise cloud leader SAP is harnessing generative AI and industrial di...


NVIDIA and Cisco Weave Fabric for Generative AI

Building and deploying AI applications at scale requires a new class of computing infrastructure - one that can handle the massive amounts of data, compute powe...


Digital Bank Debunks Financial Fraud With Generative AI

European neobank bunq is debunking financial fraudsters with the help of NVIDIA accelerated computing and AI. Dubbed the bank of the free, bunq offers online...


Foxconn Trains Robots, Streamlines Assembly With NVIDIA AI and Omniverse

Foxconn operates more than 170 factories around the world - the latest one a virtual plant pushing the state of the art in industrial automation. It's the ...


Taiwan Electronics Giants Drive Industrial Automation With NVIDIA Metropolis and NIM

Taiwan's leading consumer electronics giants are making advances with AI aut...


KServe Providers Dish Up NIMble Inference in Clouds and Data Centers

Deploying generative AI in the enterprise is about to get easier than ever. NVIDIA NIM, a set of generative AI inference microservices, works with KServe, open...


Accelerate Everything,' NVIDIA CEO Says Ahead of COMPUTEX

Generative AI is reshaping industries and opening new opportunities for innovation and growth, NVIDIA founder and CEO Jensen Huang said in an address ahead of ...


Power Tool: Generative AI Tracks Typhoons, Tames Energy Use

Weather forecasters in Taiwan had their hair blown back when they saw a typhoon up close, created on a computer that slashed the time and energy needed for the ...


NVIDIA Grace Hopper Superchip Accelerates Murex MX.3 Analytics Performance, Reduces Power Consumption

After the 2008 financial crisis and increased risk-management regulations that f...


Elevate Your Expertise: NVIDIA Introduces AI Infrastructure and Operations Training and Certification

NVIDIA has introduced a self-paced course, called AI Infrastructure and Operatio...


GeForce NOW Brings the Heat With World of Warcraft'

World of Warcraft comes to the cloud this week, part of the 17 games joining the GeForce NOW library, with seven available to stream this week. Plus, it's ...


Riding the Wayve of AV 2.0, Driven by Generative AI

Generative AI is propelling AV 2.0, a new era in autonomous vehicle technology characterized by large, unified, end-to-end AI models capable of managing various...


Tidy Tech: How Two Stanford Students Are Building Robots for Handling Household Chores

Imagine having a robot that could help you clean up after a party - or fold heap...


Decoding How NVIDIA RTX AI PCs and Workstations Tap the Cloud to Supercharge Generative AI

Editor's note: This post is part of the AI Decoded series, which demystifies...


NVIDIA Scoops Up Wins at COMPUTEX Best Choice Awards

Building on more than a dozen years of stacking wins at the COMPUTEX trade show's annual Best Choice Awards, NVIDIA was today honored with BCAs for its late...


Senua's Story Continues: GeForce NOW Brings Senua's Saga: Hellblade II' to the Cloud

Every week, GFN Thursday brings new games to the cloud, featuring some of the la...