Sony Pixel Power calrec Sony

Mission NIMpossible: Decoding the Microservices That Accelerate Generative AI

10/07/2024

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, software, tools and accelerations for NVIDIA RTX PC and workstation users.

In the rapidly evolving world of artificial intelligence, generative AI is captivating imaginations and transforming industries. Behind the scenes, an unsung hero is making it all possible: microservices architecture.

The Building Blocks of Modern AI Applications Microservices have emerged as a powerful architecture, fundamentally changing how people design, build and deploy software.

A microservices architecture breaks down an application into a collection of loosely coupled, independently deployable services. Each service is responsible for a specific capability and communicates with other services through well-defined application programming interfaces, or APIs. This modular approach stands in stark contrast to traditional all-in-one architectures, in which all functionality is bundled into a single, tightly integrated application.

By decoupling services, teams can work on different components simultaneously, accelerating development processes and allowing updates to be rolled out independently without affecting the entire application. Developers can focus on building and improving specific services, leading to better code quality and faster problem resolution. Such specialization allows developers to become experts in their particular domain.

Services can be scaled independently based on demand, optimizing resource utilization and improving overall system performance. In addition, different services can use different technologies, allowing developers to choose the best tools for each specific task.

A Perfect Match: Microservices and Generative AI The microservices architecture is particularly well-suited for developing generative AI applications due to its scalability, enhanced modularity and flexibility.

AI models, especially large language models, require significant computational resources. Microservices allow for efficient scaling of these resource-intensive components without affecting the entire system.

Generative AI applications often involve multiple steps, such as data preprocessing, model inference and post-processing. Microservices enable each step to be developed, optimized and scaled independently. Plus, as AI models and techniques evolve rapidly, a microservices architecture allows for easier integration of new models as well as the replacement of existing ones without disrupting the entire application.

NVIDIA NIM: Simplifying Generative AI Deployment As the demand for AI-powered applications grows, developers face challenges in efficiently deploying and managing AI models.

NVIDIA NIM inference microservices provide models as optimized containers to deploy in the cloud, data centers, workstations, desktops and laptops. Each NIM container includes the pretrained AI models and all the necessary runtime components, making it simple to integrate AI capabilities into applications.

NIM offers a game-changing approach for application developers looking to incorporate AI functionality by providing simplified integration, production-readiness and flexibility. Developers can focus on building their applications without worrying about the complexities of data preparation, model training or customization, as NIM inference microservices are optimized for performance, come with runtime optimizations and support industry-standard APIs.

AI at Your Fingertips: NVIDIA NIM on Workstations and PCs Building enterprise generative AI applications comes with many challenges. While cloud-hosted model APIs can help developers get started, issues related to data privacy, security, model response latency, accuracy, API costs and scaling often hinder the path to production.

Workstations with NIM provide developers with secure access to a broad range of models and performance-optimized inference microservices.

By avoiding the latency, cost and compliance concerns associated with cloud-hosted APIs as well as the complexities of model deployment, developers can focus on application development. This accelerates the delivery of production-ready generative AI applications - enabling seamless, automatic scale out with performance optimization in data centers and the cloud.

The recently announced general availability of the Meta Llama 3 8B model as a NIM, which can run locally on RTX systems, brings state-of-the-art language model capabilities to individual developers, enabling local testing and experimentation without the need for cloud resources. With NIM running locally, developers can create sophisticated retrieval-augmented generation (RAG) projects right on their workstations.

Local RAG refers to implementing RAG systems entirely on local hardware, without relying on cloud-based services or external APIs.

Developers can use the Llama 3 8B NIM on workstations with one or more NVIDIA RTX 6000 Ada Generation GPUs or on NVIDIA RTX systems to build end-to-end RAG systems entirely on local hardware. This setup allows developers to tap the full power of Llama 3 8B, ensuring high performance and low latency.

By running the entire RAG pipeline locally, developers can maintain complete control over their data, ensuring privacy and security. This approach is particularly helpful for developers building applications that require real-time responses and high accuracy, such as customer-support chatbots, personalized content-generation tools and interactive virtual assistants.

Hybrid RAG combines local and cloud-based resources to optimize performance and flexibility in AI applications. With NVIDIA AI Workbench, developers can get started with the hybrid-RAG Workbench Project - an example application that can be used to run vector databases and embedding models locally whil
LINK: https://blogs.nvidia.com/blog/ai-decoded-nim/...
See more stories from nvidia

Most recent headlines

05/01/2027

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be demoed at CES 2026

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...

04/08/2026

Dalet Announces Commercial Availability of Dalia, Bringing Media-Aware Agentic AI to Enterprise Productions

Dalet, a leading technology and service provider for media-rich organizations, t...

04/07/2026

Detective Conan: Fallen Angel of the Highway Opens in Dolby Cinemas Across Japan, Presented in Dolby Atmos and Dolby ...

April 7 2026, 19:00 (PDT) Detective Conan: Fallen Angel of the Highway Opens in...

01/06/2026

Dolby Sets the New Standard for Premium Entertainment at CES 2026

January 6 2026, 05:30 (PST) Dolby Sets the New Standard for Premium Entertainment at CES 2026 Throughout the week, Dolby brings to life the latest innovatio...

02/05/2026

Dalet Flex LTS Delivers Smarter Search, Faster Editing, and an AI-Ready Foundation for Modern Media

Dalet, a leading technology and service provider for media-rich organizations, t...

01/05/2026

NBCUniversal's Peacock to Be First Streamer to Integrate Dolby's Full Suite of Premium Picture and Sound Innovations

January 5 2026, 18:30 (PST) NBCUniversal's Peacock to Be First Streamer to ...

23/04/2026

NAB Honors Rob Lowe and John Tesh With Hall of Fame Induction

Share Copy link Facebook X Linkedin Bluesky Email...

23/04/2026

Roku, Samsung Dominate CTV Platform Market in U.S.

Share Copy link Facebook X Linkedin Bluesky Email...

23/04/2026

G&D and VuWall Strengthen International Sales Team

Share Copy link Facebook X Linkedin Bluesky Email...

23/04/2026

The 2026 NAB Show Reports More than 58,000 Attendees

Share Copy link Facebook X Linkedin Bluesky Email...

23/04/2026

SmallHD Monitor Overlay License for Hi-5 and Hi-5 SX deli...

Partnership between ARRI and SmallHD brings new Hi-5 license Configurable monitor overlays adapt to individual working styles Supported by SmallHD monitors ru...

23/04/2026

Jeff Cronenweth ASC Sheds Light on Tron Ares with Astera

Lighting Master Cronenweth ASC brings a unique look to each grid world with the help of Astera Jeff Cronenweth on the set of Disney's TRON: ARES. Photo by...

23/04/2026

ZEISS Supreme Primes Shine in Star-Driven Short Dr Sam

DP Chloe Smolkin ( The Late Show, Kidz Bop ) joins director Danielle Beckmann and writer/actor Raji Ahsan behind the camera for the heartfelt short comedy Dr...

23/04/2026

Apply now to join the 2026 Producer Delegation to TIFF: The Market

Apply now to join the 2026 Producer Delegation to TIFF: The Market 23 April 2026 Screen Australia, in partnership with Ontario Creates, has opened application...

22/04/2026

Live From NAB 2026: Solid State Logics Berny Carpenter on Expanding System T With Virtual DSP, Cloud Workflows

Solid State Logic is advancing its System T platform with a stronger focus on IP...

22/04/2026

Live From NAB 2026: Dolbys Giles Baker on the Growth of Dolby OptiView, Immersive Vision and Audio for Live Sports

From immersive audio to live streaming, Dolby Laboratories is focused on the fut...

22/04/2026

Live From NAB 2026: Blackmagic Design's Bob Caniglia on Implementing Cinematic Looks in Live Broadcasts

Shallow depth-of-field cameras have taken the industry by storm. Its debut a han...

22/04/2026

NAB 2026: Eastern Kentucky University deploys campus-wide ST 2110 network with Riedel and Bridge Digital

Riedel Communications (Booth C4908) announced that Eastern Kentucky University (...

22/04/2026

SportsTechBuzz at NAB 2026, Day 4: Live Reports From the Show Floor in Vegas

The NAB Show is in full swing, and the SVG and SVG Europe editorial teams are chasing down the hottest stories from all over the Las Vegas Convention Center. He...

22/04/2026

NAB 2026: Blackmagic Design Announces URSA Cine 12K LF 100G

Blackmagic Design has announced the URSA Cine 12K LF 100G, a new model in the URSA Cine family adding 100G Ethernet for SMPTE 2110 live production output up to ...

22/04/2026

Live From NAB 2026: NEPs Martin Stewart Talks 40 Years, the NEP Platform, and Scaling for FIFA World Cup

Celebrating its 40th anniversary, NEP is leaning into hybrid production with the...

22/04/2026

Live From NAB 2026: NEPs Dan Murphy on NEP Platform, TFC, and the Shift to Software-Defined Workflows

NEP VP, Platform Dan Murphy sits down at the 2026 NAB Show to unpack what NEP P...

22/04/2026

Spotify and WNBA's New York Liberty Bring Basketball and Music Together With New Partnership

Spotify and the New York Liberty are teaming up to give music and basketball fan...

22/04/2026

The story of the Focusrite ISA preamp

New 20-minute documentary explores iconic design The Focusrite Room in Mesa, Arizona, where John Aquilino hosts the Studio Console 005. In 2025, Focusrite co...

22/04/2026

EverSync SP-10 wireless from Cloudvocal

Offers compact wireless solution for pedalboards Taiwanese audio brand Cloudvocal have announced the availability of a new pedalboard-friendly wireless syst...

22/04/2026

Arturia release Augmented Persia

Latest hybrid sampling/synthesis instrument arrives Arturia's Augmented series offerings rely on a mixture of sampling and synthesis, allowing users to ...

22/04/2026

Acustica Audio launch Salt 2

Combines three distinct analogue EQ emulations The latest addition to Acustica Audio's ever-expanding collection of analogue-emulation plug-ins combines...

22/04/2026

Analog Empire: Bass & Lead from Melda Production

Final instalment in vintage-inspired instrument series Analog Empire: Bass & Lead marks the final instalment in Melda Production's vintage hardware-insp...

22/04/2026

Strymon reveal the Canoga

Fuzz pedal joins all-analogue Series A line Given that Strymons reputation was built on unapologetically digital pedals, it was a little surprising to see t...

22/04/2026

SBS names shortlisted brands for 2026 SBS Media Sustainability Challenge

SBS names shortlisted brands for 2026 SBS Media Sustainability Challenge 22 April, 2026 Media releases National broadcaster also releases its second annual...

22/04/2026

The Frequency That Decides the Fight

Why Low Band Electronic Warfare Matters...

22/04/2026

Polish national football team play-off games top monthly programme list

The nation unites around football team's World Cup dream Warsaw, Poland, 20.04.26: Nielsen, a global leader in audience measurement, data, and media intell...

22/04/2026

Nielsen and the Polish Organisation of Advertisers announce strategic partnership to elevate marketing standards in Poland

Warsaw, Poland, 22.04.26: Nielsen, a global leader in audience measurement, data...

22/04/2026

Nielsen helps New Zealand brands expand internationally with greater clarity and confidence

New market intelligence offering gives businesses a clearer view of local consum...

22/04/2026

Glookast Unveils New UX, YouTube and Social Media Connectors, Premiere Panel, Cinnafilm Tachyon Plugin and More at NAB

Glookast Unveils New UX, YouTube and Social Media Connectors, Premiere Panel, Ci...

22/04/2026

Lightcraft Technology to Preview Spark Story at NAB 2026 with Interactive Previs Experience

Lightcraft Technology to Preview Spark Story at NAB 2026 with Interactive Previs...

22/04/2026

Bolin Demos New PTZ Cameras and Controller at 2026 NAB Show

Share Copy link Facebook X Linkedin Bluesky Email...

22/04/2026

Anchor Audio Launches Beacon 3

Share Copy link Facebook X Linkedin Bluesky Email...

22/04/2026

FCC Grants WSWB TV License Transfer to Sinclair

Share Copy link Facebook X Linkedin Bluesky Email...

22/04/2026

Telemundo Puerto Rico Streaming Channel Launches On Prime Video

Share Copy link Facebook X Linkedin Bluesky Email...

22/04/2026

Chyron Announces PRIME Translate

Share Copy link Facebook X Linkedin Bluesky Email...

22/04/2026

TV Tech Announces Winners of Best of Show Awards at 2026 NAB Show

Share Copy link Facebook X Linkedin Bluesky Email...

22/04/2026

VEON's Banglalink to Bring Starlink Mobile to Customers in Bangladesh

22 Apr 2026 VEON's Banglalink to Bring Starlink Mobile to Customers in Bangladesh Bangladesh becomes the third market where VEON and Starlink Mobile partne...

22/04/2026

FIRST LOOK FOR NEW U DRAMA SERIES HIT POINT

U have unveiled exclusive first-look images for their six-part police thriller Hit Point, starring Nick Blood (Day of the Jackal) and BAFTA nominee Saffron Hock...

22/04/2026

UKTV Highlights: Saturday May 9th -15th 2026

What can I watch on UKTV and stream on U this week? This week on UKTV and the free streaming service U, viewers can watch a range of new and returning programm...

22/04/2026

Sky announces fifth year of WNT Fund with 30,000 bursary supporting players and grassroots football

Wednesday 22 April 2026 Sky announces fifth year of WNT Fund with 30,000 bursa...

22/04/2026

This Earth Day, Discover the Sustainable Productions Behind Our Films and Series

Back to All News This Earth Day, Discover the Sustainable Productions Behind Our Films and Series Emma Stewart, Ph.D. Netflix Sustainability Officer Enterta...

22/04/2026

Retail Media Standards Are Expanding Into Commerce Media - Here's Why That Matters for Measurement

The move from Retail Media to Commerce Media is about broadening the scope of th...

22/04/2026

Dolby and BMW Bring Dolby Atmos to the BMW 7 Series, Expanding Immersive Audio Across Future Models

April 22 2026, 07:00 (PDT) Dolby and BMW Bring Dolby Atmos to the BMW 7 Series,...

22/04/2026

RT Licenses Stolen Sister to Pushkin

RT Documentary On One 7-part series breaks US market for first time RT Programme Sales has announced its first deal with a US distribution partner for its 7-...