Sony Pixel Power calrec Sony

Mission NIMpossible: Decoding the Microservices That Accelerate Generative AI

10/07/2024

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, software, tools and accelerations for NVIDIA RTX PC and workstation users.

In the rapidly evolving world of artificial intelligence, generative AI is captivating imaginations and transforming industries. Behind the scenes, an unsung hero is making it all possible: microservices architecture.

The Building Blocks of Modern AI Applications Microservices have emerged as a powerful architecture, fundamentally changing how people design, build and deploy software.

A microservices architecture breaks down an application into a collection of loosely coupled, independently deployable services. Each service is responsible for a specific capability and communicates with other services through well-defined application programming interfaces, or APIs. This modular approach stands in stark contrast to traditional all-in-one architectures, in which all functionality is bundled into a single, tightly integrated application.

By decoupling services, teams can work on different components simultaneously, accelerating development processes and allowing updates to be rolled out independently without affecting the entire application. Developers can focus on building and improving specific services, leading to better code quality and faster problem resolution. Such specialization allows developers to become experts in their particular domain.

Services can be scaled independently based on demand, optimizing resource utilization and improving overall system performance. In addition, different services can use different technologies, allowing developers to choose the best tools for each specific task.

A Perfect Match: Microservices and Generative AI The microservices architecture is particularly well-suited for developing generative AI applications due to its scalability, enhanced modularity and flexibility.

AI models, especially large language models, require significant computational resources. Microservices allow for efficient scaling of these resource-intensive components without affecting the entire system.

Generative AI applications often involve multiple steps, such as data preprocessing, model inference and post-processing. Microservices enable each step to be developed, optimized and scaled independently. Plus, as AI models and techniques evolve rapidly, a microservices architecture allows for easier integration of new models as well as the replacement of existing ones without disrupting the entire application.

NVIDIA NIM: Simplifying Generative AI Deployment As the demand for AI-powered applications grows, developers face challenges in efficiently deploying and managing AI models.

NVIDIA NIM inference microservices provide models as optimized containers to deploy in the cloud, data centers, workstations, desktops and laptops. Each NIM container includes the pretrained AI models and all the necessary runtime components, making it simple to integrate AI capabilities into applications.

NIM offers a game-changing approach for application developers looking to incorporate AI functionality by providing simplified integration, production-readiness and flexibility. Developers can focus on building their applications without worrying about the complexities of data preparation, model training or customization, as NIM inference microservices are optimized for performance, come with runtime optimizations and support industry-standard APIs.

AI at Your Fingertips: NVIDIA NIM on Workstations and PCs Building enterprise generative AI applications comes with many challenges. While cloud-hosted model APIs can help developers get started, issues related to data privacy, security, model response latency, accuracy, API costs and scaling often hinder the path to production.

Workstations with NIM provide developers with secure access to a broad range of models and performance-optimized inference microservices.

By avoiding the latency, cost and compliance concerns associated with cloud-hosted APIs as well as the complexities of model deployment, developers can focus on application development. This accelerates the delivery of production-ready generative AI applications - enabling seamless, automatic scale out with performance optimization in data centers and the cloud.

The recently announced general availability of the Meta Llama 3 8B model as a NIM, which can run locally on RTX systems, brings state-of-the-art language model capabilities to individual developers, enabling local testing and experimentation without the need for cloud resources. With NIM running locally, developers can create sophisticated retrieval-augmented generation (RAG) projects right on their workstations.

Local RAG refers to implementing RAG systems entirely on local hardware, without relying on cloud-based services or external APIs.

Developers can use the Llama 3 8B NIM on workstations with one or more NVIDIA RTX 6000 Ada Generation GPUs or on NVIDIA RTX systems to build end-to-end RAG systems entirely on local hardware. This setup allows developers to tap the full power of Llama 3 8B, ensuring high performance and low latency.

By running the entire RAG pipeline locally, developers can maintain complete control over their data, ensuring privacy and security. This approach is particularly helpful for developers building applications that require real-time responses and high accuracy, such as customer-support chatbots, personalized content-generation tools and interactive virtual assistants.

Hybrid RAG combines local and cloud-based resources to optimize performance and flexibility in AI applications. With NVIDIA AI Workbench, developers can get started with the hybrid-RAG Workbench Project - an example application that can be used to run vector databases and embedding models locally whil
LINK: https://blogs.nvidia.com/blog/ai-decoded-nim/...
See more stories from nvidia

Most recent headlines

05/01/2027

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be demoed at CES 2026

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...

06/09/2026

Dolby and MagentaTV Bring Fans Closer to the FIFA World Cup 2026 in Germany with Dolby Vision and Dolby Atmos

June 9 2026, 23:00 (PDT) Dolby and MagentaTV Bring Fans Closer to the FIFA Worl...

04/08/2026

Dalet Announces Commercial Availability of Dalia, Bringing Media-Aware Agentic AI to Enterprise Productions

Dalet, a leading technology and service provider for media-rich organizations, t...

04/07/2026

Detective Conan: Fallen Angel of the Highway Opens in Dolby Cinemas Across Japan, Presented in Dolby Atmos and Dolby ...

April 7 2026, 19:00 (PDT) Detective Conan: Fallen Angel of the Highway Opens in...

17/06/2026

Spectrum Awards $1.1 Million in Digital Education Grants

Share Copy link Facebook X Linkedin Bluesky Email...

17/06/2026

XR Sports Alliance Adds New Members

Share Copy link Facebook X Linkedin Bluesky Email...

17/06/2026

AIMS Launches Free Online IPMX Training Series

Share Copy link Facebook X Linkedin Bluesky Email...

17/06/2026

Kiloview Partners with SFM to Expand AV-over-IP Solutions...

Montr al, Quebec, June 11, 2026 Kiloview, a leading provider of AV-over-IP and NDI -based video transmission solutions, today announced a distribution partner...

17/06/2026

Kiloview Launches U4 IP Video Dock Bringing Professional...

Changsha, China, June 15, 2026 Kiloview officially announced the launch of U4 IP Video Dock, a compact IP video decoder and output dock designed to bring prof...

17/06/2026

June 16, 2026

Calibr-Skaggs awarded $5.1M by NIH to develop long-acting hepatitis B virus therapy A new program aims to replace a daily HBV drug with once-monthly or even qua...

16/06/2026

Thomson launches new learning App

Thomson's highly regarded expert-led online learning courses are now easier to access on the go via our new App. Available now on Google Play Store, the J...

16/06/2026

Neumann MT 48 Receives Major Firmware 2.0 Update

Neumann.Berlin has released firmware version 2.0 for the MT 48 audio interface, adding plugin compatibility, expanded Dante networking options, broadcast encode...

16/06/2026

TVNewsCheck Opens Nominations for 2027 Women in Technology Awards

TVNewsCheck has announced that nominations are now open for its 2027 Women in Technology Awards, to be presented at NAB Show 2027 on Tuesday, April 6 in the Med...

16/06/2026

Clear-Com Introduces Avalon IP Intercom Platform

Clear-Com has announced Avalon, a 1RU IP intercom platform for broadcast, live events, and production environments. Designed for IP-only workflows, Avalon suppo...

16/06/2026

SNS EVO Enables Remote and Distributed Video Editing Workflows

SNS has published a guide to remote video editing workflows using its EVO shared storage platform and companion tools, covering use cases ranging from home edit...

16/06/2026

Richmond Flying Squirrels Deploy Grass Valley LDX 110 Cameras at CarMax Park

Grass Valley has announced that the Richmond Flying Squirrels, a Minor League Baseball affiliate of the San Francisco Giants, have deployed five Grass Valley LD...

16/06/2026

AIMS Launches Free Official IPMX Training Series Online

The Alliance for IP Media Solutions (AIMS) has announced the launch of the Official IPMX Training Series, a free online program covering the design, configurati...

16/06/2026

Swerve Womens Sports Announces Distribution Deals with Fubo, Plex, Amazon Fire TV, and Anoki AI

Swerve TV has announced distribution agreements with Fubo, Plex, Amazon Fire TV,...

16/06/2026

ATP and TikTok Expand Global Content Partnership

ATP and TikTok have announced an expansion of their global content partnership, extending the ATP's TikTok hub powered by TikTok GamePlan to cover all nine ...

16/06/2026

FOX Sports Turns Los Angeles Pico Lot Into Its FIFA World Cup Production Nerve Center

Network's LA facility serves as the heart of a sprawling operation built to ...

16/06/2026

300+ Records a Day, 150 TB Daily, and a Relentless Content Avalanche: Inside FOX Sports' World Cup Media Engine

At Pico, the network's media-management team is supporting a flood of HBS fe...

16/06/2026

NHL Games Leaving CBC in Canada as Sublicense With Rogers Sportsnet Ends

The NHL will no longer air on CBC after the pulic broadcasters and national rights-holder Rogers Sportsnet were unable to come to agreement. After a successfu...

16/06/2026

SVG New Sponsor Spotlight: Virtual Eye's Ben Taylor on Making Live Sports More Valuable and Entertaining Through Data-Driven Graphics

As live sports broadcasters continue to seek new ways to make complex action mor...

16/06/2026

Thats BRISK, Baby! FOX Sports' Broadcast Remote IP Studio Kits Bring World Cup Fan Energy Back to Pico

Built with the 2026 FIFA World Cup in mind, these small but mighty IP-based tran...

16/06/2026

Rumble three-band soft synth by UVI

Boasts individual synths for each band UVI's latest synth takes an interesting approach to synthesis, offering a trio of synth engines that each operate...

16/06/2026

PSP Levelizer: auto level adjustment plug-in from PSPaudioware

New intelligent auto-fader plug-in unveiled PSPaudioware's latest release offers automatic level adjustment and provides more detailed control than many...

16/06/2026

The Crow Hill Company launch Crystal Pads

New performance-focused library announced Crystal Pads is the latest addition to The Crow Hill Company's ever-growing product range, and according to th...

16/06/2026

GForce launch official Prophet-5 soft synth

Developed in partnership with Sequential In recent years, GForce Software have branched into official emulations of classic hardware synths, delivering a ha...

16/06/2026

DT 30 IE: New in-ears from beyerdynamic

Designed specifically for live performance monitoring beyerdynamic's latest announcement sees the company introduce an affordable in-ear monitoring syst...

16/06/2026

Cherry Audio recreate the Ensoniq ESQ-1

Official emulation celebrates iconic synth's 40th anniversary Cherry Audio have just introduced Ensoniq ESQ-1, an official recreation of the 1986 polyph...

16/06/2026

Australians place growing trust in SBS News

Australians place growing trust in SBS News 16 June, 2026 Media releases SBS has been recognised as one of Australia's most trusted news providers, ran...

16/06/2026

Rohde & Schwarz achieves highest number of GCF validated 3GPP NR NTN test cases for RF, RRM and PCT domains

Rohde & Schwarz achieves highest number of GCF validated 3GPP NR NTN test cases ...

16/06/2026

Hitachi and PESA Announce Strategic Partnership to Drive Growth in Poland's Rail Market

Bydgoszcz to Become a Local Centre of Excellence for Advanced Rail Technologies....

16/06/2026

Chyron Unveils Chyron Weather 2.4

Share Copy link Facebook X Linkedin Bluesky Email...

16/06/2026

Historic Zhuque-3 Reusable Rocket Test Mission Captured with URSA Cine Immersive

Historic Zhuque-3 Reusable Rocket Test Mission Captured with URSA Cine Immersive Brie Clayton June 16, 2026 0 Comments Apple Immersive Video puts view...

16/06/2026

SMPTE Plans ST 2110 Education Summer Programs

Share Copy link Facebook X Linkedin Bluesky Email...

16/06/2026

Rise Awards Returns for 2026 to Celebrate Excellence in B...

Rise WIB, the award-winning advocacy group championing gender diversity and career progression across the broadcast and media technology industry, today announc...

16/06/2026

Limecraft Expands its Media Production Platform with Team...

Limecraft today announced the availability of Limecraft 2026.4, the fourth of eight planned platform releases this year. The update introduces Team-Based Access...

16/06/2026

Perry Sook: Big Tech Poses 'Very Urgent Threat to Broadcast Stations

Share Copy link Facebook X Linkedin Bluesky Email...

16/06/2026

FIFA World Cup Delivers Record Ratings on Fox

Share Copy link Facebook X Linkedin Bluesky Email...

16/06/2026

AIMS Launches the Official IPMX Training Series Online

Free Program Supports IPMX Education from Foundational Concepts Through System and Network Design The Alliance for IP Media Solutions (AIMS) today announced t...

16/06/2026

Share your views on Screen Australia and the future of the industry

Share your views on Screen Australia and the future of the industry 15 June 2026 Your feedback matters. Following the instrumental insights provided in 2025,...

16/06/2026

HPE AI Factory With NVIDIA Expands for the Era of Agents

Enterprises are moving agentic AI from proof of concept to production - and the next generation of AI factories are built for the era of agents. At HPE Discove...

16/06/2026

Coherent Breaks Ground on Expanded Texas Facility, Scaling AI's Optical Backbone

AI runs at the speed of light. More and more, that light is made in Texas. Cohe...

16/06/2026

Techtel Supports T-Motion RCCP-2A Controller Upgrade for Major Australian Broadcaster

Techtel Supports T-Motion RCCP-2A Controller Upgrade for Major Australian Broadc...

16/06/2026

Record audiences tune in for opening weekend of ICC Womens T20 World Cup 2026 on Sky Sports

Tuesday 16 June 2026 Record audiences tune in for opening weekend of ICC Women&...

16/06/2026

Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

Every breakthrough AI model starts the same way: with a training run. The infrastructure running those training jobs shapes everything: how fast teams can itera...

15/06/2026

University of South Carolina's Valerie Gerfin on Gamecock Productions' Growth, Upgrades at Williams-Brice Stadium

One of the more exciting internal video production divisions within a college at...

15/06/2026

Fox Corp. To Acquire Roku, Pairs Live Sports Powerhouse With Major CTV Platform

The deal valued at $22 Billion is expected to close in the first half of 2027...