Sony Pixel Power calrec Sony

Mission NIMpossible: Decoding the Microservices That Accelerate Generative AI

10/07/2024

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, software, tools and accelerations for NVIDIA RTX PC and workstation users.

In the rapidly evolving world of artificial intelligence, generative AI is captivating imaginations and transforming industries. Behind the scenes, an unsung hero is making it all possible: microservices architecture.

The Building Blocks of Modern AI Applications Microservices have emerged as a powerful architecture, fundamentally changing how people design, build and deploy software.

A microservices architecture breaks down an application into a collection of loosely coupled, independently deployable services. Each service is responsible for a specific capability and communicates with other services through well-defined application programming interfaces, or APIs. This modular approach stands in stark contrast to traditional all-in-one architectures, in which all functionality is bundled into a single, tightly integrated application.

By decoupling services, teams can work on different components simultaneously, accelerating development processes and allowing updates to be rolled out independently without affecting the entire application. Developers can focus on building and improving specific services, leading to better code quality and faster problem resolution. Such specialization allows developers to become experts in their particular domain.

Services can be scaled independently based on demand, optimizing resource utilization and improving overall system performance. In addition, different services can use different technologies, allowing developers to choose the best tools for each specific task.

A Perfect Match: Microservices and Generative AI The microservices architecture is particularly well-suited for developing generative AI applications due to its scalability, enhanced modularity and flexibility.

AI models, especially large language models, require significant computational resources. Microservices allow for efficient scaling of these resource-intensive components without affecting the entire system.

Generative AI applications often involve multiple steps, such as data preprocessing, model inference and post-processing. Microservices enable each step to be developed, optimized and scaled independently. Plus, as AI models and techniques evolve rapidly, a microservices architecture allows for easier integration of new models as well as the replacement of existing ones without disrupting the entire application.

NVIDIA NIM: Simplifying Generative AI Deployment As the demand for AI-powered applications grows, developers face challenges in efficiently deploying and managing AI models.

NVIDIA NIM inference microservices provide models as optimized containers to deploy in the cloud, data centers, workstations, desktops and laptops. Each NIM container includes the pretrained AI models and all the necessary runtime components, making it simple to integrate AI capabilities into applications.

NIM offers a game-changing approach for application developers looking to incorporate AI functionality by providing simplified integration, production-readiness and flexibility. Developers can focus on building their applications without worrying about the complexities of data preparation, model training or customization, as NIM inference microservices are optimized for performance, come with runtime optimizations and support industry-standard APIs.

AI at Your Fingertips: NVIDIA NIM on Workstations and PCs Building enterprise generative AI applications comes with many challenges. While cloud-hosted model APIs can help developers get started, issues related to data privacy, security, model response latency, accuracy, API costs and scaling often hinder the path to production.

Workstations with NIM provide developers with secure access to a broad range of models and performance-optimized inference microservices.

By avoiding the latency, cost and compliance concerns associated with cloud-hosted APIs as well as the complexities of model deployment, developers can focus on application development. This accelerates the delivery of production-ready generative AI applications - enabling seamless, automatic scale out with performance optimization in data centers and the cloud.

The recently announced general availability of the Meta Llama 3 8B model as a NIM, which can run locally on RTX systems, brings state-of-the-art language model capabilities to individual developers, enabling local testing and experimentation without the need for cloud resources. With NIM running locally, developers can create sophisticated retrieval-augmented generation (RAG) projects right on their workstations.

Local RAG refers to implementing RAG systems entirely on local hardware, without relying on cloud-based services or external APIs.

Developers can use the Llama 3 8B NIM on workstations with one or more NVIDIA RTX 6000 Ada Generation GPUs or on NVIDIA RTX systems to build end-to-end RAG systems entirely on local hardware. This setup allows developers to tap the full power of Llama 3 8B, ensuring high performance and low latency.

By running the entire RAG pipeline locally, developers can maintain complete control over their data, ensuring privacy and security. This approach is particularly helpful for developers building applications that require real-time responses and high accuracy, such as customer-support chatbots, personalized content-generation tools and interactive virtual assistants.

Hybrid RAG combines local and cloud-based resources to optimize performance and flexibility in AI applications. With NVIDIA AI Workbench, developers can get started with the hybrid-RAG Workbench Project - an example application that can be used to run vector databases and embedding models locally whil
LINK: https://blogs.nvidia.com/blog/ai-decoded-nim/...
See more stories from nvidia

Most recent headlines

05/01/2027

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be demoed at CES 2026

Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...

01/06/2026

Dolby Sets the New Standard for Premium Entertainment at CES 2026

January 6 2026, 05:30 (PST) Dolby Sets the New Standard for Premium Entertainment at CES 2026 Throughout the week, Dolby brings to life the latest innovatio...

02/05/2026

Dalet Flex LTS Delivers Smarter Search, Faster Editing, and an AI-Ready Foundation for Modern Media

Dalet, a leading technology and service provider for media-rich organizations, t...

01/05/2026

NBCUniversal's Peacock to Be First Streamer to Integrate Dolby's Full Suite of Premium Picture and Sound Innovations

January 5 2026, 18:30 (PST) NBCUniversal's Peacock to Be First Streamer to ...

01/04/2026

DOLBY AND DOUYIN EMPOWER THE NEXT GENERATON OF CREATORS WITH DOLBY VISION

January 4 2026, 18:00 (PST) DOLBY AND DOUYIN EMPOWER THE NEXT GENERATON OF CREATORS WITH DOLBY VISION Douyin Users Can Now Create And Share Videos With Stun...

26/03/2026

Phantom C-Series High-Speed Cameras Set a New Standard for Automotive Crash and Safety Imaging

Wayne, N.J., March 26, 2026 Phantom High-Speed announces the latest product li...

25/03/2026

In The Hot Seat: The Art of Directing a Premier League Match

Live match directors Sarah Cheadle (Sky Sports), Rob Levi (TNT Sports), and Andrew Swift (BBC Sport) sit down with the Premier League's Rachel Nightingale t...

25/03/2026

SVG Students To Watch: Kyle Maier, St. Bonaventure University

The senior from Upstate New York is manning the mic while also interning for the athletic department's sports-information team In the live-sports-video ind...

25/03/2026

NAB 2026: Synamedia Launches Edge Watermarking Solution, Marks 10 Years of ContentArmor

Synamedia has announced ContentArmor Edge Watermarking, a server-side solution t...

25/03/2026

SES Taps K2 Space to Build meoSphere MEO Satellite Network

SES has announced meoSphere, a medium Earth orbit (MEO) satellite network targeted for operation by 2030. The first phase will pair SES-developed software-defin...

25/03/2026

Reuters and TVU Networks Begin Satellite-to-IP Migration for Live News Distribution

TVU Networks is working with Reuters on a phased migration from satellite to a c...

25/03/2026

Nielsen Names Three Senior Hires in Sports, Advertising, and Publishing Roles

Nielsen has announced three senior appointments. Seth Ladetsky has been named Head of Global Sports. Trevor Fellows will lead Nielsen's advertiser and agenc...

25/03/2026

Anoki and Amagi Bring Scene-Level Intelligence to In-Content CTV Ads

Anoki and Amagi have launched In-Scene Ads powered by Anoki ContextIQ across Amagi's portfolio of in-content ad formats for Free Ad-supported Streaming TV (...

25/03/2026

NAB 2026: Arkona to Unveil BLADE//planner and Platform Updates

Arkona Technologies will announce a series of enhancements to its BLADE//runner platform at NAB 2026 (Booth C.1808). The updates focus on usability and workflow...

25/03/2026

San Diego Padres Partners With Daktronics to Enhance Petco Park

Daktronics has installed two tower displays and a video wall in the Lexus Club at Petco Park in San Diego ahead of the 2026 season. Continuing to improve the ...

25/03/2026

NAB 2026: MultiDyne Marks 50th Anniversary

MultiDyne Video & Fiber Optic Systems is celebrating its 50th anniversary as NAB Show 2026 approaches. The company was founded in 1976 by Vincent Jachetta, an N...

25/03/2026

NAB 2026: IPC to Debut with One Connect Intercom Platform and New One Link Keypanels

IPC, a provider of integrated communication solutions, will make its NAB 2026 de...

25/03/2026

ESPN Tops 2026 Sports Emmy Nominations With 63 Nods

Live production categories were led by NBC, FOX, and ESPN's NFL coverage...

25/03/2026

Atlanta Braves and Spectrum Reach Multiyear Distribution Agreement for BravesVision

The Atlanta Braves and Spectrum have announced a multiyear distribution agreemen...

25/03/2026

The AI Doc Asks the Question No One Wants to Answer

(L-R) Charlie Tyrell and Daniel Roher attend The AI Doc: Or How I Became An Apocaloptimist Premiere during the 2026 Sundance Film Festival at The Ray Theatre ...

25/03/2026

Kelsey Lu and Savanah Leaf Lean Into the Emotional Core of Running To Pain' in Episode Three of Directed By'

Directed By, Spotify's documentary-style series that pulls back the curtain ...

25/03/2026

BTS and Spotify Bring ARIRANG' to Top Fans in New York City

BTS is so back., This week, the global pop superstars took the stage at New York City's Pier 17 for their first U.S. performance in four years. Part of Spo...

25/03/2026

Step Into Sound at Our New Spotify Listening Lounge in London

How you listen can shape what you hear. That's the idea behind the new Spotify Listening Lounge, an acoustic space at our London headquarters purpose-built ...

25/03/2026

Iconic Instruments launch Transport Vintage Tape

Tape effects taken to the extreme The latest release from New York-based developer Iconic Instruments is said to accurately recreate the saturation and comp...

25/03/2026

Sonuscore introduce Fantasy Vocal Phrases

Launched alongside new Vocal Phrases bundle Sonuscore's latest release has been designed specifically for composers working on fantasy TV, film and game...

25/03/2026

Steinberg unveil Nuendo 15

Latest update now live The latest version of Steinberg's post-production-focused DAW has just arrived, and comes packed with new dialogue editing, sound...

25/03/2026

Rohde & Schwarz joins FormFactor's MeasureOne partner program

Rohde & Schwarz joins FormFactor's MeasureOne partner program FormFactor and Rohde & Schwarz advance their partnership for on-wafer RF component character...

25/03/2026

L3Harris, RFTEQ Sign Agreement to Advance Sovereign Electronic Warfare Capability in Australia

L3Harris Technologies and RFTEQ Pty Ltd signed a memorandum of understanding to ...

25/03/2026

L3Harris to Provide Autonomous Underwater Capability for US Navy Submarines

L3Harris delivers combat-ready Torpedo Tube Launch and Recovery system, which deploys and retrieves Iver4 900 autonomous underwater vehicles through submarine t...

25/03/2026

Nielsen Names New Senior Leaders Supporting Sports, Advertising and Publishing Clients

The company expands leadership team under Chief Revenue Officer Amilcar Perez S...

25/03/2026

Stable TV Viewership in Poland in February as Warner Bros. Discovery Retains Top Spot

Winter Olympic Games Opening Ceremony features in top 10 programmes of the month...

25/03/2026

Mediaproxy to Show Upgrades to LogServer at 2026 NAB Show

Share Copy link Facebook X Linkedin Bluesky Email...

25/03/2026

Hitomi transforms production synchronisation with the lau...

Providing wide view timing visibility across the entire production chain...

25/03/2026

Bitfocus showcases complete control at NAB Show 2026

Continuing development drives advances in security, availability, access and connectivity...

25/03/2026

Caudalie Paris HQ elevates brand experience with INFiLED...

Caudalie, the renowned French cosmetics brand, has unveiled a state-of-the-art 200-seat auditorium at its new headquarters in the historic Marais district of ce...

25/03/2026

Telestream Unlocks Adobe-Centric Media Pipeline and Strea...

Telestream, a global leader in media workflow technologies, today announced expanded integration with Adobe Premiere, Adobe Media Encoder (AME), and Frame.io, d...

25/03/2026

Marshall Electronics Showcases New Feature Rich CV320 and...

Marshall Electronics is expanding its lineup of high-performance POV cameras designed for broadcast, live production and professional AV applications with the d...

25/03/2026

OOONA Achieves TPN Gold Star Shield - the Highest Level o...

OOONA, a global provider of professional management and production tools for the media localization industry, announced today that it has been awarded the TPN G...

25/03/2026

Gray Media to Simulcast 2026 Atlanta Braves Home Opener

Share Copy link Facebook X Linkedin Bluesky Email...

25/03/2026

2026 NAB Show Exhibitor Insight: Appear

Share Copy link Facebook X Linkedin Bluesky Email...

25/03/2026

Deepfakes Vulnerable to AI Fingerprint Hacks, Study Finds

Share Copy link Facebook X Linkedin Bluesky Email...

25/03/2026

SipMX launches at NAB Show 2026 to democratize media orch...

SipRadius, specialists in secure, low-latency media transport, will drive innovation and interoperability still further with the launch of the SipMX Alliance at...

25/03/2026

QuickLink to Showcase StudioPro Video Production Ecosyste...

QuickLink, a leading provider of award-winning video production and remote contribution solutions, will showcase its award-winning StudioPro ecosystem at The 2...

25/03/2026

Alfalite returns to NAB Show alongside FOR-A showcasing L...

Alfalite, the only European manufacturer of LED screens, will once again be present at NAB Show, the leading global event for the broadcast, media and entertain...

25/03/2026

First Time Exhibitor IPC Introduces One Link Keypanels fo...

PC, a leading provider of integrated communication solutions, will make its NAB 2026 Show debut in Booth C 3341. The company will highlight the newly branded On...

25/03/2026

LTN showcases satellite-to-IP migration solutions for cha...

Open video transport network simplifies satellite transition with new partnerships and global reach as spectrum auction nears LTN will demonstrate how its purp...

25/03/2026

Leader to present full suite of advanced Test and Measure...

Test & measurement innovator, Leader Electronics, will present the full range of Leader, PHABRIX and LeaderPhabrix T&M solutions at the NAB Show, which takes pl...

25/03/2026

MNC Software Appoints Heartland Video Systems as North Am...

MNC Software, a global leader in network management and operational support systems tailored to the broadcast and media industry, today announced the appointmen...

25/03/2026

Samsung Launches Shoppable TV Experience with Amazon

Share Copy link Facebook X Linkedin Bluesky Email...

25/03/2026

Gray Media to Simulcast 2026 Atlanta Braves to Home Opener

Share Copy link Facebook X Linkedin Bluesky Email...