
DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, AI models like DeepSeek-R1 perform reasoning through the chain-of-thought method to generate the best answer.
Performing this sequence of inference passes - using reason to arrive at the best answer - is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.
As models are allowed to iteratively think through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.
R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.
To help developers securely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice preview on build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.
Developers can test and experiment with the application programming interface (API), which is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.
The DeepSeek-R1 NIM microservice simplifies deployments with support for industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure. Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.
DeepSeek-R1 - a Perfect Example of Test-Time Scaling DeepSeek-R1 is a large mixture-of-experts (MoE) model. It incorporates an impressive 671 billion parameters - 10x more than many other popular open-source LLMs - supporting a large input context length of 128,000 tokens. The model also uses an extreme number of experts per layer. Each layer of R1 has 256 experts, with each token routed to eight separate experts in parallel for evaluation.
Delivering real-time answers for R1 requires many GPUs with high compute performance, connected with high-bandwidth and low-latency communication to route prompt tokens to all the experts for inference. Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is made possible by using the NVIDIA Hopper architecture's FP8 Transformer Engine at every layer - and the 900 GB/s of NVLink bandwidth for MoE expert communication.
Getting every floating point operation per second (FLOPS) of performance out of a GPU is critical for real-time inference. The next-generation NVIDIA Blackwell architecture will give test-time scaling on reasoning models like DeepSeek-R1 a giant boost with fifth-generation Tensor Cores that can deliver up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain specifically optimized for inference.
Get Started Now With the DeepSeek-R1 NIM Microservice Developers can experience the DeepSeek-R1 NIM microservice, now available on build.nvidia.com. Watch how it works:
With NVIDIA NIM, enterprises can deploy DeepSeek-R1 with ease and ensure they get the high efficiency needed for agentic AI systems.
See notice regarding software product information.
More from Nvidia
20/10/2025
NVIDIA and Google Cloud are expanding access to accelerated computing to transform the full spectrum of enterprise workloads, from visual computing to agentic a...
17/10/2025
NVIDIA's on the ground at Open Source AI Week. Stay tuned for a celebration ...
17/10/2025
AI has ignited a new industrial revolution.
NVIDIA and TSMC are working togethe...
16/10/2025
GeForce NOW is more than just a platform to stream fresh games every week - it offers celebrations for the gamers who make it epic, with member rewards to sweet...
14/10/2025
AI is transforming the way enterprises build, deploy and scale intelligent applications. As demand surges for enterprise-grade AI applications that offer speed,...
14/10/2025
At Oracle AI World, NVIDIA and Oracle announced they are deepening their collabo...
13/10/2025
The future of AI took flight at Starbase, Texas - where NVIDIA CEO Jensen Huang ...
13/10/2025
At the OCP Global Summit, NVIDIA is offering a glimpse into the future of gigawa...
09/10/2025
NVIDIA Blackwell swept the new SemiAnalysis InferenceMAX v1 benchmarks, deliveri...
09/10/2025
Microsoft Azure today announced the new NDv6 GB300 VM series, delivering the ind...
09/10/2025
Lock, load and stream - the battle is just beginning. EA's highly anticipated Battlefield 6 is set to storm the cloud when it launches tomorrow with GeForce...
08/10/2025
Telecommunication networks are critical infrastructure for every nation, underpi...
02/10/2025
Editor's note: This blog has been updated to include an additional game for October, The Outer Worlds 2.
October is creeping in with plenty of gaming treat...
01/10/2025
Many users want to run large language models (LLMs) locally for more privacy and control, and without subscriptions, but until recently, this meant a trade-off ...
30/09/2025
Quantum computing promises to reshape industries - but progress hinges on solvin...
30/09/2025
Editor's note: This blog is a part of Into the Omniverse, a series focused o...
25/09/2025
Suit up and head for the cloud. Mecha BREAK, the popular third-person shooter, is now available to stream on GeForce NOW with NVIDIA DLSS 4 technology.
Catch i...
24/09/2025
Canada's role as a leader in artificial intelligence was on full display at ...
24/09/2025
Open technologies - made available to developers and businesses to adopt, modify...
23/09/2025
Energy efficiency in large language model inference has improved 100,000x in the...
22/09/2025
OpenAI and NVIDIA just announced a landmark AI infrastructure partnership - an initiative that will scale OpenAI's compute with multi-gigawatt data centers ...
19/09/2025
AI is no longer solely a back-office tool. It's a strategic partner that can...
18/09/2025
The U.K. was the center of the AI world this week as NVIDIA, U.K. and U.S. leade...
18/09/2025
GeForce NOW is packing a monstrous punch this week. Dying Light: The Beast, the latest adrenaline fueled chapter in Techland's parkour meets survival horror...
17/09/2025
Today's creators are equal parts entertainer, producer and gamer, juggling game commentary, scene changes, replay clips, chat moderation and technical troub...
16/09/2025
The U.K. is driving investments in sovereign AI, using the technology to advance...
13/09/2025
Celtic languages - including Cornish, Irish, Scottish Gaelic and Welsh - are the U.K.'s oldest living languages. To empower their speakers, the UK-LLM sover...
10/09/2025
GeForce NOW Blackwell RTX 5080-class SuperPODs are now rolling out, unlocking a new level of ultra high-performance, cinematic cloud gaming.
GeForce NOW Ultima...
09/09/2025
Inference has emerged as the new frontier of complexity in AI. Modern models are...
09/09/2025
As large language models (LLMs) grow larger, they get smarter, with open models from leading developers now featuring hundreds of billions of parameters. At the...
09/09/2025
At this week's AI Infrastructure Summit in Silicon Valley, NVIDIA's VP o...
09/09/2025
Inference performance is critical, as it directly influences the economics of an AI factory. The higher the throughput of AI factory infrastructure, the more to...
09/09/2025
At this week's IAA Mobility conference in Munich, NVIDIA Vice President of A...
09/09/2025
ComfyUI - an open-source, node-based graphical interface for running and buildin...
04/09/2025
NVIDIA today announced new AI education support for K-12 programs at a White House event to celebrate public-private partnerships that advance artificial intell...
04/09/2025
Editor's note: This post is part of the AI On blog series, which explores the latest techniques and real-world applications of agentic AI, chatbots and copi...
04/09/2025
NVIDIA Blackwell RTX is coming to the cloud on Wednesday, Sept. 10 - an upgrade ...
03/09/2025
3D artists are constantly prototyping.
In traditional workflows, modelers must build placeholder, low-fidelity assets to populate 3D scenes, tinkering and adju...
02/09/2025
For more than a century, meteorologists have chased storms with chalkboards, equ...
28/08/2025
Brace yourself, COGs - the Locusts aren't the only thing rising up. The Coal...
28/08/2025
Last week at Gamescom, NVIDIA announced the winners of the NVIDIA and ModDB RTX ...
27/08/2025
AI models are advancing at a rapid rate and scale.
But what might they lack that (most) humans don't? Common sense: an understanding, developed through rea...
25/08/2025
Robots around the world are about to get a lot smarter as physical AI developers...
25/08/2025
As autonomous vehicle systems rapidly grow in complexity, equipped with reasonin...
22/08/2025
As the latest member of the NVIDIA Blackwell architecture family, the NVIDIA Blackwell Ultra GPU builds on core innovations to accelerate training and AI reason...
22/08/2025
AI reasoning, inference and networking will be top of mind for attendees of next...
21/08/2025
Japan is once again building a landmark high-performance computing system - not ...
21/08/2025
From AI assistants doing deep research to autonomous vehicles making split-second navigation decisions, AI adoption is exploding across industries.
Behind ever...
21/08/2025
Across the globe, AI factories are rising - massive new data centers built not to serve up web pages or email, but to train and deploy intelligence itself. Inte...
21/08/2025
Get a glimpse into the future of gaming.
The NVIDIA Blackwell RTX architecture is coming to GeForce NOW in September, marking the service's biggest upgrade...