Sony Pixel Power calrec Sony

Small and Mighty: NVIDIA Accelerates Microsoft's Open Phi-3 Mini Language Models

23/04/2024

NVIDIA announced today its acceleration of Microsoft's new Phi-3 Mini open language model with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference when running on NVIDIA GPUs from PC to cloud.

Phi-3 Mini packs the capability of 10x larger models and is licensed for both research and broad commercial usage, advancing Phi-2 from its research-only roots. Workstations with NVIDIA RTX GPUs or PCs with GeForce RTX GPUs have the performance to run the model locally using Windows DirectML or TensorRT-LLM.

The model has 3.8 billion parameters and was trained on 3.3 trillion tokens in only seven days on 512 NVIDIA H100 Tensor Core GPUs.

Phi-3 Mini has two variants, with one supporting 4k tokens and the other supporting 128K tokens, which is the first model in its class for very long contexts. This allows developers to use 128,000 tokens - the atomic parts of language that the model processes - when asking the model a question, which results in more relevant responses from the model.

Developers can try Phi-3 Mini with the 128K context window at ai.nvidia.com, where it is packaged as an NVIDIA NIM, a microservice with a standard application programming interface that can be deployed anywhere.

Creating Efficiency for the Edge Developers working on autonomous robotics and embedded devices can learn to create and deploy generative AI through community-driven tutorials, like on Jetson AI Lab, and deploy Phi-3 on NVIDIA Jetson.

With only 3.8 billion parameters, the Phi-3 Mini model is compact enough to run efficiently on edge devices. Parameters are like knobs, in memory, that have been precisely tuned during the model training process so that the model can respond with high accuracy to input prompts.

Phi-3 can assist in cost- and resource-constrained use cases, especially for simpler tasks. The model can outperform some larger models on key language benchmarks while delivering results within latency requirements.

TensorRT-LLM will support Phi-3 Mini's long context window and uses many optimizations and kernels such as LongRoPE, FP8 and inflight batching, which improve inference throughput and latency. The TensorRT-LLM implementations will soon be available in the examples folder on GitHub. There, developers can convert to the TensorRT-LLM checkpoint format, which is optimized for inference and can be easily deployed with NVIDIA Triton Inference Server.

Developing Open Systems NVIDIA is an active contributor to the open-source ecosystem and has released over 500 projects under open-source licenses.

Contributing to many external projects such as JAX, Kubernetes, OpenUSD, PyTorch and the Linux kernel, NVIDIA supports a wide variety of open-source foundations and standards bodies as well.

Today's news expands on long-standing NVIDIA collaborations with Microsoft, which have paved the way for innovations including accelerating DirectML, Azure cloud, generative AI research, and healthcare and life sciences.

Learn more about our recent collaboration.
LINK: https://blogs.nvidia.com/blog/microsoft-open-phi-3-mini-language-model...
See more stories from nvidia

More from Nvidia

14/06/2024

The Proudest Refugee': How Veronica Miller Charts Her Own Path at NVIDIA

When she was five years old, Veronica Miller (n e Teklai) and her family left their homeland of Eritrea, in the Horn of Africa, to escape an ongoing war with Et...

14/06/2024

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

NVIDIA today announced Nemotron-4 340B, a family of open models that developers ...

13/06/2024

Cloud Ahoy! Treasure Awaits With Sea of Thieves' on GeForce NOW

Set sail for adventure, pirates. Sea of Thieves makes waves in the cloud this week. It's an adventure-filled GFN Thursday with four new games joining the Ge...

12/06/2024

Every Company's Data is Their Gold Mine,' NVIDIA CEO Says at Databricks Data + AI Summit

Accelerated computing is transforming data processing and analytics for enterpri...

12/06/2024

Scaling to New Heights: NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated...

12/06/2024

Nerding About NeRFs: How Neural Radiance Fields Transform 2D Images Into Hyperrealistic 3D Models

Let's talk about NeRFs - no, not the neon-colored foam dart blasters, but ne...

12/06/2024

TOPS of the Class: Decoding AI Performance on RTX AI PCs and Workstations

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, softwa...

07/06/2024

Why Accelerated Data Processing Is Crucial for AI Innovation in Every Industry

Across industries, AI is supercharging innovation with machine-powered computation. In finance, bankers are using AI to detect fraud more quickly and keep accou...

06/06/2024

Here Comes a New Challenger: Street Fighter 6' Joins GeForce NOW

Capcom's latest entry in the iconic Street Fighter series, Street Fighter 6, punches its way into the cloud this GFN Thursday. The game, along with Ubisoft&...

05/06/2024

Yotta CEO Sunil Gupta on Supercharging India's Fast-Growing AI Market

India's AI market is expected to be massive. Yotta Data Services is setting its sights on supercharging it. In this episode of NVIDIA's AI Podcast, Suni...

05/06/2024

Creativity Accelerated: New RTX-Powered AI Hardware and Software Announced at COMPUTEX

NVIDIA launched NVIDIA Studio at COMPUTEX in 2019. Five years and more than 500 ...

04/06/2024

SAP and NVIDIA Create AI for The Most Valuable Language,' CEOs Unveil at Sapphire Orlando

German enterprise cloud leader SAP is harnessing generative AI and industrial di...

04/06/2024

NVIDIA and Cisco Weave Fabric for Generative AI

Building and deploying AI applications at scale requires a new class of computing infrastructure - one that can handle the massive amounts of data, compute powe...

03/06/2024

Digital Bank Debunks Financial Fraud With Generative AI

European neobank bunq is debunking financial fraudsters with the help of NVIDIA accelerated computing and AI. Dubbed the bank of the free, bunq offers online...

02/06/2024

Foxconn Trains Robots, Streamlines Assembly With NVIDIA AI and Omniverse

Foxconn operates more than 170 factories around the world - the latest one a virtual plant pushing the state of the art in industrial automation. It's the ...

02/06/2024

Taiwan Electronics Giants Drive Industrial Automation With NVIDIA Metropolis and NIM

Taiwan's leading consumer electronics giants are making advances with AI aut...

02/06/2024

KServe Providers Dish Up NIMble Inference in Clouds and Data Centers

Deploying generative AI in the enterprise is about to get easier than ever. NVIDIA NIM, a set of generative AI inference microservices, works with KServe, open...

02/06/2024

Accelerate Everything,' NVIDIA CEO Says Ahead of COMPUTEX

Generative AI is reshaping industries and opening new opportunities for innovation and growth, NVIDIA founder and CEO Jensen Huang said in an address ahead of ...

02/06/2024

Power Tool: Generative AI Tracks Typhoons, Tames Energy Use

Weather forecasters in Taiwan had their hair blown back when they saw a typhoon up close, created on a computer that slashed the time and energy needed for the ...

31/05/2024

NVIDIA Grace Hopper Superchip Accelerates Murex MX.3 Analytics Performance, Reduces Power Consumption

After the 2008 financial crisis and increased risk-management regulations that f...

30/05/2024

Elevate Your Expertise: NVIDIA Introduces AI Infrastructure and Operations Training and Certification

NVIDIA has introduced a self-paced course, called AI Infrastructure and Operatio...

30/05/2024

GeForce NOW Brings the Heat With World of Warcraft'

World of Warcraft comes to the cloud this week, part of the 17 games joining the GeForce NOW library, with seven available to stream this week. Plus, it's ...

29/05/2024

Riding the Wayve of AV 2.0, Driven by Generative AI

Generative AI is propelling AV 2.0, a new era in autonomous vehicle technology characterized by large, unified, end-to-end AI models capable of managing various...

29/05/2024

Tidy Tech: How Two Stanford Students Are Building Robots for Handling Household Chores

Imagine having a robot that could help you clean up after a party - or fold heap...

29/05/2024

Decoding How NVIDIA RTX AI PCs and Workstations Tap the Cloud to Supercharge Generative AI

Editor's note: This post is part of the AI Decoded series, which demystifies...

27/05/2024

NVIDIA Scoops Up Wins at COMPUTEX Best Choice Awards

Building on more than a dozen years of stacking wins at the COMPUTEX trade show's annual Best Choice Awards, NVIDIA was today honored with BCAs for its late...

23/05/2024

Senua's Story Continues: GeForce NOW Brings Senua's Saga: Hellblade II' to the Cloud

Every week, GFN Thursday brings new games to the cloud, featuring some of the la...

23/05/2024

Into the Omniverse: SoftServe and Continental Drive Digitalization With OpenUSD and Generative AI

Editor's note: This post is part of Into the Omniverse, a series focused on ...

21/05/2024

Watt a Win: NVIDIA Sweeps New Ranking of World's Most Energy-Efficient Supercomputers

In the latest ranking of the world's most energy-efficient supercomputers, k...

21/05/2024

New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers

NVIDIA today announced at Microsoft Build new AI performance optimizations and i...

21/05/2024

NVIDIA Expands Collaboration With Microsoft to Help Developers Build, Deploy AI Applications Faster

If optimized AI workflows are like a perfectly tuned orchestra - where each comp...

21/05/2024

A Superbloom of Updates in the May Studio Driver Gives Fresh Life to Content Creation

Editor's note: This post is part of our In the NVIDIA Studio series, which c...

20/05/2024

Every Company to Be an Intelligence Manufacturer,' Declares NVIDIA CEO Jensen Huang at Dell Technologies World

AI heralds a new era of innovation for every business in every industry, NVIDIA ...

16/05/2024

Fight for Honor in Men of War II' on GFN Thursday

Whether looking for new adventures, epic storylines or games to play with a friend, GeForce NOW members are covered. Start off with the much-anticipated sequel...

15/05/2024

NVIDIA, Teradyne and Siemens Gather in the City of Robotics' to Discuss Autonomous Machines and AI

Senior executives from NVIDIA, Siemens and Teradyne Robotics gathered this week ...

15/05/2024

Fire It Up: Mozilla Firefox Adds Support for AI-Powered NVIDIA RTX Video

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, ...

15/05/2024

How Basecamp Research Helps Catalog Earth's Biodiversity

Basecamp Research is on a mission to capture the vastness of life on Earth at an unprecedented scale. Phil Lorenz, CTO at Basecamp Research, discusses using AI ...

15/05/2024

Needle-Moving AI Research Trains Surgical Robots in Simulation

A collaboration between NVIDIA and academic researchers is prepping robots for surgery. ORBIT-Surgical - developed by researchers from the University of Toront...

14/05/2024

Gemma, Meet NIM: NVIDIA Teams Up With Google DeepMind to Drive Large Language Model Innovation

Large language models that power generative AI are seeing intense innovation - m...

13/05/2024

Drug Discovery, STAT! NVIDIA, Recursion Speed Pharma R&D With AI Supercomputer

Described as the largest system in the pharmaceutical industry, BioHive-2 at the Salt Lake City headquarters of Recursion debuts today at No. 35, up more than 1...

13/05/2024

Drug Discovery, STAT! NVIDIA, Recursion Speed Pharma R&D With AI Supercomputer

Described as the largest system in the pharmaceutical industry, BioHive-2 at the...

12/05/2024

Dial It In: Data Centers Need New Metric for Energy Efficiency

Data centers need an upgraded dashboard to guide their journey to greater energy efficiency, one that shows progress running real-world applications. The formu...

12/05/2024

Generating Science: NVIDIA AI Accelerates HPC Research

Generative AI is taking root at national and corporate labs, accelerating high-performance computing for business and science. Researchers at Sandia National L...

12/05/2024

NVIDIA Blackwell Platform Pushes the Boundaries of Scientific Computing

Quantum computing. Drug discovery. Fusion energy. Scientific computing and physics-based simulations are poised to make giant steps across domains that benefit ...

09/05/2024

Through the Wormhole: Media.Monks' Vision for Enhancing Media and Marketing With AI

Meet Media.Monks' Wormhole, an alien-like, conversational robot with a quirk...

09/05/2024

Honkai: Star Rail' Blasts Off on GeForce NOW

Gear up, Trailblazers - Honkai: Star Rail lands on GeForce NOW this week, along with an in-game reward for members to celebrate the title's launch in the cl...

08/05/2024

Get On the Train' NVIDIA CEO Says at ServiceNow's Knowledge 2024

Now's the time to hop aboard AI, NVIDIA founder and CEO Jensen Huang declared Wednesday as ServiceNow unveiled a demo of futuristic AI avatars together with...

08/05/2024

‘Get On the Train,’ NVIDIA CEO Says at ServiceNow's Knowledge 2024

Now's the time to hop aboard AI, NVIDIA founder and CEO Jensen Huang declare...

08/05/2024

NVIDIA CEO Jensen Huang to Deliver Keynote Ahead of COMPUTEX 2024

Amid an AI revolution sweeping through trillion-dollar industries worldwide, NVIDIA founder and CEO Jensen Huang will deliver a keynote address ahead of COMPUTE...

08/05/2024

AI Decoded: New DaVinci Resolve Tools Bring RTX-Accelerated Renaissance to Editors

AI tools accelerated by NVIDIA RTX have made it easier than ever to edit and wor...