
Now We're Talking: NVIDIA Releases Open Dataset, Models for Multilingual Speech AI

15/08/2025

Of around 7,000 languages in the world, a tiny fraction are supported by AI language models. NVIDIA is tackling the problem with a new dataset and models that support the development of high-quality speech recognition and translation AI for 25 European languages - including languages with limited available data like Croatian, Estonian and Maltese.

These tools will enable developers to more easily scale AI applications to support global users with fast, accurate speech technology for production-scale use cases such as multilingual chatbots, customer service voice agents and near-real-time translation services. They include:

Granary, a massive, open-source corpus of multilingual speech datasets that contains around a million hours of audio, including nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation.

NVIDIA Canary-1b-v2, a billion-parameter model trained on Granary for high-quality transcription of European languages, plus translation between English and two dozen supported languages. It tops Hugging Face's leaderboard of open models for multilingual speech recognition accuracy.

NVIDIA Parakeet-tdt-0.6b-v3, a streamlined, 600-million-parameter model designed for real-time or large-volume transcription of Granary's supported languages. It has the highest throughput of multilingual models on the Hugging Face leaderboard, measured as duration of audio transcribed divided by computation time.
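
The throughput measure mentioned above (duration of audio transcribed divided by computation time) is simple to compute. The following is a minimal Python sketch; the function name `throughput_factor` and the sample figures are hypothetical, chosen only to illustrate the ratio, and are not leaderboard measurements.

```python
def throughput_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Return seconds of audio transcribed per second of computation."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Hypothetical figures: one hour of audio processed in 2 seconds of compute.
example = throughput_factor(audio_seconds=3600.0, compute_seconds=2.0)
print(f"{example:.0f}x real time")
```

A higher value means more audio is processed per unit of compute, which is why the metric suits comparisons of large-volume transcription.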

The paper behind Granary will be presented at Interspeech, a language processing conference taking place in the Netherlands, Aug. 17-21. The dataset, as well as the new Canary and Parakeet models, are now available on Hugging Face.

How Granary Addresses Data Scarcity

To develop the Granary dataset, the NVIDIA speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The team passed unlabeled audio through an innovative processing pipeline, powered by the NVIDIA NeMo Speech Data Processor toolkit, that turned it into structured, high-quality data.

This pipeline allowed the researchers to enhance public speech data into a usable format for AI training, without the need for resource-intensive human annotation. It's available in open source on GitHub.

With Granary's clean, ready-to-use data, developers can get a head start building models that tackle transcription and translation tasks in nearly all of the European Union's 24 official languages, plus Russian and Ukrainian.

For European languages underrepresented in human-annotated datasets, Granary provides a critical resource to develop more inclusive speech technologies that better reflect the linguistic diversity of the continent - all while using less training data.

The team demonstrated in their Interspeech paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve a target accuracy level for automatic speech recognition (ASR) and automatic speech translation (AST).

Tapping NVIDIA NeMo to Turbocharge Transcription

The new Canary and Parakeet models offer examples of the kinds of models developers can build with Granary, customized to their target applications. Canary-1b-v2 is optimized for accuracy on complex tasks, while Parakeet-tdt-0.6b-v3 is designed for high-speed, low-latency tasks.

By sharing the methodology behind the Granary dataset and these two models, NVIDIA is enabling the global speech AI developer community to adapt this data processing workflow to other ASR or AST models or additional languages, accelerating speech AI innovation.

Canary-1b-v2, available under a permissive license, expands the Canary family's supported languages from four to 25. It offers transcription and translation quality comparable to models 3x larger while running inference up to 10x faster.

https://blogs.nvidia.com/wp-content/uploads/2025/08/Canary-demo.mp4

NVIDIA NeMo, a modular software suite for managing the AI agent lifecycle, accelerated speech AI model development. NeMo Curator, part of the software suite, enabled the team to filter out synthetic examples from the source data so that only high-quality samples were used for model training. The team also harnessed the NeMo Speech Data Processor toolkit for tasks like aligning transcripts with audio files and converting data into the required formats.

Parakeet-tdt-0.6b-v3 prioritizes high throughput and is capable of transcribing 24-minute audio segments in a single inference pass. The model automatically detects the input audio language and transcribes without additional prompting steps.
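
For recordings longer than that, one option is to split the audio into windows of at most 24 minutes and transcribe each window separately. The Python sketch below shows only the windowing arithmetic. The helper name `segment_bounds` is hypothetical and not part of NeMo, and treating 24 minutes as a hard per-pass upper bound is an assumption based on the figure quoted above.

```python
MAX_SEGMENT_SECONDS = 24 * 60  # assumed per-pass limit, from the 24-minute figure


def segment_bounds(total_seconds: float, max_seconds: float = MAX_SEGMENT_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds > 0")
    total = float(total_seconds)
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + max_seconds, total)  # last window may be shorter
        bounds.append((start, end))
        start = end
    return bounds


# A one-hour recording needs three windows: 24 + 24 + 12 minutes.
for start, end in segment_bounds(3600):
    print(f"{start / 60:.0f} to {end / 60:.0f} min")
```

Cutting at fixed offsets can split a word in two, so a production pipeline would more likely cut at silences; this sketch ignores that for brevity.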

Both Canary and Parakeet models provide accurate punctuation, capitalization and word-level timestamps in their outputs.

Read more on GitHub and get started with Granary on Hugging Face.
LINK: https://blogs.nvidia.com/blog/speech-ai-dataset-models/...

North America Stories

16/08/2025

Sinclair Expands Distribution of Its Multicast Broadcast Networks

BALTIMORE - Sinclair has announced that its free, over-the-air multicast networks Charge, Comet, Roar, and The Nest have concluded a series of national distributi...

16/08/2025

EditShare to Unveil Latest Ultimate EFS Nodes at IBC2025

BOSTON - EditShare will unveil its latest Ultimate EFS Nodes, optimized for high-performance media workflows at any scale, during IBC2025, Sept. 12-15, at the RAI...

16/08/2025

PBS Plans 21% Budget Cuts

WASHINGTON - PBS has informed public stations that it plans to cut its budget by about 21% as part of an effort to deal with the elimination of Federal funding an...

16/08/2025

Gray Media Promotes Bob Kroeger to CTO

ATLANTA - Gray Media has named Bob Kroeger chief technology officer for the company, effective immediately. Bob has served as chief information officer for both G...

15/08/2025

Arlen Borrego Miranda Awarded 2025 Prodigy Scholarship by Latin Grammy Cultural Foundation

Arlen Borrego Miranda Awarded 2025 Prodigy Scholarship by Latin Grammy Cultural ...

15/08/2025

DirecTV Says Costs for ATSC 3.0 Transition Would be 'Onerous'

WASHINGTON - The pay TV and telco industry-backed American Television Alliance told staffers at the Federal Communications Commission's Media Bureau that the ...

15/08/2025

Hisense Launches Hisense Channels Free Streaming Service

PHILADELPHIA and SUWANEE, Ga. - Hisense and Xumo, the streaming joint venture between Comcast and Charter Communications, have announced the launch of Hisense Cha...

15/08/2025

MRMC Unveils Cinebot Nano

SURREY, U.K. - Mark Roberts Motion Control (MRMC) has launched the Cinebot Nano, a motion control robot designed to make professional-grade camera movement more a...

15/08/2025

Save 20% or More on ALL Ivory 3 and Ivory II Upgrades through August 31st!

Upgrade and Save - Now Through August 31st! Enjoy 20% or more off all Ivory 3 and Ivory II Upgrades for a limited time. If you've been considering unlocki...

15/08/2025

Netflix Celebrates Mexican Cinema Day by Announcing the Production of More National Films

Netflix Celebrates Mexican Cinema Day by Announcing the Produc...

14/08/2025

An Emotional East of Wall Premiere Brings Cast and Crowd To Grateful Tears

(L-R) Clay Pateneaude, Tabatha Zimiga, Porshia Zimiga, director Kate Beecroft, Leanna Shumpert, Jesse Thorson, and Jennifer Ehle attend the premiere of East o...

14/08/2025

New Zealand's Sky Adopts Grass Valley's AMPP for Playout Upgrade

MONTREAL - Grass Valley today announced that Sky Network Television, New Zealand's largest pay-TV provider, has chosen Grass Valley's AMPP to overhaul its ...

14/08/2025

Telemundo Launches Spanish-Language Sports FAST Channel

MIAMI - Telemundo today debuts Telemundo Deportes Ahora, a 24/7 Spanish-language sports FAST channel, on Peacock, Xumo Play, the NBC News FAST hub and Telemundo.c...

14/08/2025

DAZN Turns to MediaKind For FIFA Club World Cup 2025 Global Streaming

DENVER, Colo. - Sports entertainment platform DAZN relied on the MediaKind MK.IO elastic, cloud-native streaming platform to support high-quality streaming of the...

14/08/2025

DirecTV Says Costs for ATSC 3.0 Transition Would be Onerous

WASHINGTON - The pay-TV and telco industry-backed American Television Alliance told staffers at the Federal Communications Commission's Media Bureau that the ...

14/08/2025

Kevin Trueblood Elected President of the Society of Broadcast Engineers

Kevin Trueblood, current vice president of the Society of Broadcast Engineers, has been elected president of the national board for the association for broadcast...

14/08/2025

BES builds futureproof recording pipeline for The Star wi...

Performing arts venues have become a destination for memorable entertainment experiences. Technological advancements have helped elevate the in-venue experience...

14/08/2025

IP Showcase on the Water Announces Full Presentation Sche...

The Alliance for IP Media Solutions (AIMS), Advanced Media Workflow Association (AMWA), and the Video Services Forum (VSF) today announced the full IP Showcase ...

14/08/2025

SMPTE Announces IBC2025 Sessions and Highlights

SMPTE, the home of media professionals, technologists, and engineers, today announced its lineup of sessions and show highlights for IBC2025, Sept. 12-15, in A...

14/08/2025

Digital Alert Systems Promotes Adam Jones to Senior Accou...

Digital Alert Systems, a global leader in emergency communications solutions for media providers, today announced the promotion of Adam Jones to the position of...

14/08/2025

Clear-Com to Spotlight Broadcast Solutions and Special Ev...

Clear-Com will showcase a full lineup of intercom innovations for broadcast professionals at IBC2025, taking place September 12-15 in Amsterdam, NL. On display...

14/08/2025

VisualOn and Anyscreen Partner to Deliver Turnkey Efficie...

VisualOn, a leader in advanced video optimization technology, today announced a strategic partnership with Anyscreen, a provider of comprehensive live and VOD s...

14/08/2025

USGA Deal Keeps Golf's U.S. Open With NBCU Through 2032

STAMFORD, Conn., and LIBERTY CORNER, N.J. - NBCUniversal has extended its media rights agreement with the United States Golf Association (USGA) through 2032. Its ...

14/08/2025

IBC2025's 'IP Showcase on the Water' Sets Its Course for Sept. 14

BOTHELL, Wash. - The Alliance for IP Media Solutions (AIMS), Advanced Media Workflow Association (AMWA) and the Video Services Forum (VSF) have released the full ...

14/08/2025

FAA Considers New Rules for How Newsrooms Use Drones

WASHINGTON - As the Federal Aviation Administration considers a notice of proposed rulemaking on regulations governing the use of unmanned aircraft vehicles (UAVs...

14/08/2025

Spectrum Reach, tvbeat Launch Programmatic Solution for Linear TV Ads

LONDON and NEW YORK - Charter Communications' ad sales unit Spectrum Reach and tvbeat have announced that they are working together to bring tvbeat's prog...

14/08/2025

LG Adds Cerence AI to Smart TVs

BURLINGTON, Mass. - LG Electronics is now using a multi-lingual text-to-speech solution from Cerence AI to enable its TVs to deliver information via spoken comman...

14/08/2025

Samsung TV Plus Adds AI-Powered Personalization

The free ad-supported streaming platform, Samsung TV Plus, has unveiled a major upgrade to its user interface and search capabilities, with a new interface, smart...

14/08/2025

Cineverse, Banyan Ventures to Launch MicroCo

LOS ANGELES - Hoping to tap into the popularity of short-form content, Cineverse and Banyan Ventures, the venture arm of former ABC Entertainment Group and WME Ch...

14/08/2025

Matrox Video to Unveil ORIGIN Fabric at IBC2025

MONTREAL - Matrox Video will debut its ORIGIN Fabric, designed for developers to share content among media applications using the most efficient available connect...

14/08/2025

Graduate Spotlight: Sofija Zlatanova

Graduate Spotlight: Sofija Zlatanova The educator, who grew up in North Macedonia, shares how her Berklee journey went from viola performance to music educati...

14/08/2025

Jünger Audio and Fraunhofer IIS Demonstrate IP Production Chain at SET Expo 2025 and IBC 2025

Jünger Audio and Fraunhofer IIS Demonstrate IP Production Chain at SET Expo 202...

14/08/2025

German Innovation: An In-Depth Look at the DFL's 10K VR Stream of the Supercup

German Innovation: An In-Depth Look at the DFL's 10K VR Stream of the Superc...

14/08/2025

Esports World Cup 2025: Music Is in the Sport's DNA

Esports World Cup 2025: Music Is in the Sport's DNA Entertainment is intrinsic with esports, blending its passion and diversity By Dan Daley, Audio Editor ...

14/08/2025

Stress Test: SailGP Embraces Real-time Cloud Workflows For Portsmouth Event

Stress test: SailGP embraces real-time cloud workflows for Portsmouth event By Joe O'Halloran Tuesday, August 12, 2025 - 10:15 Emirates Gr...

14/08/2025

Esports World Cup 2025: Inside EFG's Hybrid Production

Esports World Cup 2025: Inside EFG's Hybrid Production For the complex operation, the emphasis was on 'scale, integration, localization' By Mark J Burn...

14/08/2025

Esports World Cup 2025: IMG Puts a Spotlight on Innovation as Esports Goes Mainstream

Esports World Cup: IMG puts a Spotlight on innovation as esports goes mainstream...

14/08/2025

Netflix More Than Doubles US Upfront Commitment and Secures Global Clients For Upcoming Titles

Netflix More Than Doubles US Upfront Commitment and Secures Gl...

14/08/2025

Sky Network Television Transforms Playout Operations with Grass Valley AMPP

Deploying Scalable, Hybrid Cloud Solutions to Deliver High-Quality UHD Content and Accelerate Channel Growth MONTREAL, CANADA - August 14, 2025 - Grass Valley,...

14/08/2025

NVIDIA, National Science Foundation Support Ai2 Development of Open AI Models to Drive U.S. Scientific Leadership

NVIDIA is partnering with the U.S. National Science Foundation (NSF) to create a...

14/08/2025

'Warhammer 40,000: Dawn of War - Definitive Edition' Storms GeForce NOW at Launch

Warhammer 40,000: Dawn of War - Definitive Edition is marching onto GeForce NOW,...

13/08/2025

US Space Force Successfully Launches L3Harris-Built NTS-3 Satellite

The L3Harris-built Navigation Technology Satellite-3 (NTS-3) satellite launched on a United Launch Alliance Vulcan rocket Aug. 12 from Cape Canaveral Space Forc...

13/08/2025

Nielsen launches Ad Intel CTV in Australia

Sydney - August 13, 2025 - Nielsen, a global leader in audience measurement, data, and analytics, today announced the upcoming launch of Connected TV (CTV) insi...

13/08/2025

Digital Alert Systems Elevates Adam Jones to Senior Account Manager

LYNDONVILLE, N.Y. - Emergency communications solutions provider Digital Alert Systems said it has promoted Adam Jones to senior account manager....

13/08/2025

Billy Robbins Takes On VP of Station Sales Operations Role at Sinclair

BALTIMORE - Sinclair has named Billy Robbins vice president of station sales operations, a newly created role for the broadcast group....

13/08/2025

NBCU Extends USGA Rights Deal Through 2032

STAMFORD, Conn., and LIBERTY CORNER, N.J. - NBCUniversal has extended its media rights agreement with the United States Golf Association (USGA) through 2032. Its ...

13/08/2025

BIA Revises U.S. Local Ad Outlook Downward Amid 'Increased Pressure'

CHANTILLY, Va. - BIA Advisory Services has revised its 2025 U.S. Local Advertising Forecast down to $169 billion this year, reflecting a 2.4% decline compare...

13/08/2025

SMPTE Announces IBC2025 Sessions and Highlights

WHITE PLAINS, N.Y. - SMPTE has announced its lineup of sessions and show highlights for IBC2025, Sept. 12-15, in Amsterdam. Show visitors will find SMPTE and oth...