
Of around 7,000 languages in the world, a tiny fraction are supported by AI language models. NVIDIA is tackling the problem with a new dataset and models that support the development of high-quality speech recognition and translation AI for 25 European languages - including languages with limited available data like Croatian, Estonian and Maltese.
These tools will enable developers to more easily scale AI applications to support global users with fast, accurate speech technology for production-scale use cases such as multilingual chatbots, customer service voice agents and near-real-time translation services. They include:
Granary, a massive, open-source corpus of multilingual speech datasets that contains around a million hours of audio, including nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation.
NVIDIA Canary-1b-v2, a billion-parameter model trained on Granary for high-quality transcription of European languages, plus translation between English and two dozen supported languages. It tops Hugging Face's leaderboard of open models for multilingual speech recognition accuracy.
NVIDIA Parakeet-tdt-0.6b-v3, a streamlined, 600-million-parameter model designed for real-time or large-volume transcription of Granary's supported languages. It has the highest throughput of multilingual models on the Hugging Face leaderboard, measured as duration of audio transcribed divided by computation time.
The paper behind Granary will be presented at Interspeech, a language processing conference taking place in the Netherlands, Aug. 17-21. The dataset, as well as the new Canary and Parakeet models, are now available on Hugging Face.
How Granary Addresses Data Scarcity To develop the Granary dataset, the NVIDIA speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The team passed unlabeled audio through an innovative processing pipeline powered by NVIDIA NeMo Speech Data Processor toolkit that turned it into structured, high-quality data.
This pipeline allowed the researchers to enhance public speech data into a usable format for AI training, without the need for resource-intensive human annotation. It's available in open source on GitHub.
With Granary's clean, ready-to-use data, developers can get a head start building models that tackle transcription and translation tasks in nearly all of the European Union's 24 official languages, plus Russian and Ukrainian.
For European languages underrepresented in human-annotated datasets, Granary provides a critical resource to develop more inclusive speech technologies that better reflect the linguistic diversity of the continent - all while using less training data.
The team demonstrated in their Interspeech paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve a target accuracy level for automatic speech recognition (ASR) and automatic speech translation (AST).
Tapping NVIDIA NeMo to Turbocharge Transcription The new Canary and Parakeet models offer examples of the kinds of models developers can build with Granary, customized to their target applications. Canary-1b-v2 is optimized for accuracy on complex tasks, while parakeet-tdt-0.6b-v3 is designed for high-speed, low-latency tasks.
By sharing the methodology behind the Granary dataset and these two models, NVIDIA is enabling the global speech AI developer community to adapt this data processing workflow to other ASR or AST models or additional languages, accelerating speech AI innovation.
Canary-1b-v2, available under a permissive license, expands the Canary family's supported languages from four to 25. It offers transcription and translation quality comparable to models 3x larger while running inference up to 10x faster.
https://blogs.nvidia.com/wp-content/uploads/2025/08/Canary-demo.mp4
NVIDIA NeMo, a modular software suite for managing the AI agent lifecycle, accelerated speech AI model development. NeMo Curator, part of the software suite, enabled the team to filter out synthetic examples from the source data so that only high-quality samples were used for model training. The team also harnessed the NeMo Speech Data Processor toolkit for tasks like aligning transcripts with audio files and converting data into the required formats.
Parakeet-tdt-0.6b-v3 prioritizes high throughput and is capable of transcribing 24-minute audio segments in a single inference pass. The model automatically detects the input audio language and transcribes without additional prompting steps.
Both Canary and Parakeet models provide accurate punctuation, capitalization and word-level timestamps in their outputs.
Read more on GitHub and get started with Granary on Hugging Face.
Most recent headlines
05/01/2027
Worlds first 802.15.4ab-UWB chip verified by Calterah and Rohde & Schwarz to be ...
01/06/2026
January 6 2026, 05:30 (PST) Dolby Sets the New Standard for Premium Entertainment at CES 2026
Throughout the week, Dolby brings to life the latest innovatio...
02/05/2026
Dalet, a leading technology and service provider for media-rich organizations, t...
01/05/2026
January 5 2026, 18:30 (PST) NBCUniversal's Peacock to Be First Streamer to ...
01/04/2026
January 4 2026, 18:00 (PST) DOLBY AND DOUYIN EMPOWER THE NEXT GENERATON OF CREATORS WITH DOLBY VISION
Douyin Users Can Now Create And Share Videos With Stun...
28/02/2026
With two features seen in Formula 1 coverage, the broadcaster aims to bring view...
28/02/2026
Secretary of War Pete Hegseth addresses a crowd of approximately 1,500 L3Harris employees in Camden, Arkansas, as part of his Arsenal of Freedom tour....
28/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
28/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
28/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
28/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
28/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
28/02/2026
Berklee Presents Mambo Mania: Eguie Castrillo and the Berklee All-Stars Big Band...
28/02/2026
Berklee Announces Two New Summer Programs in Los Angeles The Berklee Music Business Program and Electronic Music Production and Sound Design Workshop bring imme...
28/02/2026
AI-RAN is moving from lab to field, showing that a software-defined approach is ...
28/02/2026
Autonomous networks - intelligent, self-managing telecommunications operations -...
28/02/2026
Back to All News
Final Trailer for BEASTARS Final Season Part 2' Roars Tow...
28/02/2026
New way to intentionally discover molecular glues could expand drug discovery Scripps Research scientists and colleagues show how drugs that eliminate certain d...
27/02/2026
The E.W. Scripps Company names Oliver Gray as Vice President, Network Sports and...
27/02/2026
The Gotham Sports App, the exclusive direct-to-consumer streaming home of MSG Networks and the YES Network, is now available for purchase through Prime Video fo...
27/02/2026
ESPN and the Horizon League announce a new multi-year, multi-platform media rights agreement, continuing a 38-year collaboration that began with the 1988 Midwes...
27/02/2026
At the 2026 NAB Show in Las Vegas, NETGEAR will highlight its new switch models and major updates to its Engage Controller software. The company's network d...
27/02/2026
Riedel Communications announces that Fondazione Teatro alla Scala has deployed a...
27/02/2026
Lyuno specializes in media localization, including translation, dubbing, subtitling, and voice-over services for a wide array of entertainment content. The comp...
27/02/2026
Chyron Weather 2.3, the latest edition of Chyron's weather visualization suite for broadcasters and meteorologists, recently launched.
The release includes...
27/02/2026
Telestream, which concentrates in media workflow technologies, announces expanded practical AI enhancements across its Vantage, Vantage Cloud, EDC, Stanza, and ...
27/02/2026
Horizon Sports & Experiences (HS&E), a global sports marketing, media, and live ...
27/02/2026
Legendary sports broadcasters Bob Costas, Doug Collins, Mike Czar of the Telest...
27/02/2026
Beginning on March 1st, IndyCar will be kicking off their 31st season on the str...
27/02/2026
In-venue and creative video staffers at the professional and collegiate level ha...
27/02/2026
Ratings Roundup is a rundown of recent rating news and is derived from press rel...
27/02/2026
Owl AI a pioneer in artificial intelligence for professional sports, announces a...
27/02/2026
With over 447 million fans in APAC, Formula 1 and beIN will continue to innovate...
27/02/2026
12-year-old Noelle Taylor will be the Kid Reporter when the Brooklyn Nets host t...
27/02/2026
Entire CapCam system - including camera unit, RF transmitter, and battery - is h...
27/02/2026
Since its inception, Gorillaz has been known for blending art with genre-bending...
27/02/2026
This week, Spotify introduced Audiobook Charts for the U.S. and U.K. The charts make it easy to discover your next favorite book by showing what's popular a...
27/02/2026
Rohde & Schwarz and Viasat to collaborate on NB-NTN IoT test plan for connectivi...
27/02/2026
In media technology, big features often steal the spotlight - AI integrations, cloud transformations, automation frameworks. But for the people who use these to...
27/02/2026
Digital Asset Management systems sit at the heart of most marcoms operations. They centralise content, organise it, and make it discoverable. Integrated with th...
27/02/2026
The AI Wild West comes to NAB 2026 and Blue Lucy is bringing the Sheriff
The AI Wild West is here, and media organisations are feeling the heat. On Booth W23...
27/02/2026
NEW YORK - February 26, 2026 - An estimated 32.6 million people watched President Donald J. Trump deliver the 2026 State of the Union address on Tuesday, Februa...
27/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
27/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
27/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
27/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
27/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
27/02/2026
Share
Copy link
Facebook
X
Linkedin
Bluesky
Email...
27/02/2026
Video is one of the lawyer's most powerful storytelling tools in civil litigation today, whether used to transport jurors to an incident scene or challenge ...
27/02/2026
Creative software developer Foundry today released Nuke 17.0, the latest version of its powerful compositing tool for visual effects and animation. Marking one ...