
More than 75 million people speak Telugu, predominantly in India's southern regions, making it one of the most widely spoken languages in the country.
Despite such prevalence, Telugu is considered a low-resource language when it comes to speech AI. This means there aren't enough hours' worth of speech datasets to easily and accurately create AI models for automatic speech recognition (ASR) in Telugu.
And that means billions of people are left out of using ASR to improve transcription, translation and additional speech AI applications in Telugu and other low-resource languages.
To build an ASR model for Telugu, the NVIDIA speech AI team turned to the NVIDIA NeMo framework for developing and training state-of-the-art conversational AI models. The model won first place in a competition conducted in October by IIIT-Hyderabad, one of India's most prestigious institutes for research and higher education.
NVIDIA placed first in accuracy for both tracks of the Telugu ASR Challenge, which was held in collaboration with the Technology Development for Indian Languages program and India's Ministry of Electronics and Information Technology as a part of its National Language Translation Mission.
For the closed track, participants had to use around 2,000 hours of a Telugu-only training dataset provided by the competition organizers. And for the open track, participants could use any datasets and pretrained AI models to build the Telugu ASR model.
NVIDIA NeMo-powered models topped the leaderboards with a word error rate of approximately 13% and 12% for the closed and open tracks, respectively, outperforming by a large margin all models built on popular ASR frameworks like ESPnet, Kaldi, SpeechBrain and others.
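For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance between the reference transcript and the model's output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name is hypothetical, and the competition's actual scoring script and any text normalization it applied are not described in this article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word and one dropped word out of four reference words.
score = word_error_rate("a b c d", "a x c")
```

Under this definition, a word error rate of roughly 12% means about 12 word-level errors for every 100 reference words.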
"What sets NVIDIA NeMo apart is that we open source all of the models we have - so people can easily fine-tune the models and do transfer learning on them for their use cases," said Nithin Koluguri, a senior research scientist on the conversational AI team at NVIDIA. NeMo is also one of the only toolkits that supports scaling training to multi-GPU systems and multi-node clusters.
Building the Telugu ASR Model
The first step in creating the award-winning model, Koluguri said, was to preprocess the data.
Koluguri and his colleague Megh Makwana, an applied deep learning solution architect manager at NVIDIA, removed invalid letters and punctuation marks from the speech dataset that was provided for the closed track of the competition.
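A transcript-cleaning step of this kind might look like the following minimal sketch. It assumes that "invalid" means any character outside the Telugu Unicode block (U+0C00 to U+0C7F) other than whitespace; that rule, along with the function and pattern names, is an illustrative assumption and not the team's published procedure.

```python
import re
import unicodedata

# Assumption: only code points in the Telugu Unicode block and whitespace are valid.
_NOT_TELUGU = re.compile(r"[^\u0C00-\u0C7F\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_transcript(text: str) -> str:
    """Drop non-Telugu characters and punctuation, then collapse whitespace."""
    text = unicodedata.normalize("NFC", text)  # use one canonical form throughout
    text = _NOT_TELUGU.sub(" ", text)          # replace invalid characters with a space
    return _WHITESPACE.sub(" ", text).strip()


cleaned = clean_transcript("నమస్తే   ప్రపంచం.")
```

A rule this strict also discards Latin-script words and zero-width joiners, so a real pipeline would likely need a more carefully chosen allow-list.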
"Our biggest challenge was dealing with the noisy data," Koluguri said. "This is when the audio and the transcript don't match - in this case you cannot guarantee the accuracy of the ground-truth transcript you're training on."
The team cleaned up the audio clips by cutting them to less than 20 seconds each, discarding clips shorter than 1 second and removing sentences with a character rate above 30, where character rate is the number of characters spoken per second.
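The duration and character-rate thresholds described above can be sketched as a simple filter over a JSON-lines manifest. The field names (`audio_filepath`, `duration`, `text`) follow the manifest layout commonly used with NeMo and are an assumption here, as are the function and constant names. For simplicity the sketch drops over-long clips, whereas the team cut them into shorter segments.

```python
import json

MIN_DURATION_S = 1.0   # clips shorter than this are discarded
MAX_DURATION_S = 20.0  # clips must be shorter than this
MAX_CHAR_RATE = 30.0   # characters per second of audio


def keep_utterance(entry: dict) -> bool:
    """Return True if the clip passes the duration and character-rate checks."""
    duration = entry["duration"]
    if not (MIN_DURATION_S <= duration < MAX_DURATION_S):
        return False
    # len() counts Unicode code points, which is only a rough proxy for
    # spoken characters in Telugu script.
    char_rate = len(entry["text"]) / duration
    return char_rate <= MAX_CHAR_RATE


def filter_manifest(lines):
    """Parse JSON-lines manifest entries and keep those that pass the filters."""
    entries = (json.loads(line) for line in lines if line.strip())
    return [entry for entry in entries if keep_utterance(entry)]


sample = [
    json.dumps({"audio_filepath": "a.wav", "duration": 5.0, "text": "x" * 50}),
    json.dumps({"audio_filepath": "b.wav", "duration": 0.5, "text": "hi"}),
    json.dumps({"audio_filepath": "c.wav", "duration": 2.0, "text": "x" * 100}),
    json.dumps({"audio_filepath": "d.wav", "duration": 25.0, "text": "long"}),
]
kept = filter_manifest(sample)
```

An implausibly high character rate is a useful signal that a transcript does not match its audio, which is the kind of noise Koluguri describes.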
Makwana then used NeMo to train the ASR model, which had 120 million parameters, for 160 epochs, or full cycles through the dataset.
For the competition's open track, the team used models pretrained with 36,000 hours of data on all 40 languages spoken in India. Fine-tuning this model for the Telugu language took around three days using an NVIDIA DGX system, according to Makwana.
Inference test results were then shared with the competition organizers. NVIDIA won with around 2% better word error rates than the second-place participant. This is a huge margin for speech AI, according to Koluguri.
"The impact of ASR model development is very high, especially for low-resource languages," he added. "If a company comes forward and sets a baseline model, as we did for this competition, people can build on top of it with the NeMo toolkit to make transcription, translation and other ASR applications more accessible for languages where speech AI is not yet prevalent."
NVIDIA Expands Speech AI for Low-Resource Languages
"ASR is gaining a lot of momentum in India majorly because it will allow digital platforms to onboard and engage with billions of citizens through voice-assistance services," Makwana said.
And the process for building the Telugu model, as outlined above, is a technique that can be replicated for any language.
Of around 7,000 world languages, 90% are considered to be low resource for speech AI - representing 3 billion speakers. This doesn't include dialects, pidgins and accents.
Open sourcing all of its models on the NeMo toolkit is one way NVIDIA is improving linguistic inclusion in the field of speech AI.
In addition, pretrained models for speech AI, as part of the NVIDIA Riva software development kit, are now available in 10 languages - with many additions planned for the future.
And NVIDIA last month hosted its inaugural Speech AI Summit, featuring speakers from Google, Meta, Mozilla Common Voice and more. Learn more about Unlocking Speech AI Technology for Global Language Users by watching the presentation on demand.
Get started building and training state-of-the-art conversational AI models with NVIDIA NeMo.