
Linguists estimate that at least half of the world's roughly 7,000 spoken languages will become extinct by the century's end, driven by forces ranging from globalization to cultural assimilation.
Part of the challenge of documenting and revitalizing endangered languages is a lack of texts and speech recordings to work with. Seneca, a language of one of the six Iroquois Nations in North America, has only about 100 first-language speakers and several hundred more second-language learners.
Automatic speech recognition (ASR) technology is widely used to transcribe languages with millions or billions of speakers, like English and Mandarin. But it has only scratched the surface with languages like Seneca, which have vastly fewer speakers and significantly less data to work with.
Now a team of researchers at the Rochester Institute of Technology in New York, along with colleagues from the University at Buffalo, is using deep learning to improve ASR for languages with little data. While the focus is on Seneca, the project's vision encompasses preserving languages around the world and, with them, an important part of our shared cultural history.
"Knowing about different languages teaches us a lot about how our brain works," said Emily Prud'hommeaux, an assistant professor of computer science at Boston College and a research faculty member at RIT. "When you document a language, you're preserving information not only about that language but also about how humans use language in general."
It's no coincidence that Prud'hommeaux and her team started with the Seneca language. Three members of the Seneca Nation are part of the effort, a direct connection that is rare in research of this type, she said.
Leading the charge is Robbie Jimerson, a Ph.D. student in RIT's Golisano College of Computing and Information Science. He is a member of the Seneca Nation of Indians and is passionate about ensuring the survival of the Seneca language.
"There's a big effort by the leaders of the tribe to preserve and promote our language," said Jimerson. "I was looking for an opportunity to contribute."
Using GANs to Create More Language Samples
Now in its third year, the project has had challenges when it comes to accumulating language data. Jimerson said the Seneca community can be guarded about what it shares with other people, so there wasn't an abundance of recordings of the language being spoken. He set out to change that.
He started by recording friends and elders who speak the language and asking them to record their friends. He found out whenever someone was speaking Seneca in public. He asked for family recordings of elders telling stories handed down from previous generations. And he grabbed any publicly available videos or recordings he could find online.
The team has fine-tuned an ASR model for Seneca, running it through generative adversarial networks to create more samples out of the limited number of recordings. The model turns wave files of the spoken language into streams of characters, while computing probability and making corrections.
The resulting data is fed into a deep learning model that in turn expands upon the ASR model's accuracy.
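The article does not say how the model maps audio to characters. One common approach, assumed here purely for illustration, is for the network to output a probability for each character at every audio frame, after which a decoder picks the most likely symbol per frame, merges consecutive repeats and drops a special "blank" symbol. The alphabet, the blank marker and the function name below are hypothetical, not taken from the team's system.

```python
from typing import Sequence

BLANK = "_"  # hypothetical "no character" symbol (an assumption, not from the article)


def greedy_decode(frame_probs: Sequence[Sequence[float]],
                  alphabet: Sequence[str]) -> str:
    """Collapse per-frame character probabilities into a character string.

    frame_probs[t][k] is the assumed probability of alphabet[k] at frame t.
    """
    chars = []
    prev_idx = None
    for probs in frame_probs:
        # Pick the most probable symbol for this frame.
        idx = max(range(len(probs)), key=probs.__getitem__)
        # Emit it unless it is a blank or a repeat of the previous frame.
        if alphabet[idx] != BLANK and idx != prev_idx:
            chars.append(alphabet[idx])
        prev_idx = idx
    return "".join(chars)


# Toy input: five frames over a three-symbol alphabet.
toy_alphabet = ["_", "a", "b"]
toy_frames = [
    [0.1, 0.8, 0.1],    # "a"
    [0.2, 0.7, 0.1],    # "a" again (merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # "a" (new, because a blank came in between)
    [0.1, 0.1, 0.8],    # "b"
]
decoded = greedy_decode(toy_frames, toy_alphabet)
print(decoded)  # aab
```

A production system would typically replace the per-frame argmax with a search that also weighs a language model, which may be what the "making corrections" step refers to, though the article does not give details.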
The team's networks run in two compute settings: on a nine-server machine learning lab running a variety of NVIDIA Tesla GPUs, and on a university cluster of large servers, each running 10 NVIDIA Tesla P4 GPUs. Each cluster runs a range of deep learning frameworks such as TensorFlow and Caffe.
"The computer engineering cluster is for all students in the computer engineering department, and so they have to 'compete' for these resources," said Ray Ptucha, assistant professor of computer engineering at RIT, another collaborator on this project.
With access to these clusters at a premium, Jimerson tests code and checks the stability of models on a local machine running an NVIDIA TITAN X rather than inconvenience other students by running a model that might crash.
Achieving Better Accuracy
So far, the team's efforts have brought the word error rate of its ASR model from 70 percent down to 56 percent. The goal, said Prud'hommeaux, is to get that rate down to 25 percent, which is where ASR systems were in processing English several years ago.
The more samples of spoken and written Seneca the team can accumulate, the more the error rate will decrease. (Today, English ASR models can achieve word error rates as low as 5 percent.)
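Word error rate, the metric cited above, is conventionally computed as the word-level edit distance between a reference transcript and the system's output (substitutions plus deletions plus insertions), divided by the number of words in the reference. The sketch below assumes simple whitespace tokenization. The function name and the placeholder sentences are illustrative and do not come from the team's code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder transcripts: one substituted word and one dropped word
# out of four reference words.
wer = word_error_rate("a b c d", "a x c")
print(wer)  # 0.5
```

Under this definition, a rate of 56 percent means roughly 56 word-level edits are needed per 100 reference words to turn the system's output into the correct transcript.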
The team's work is expected to help with language preservation efforts around the world.
Prud'hommeaux said the team has an agreement with an archiving institution that's a condition of a grant the project received from the National Science Foundation. The resulting language archiving database will be made available as a resource for other efforts seeking to document threatened languages.
Additionally, Prud'hommeaux said the team's work could prove helpful for any deep learning effort that has to make do with limited amounts of data.
Read more about the team's work in their research papers here and here.
Feature image: The Haudenosaunee (Iroquois Confederacy) flag, via Wikimedia Commons.