How Deep Learning Is Aiding Preservation of Seneca and Other Endangered Languages

03/01/2019

Linguists estimate that at least half of the world's roughly 7,000 spoken languages will become extinct by the end of the century, driven by forces ranging from globalization to cultural assimilation.

Part of the challenge of documenting and revitalizing endangered languages is a lack of texts and speech recordings to work with. Seneca, a language of one of the six Iroquois Nations in North America, has only about 100 first-language speakers and several hundred more second-language learners.

Automatic speech recognition (ASR) technology is widely used to transcribe languages with millions or billions of speakers, like English and Mandarin. But it has only scratched the surface with languages like Seneca, which have vastly fewer speakers and significantly less data to work with.

Now a team of researchers at the Rochester Institute of Technology in New York, along with colleagues from the University at Buffalo, is tapping deep learning to improve ASR for such languages. While its focus is on Seneca, the project's vision extends to preserving languages worldwide and, with them, an important part of our shared cultural history.

"Knowing about different languages teaches us a lot about how our brain works," said Emily Prud'hommeaux, an assistant professor of computer science at Boston College and a research faculty member at RIT. "When you document a language, you're preserving information not only about that language but also about how humans use language in general."

It's no coincidence that Prud'hommeaux and her team started with the Seneca language. Three members of the Seneca Nation are part of the effort, a direct connection that is rare in research of this type, she said.

Leading the charge is Robbie Jimerson, a Ph.D. student in RIT's Golisano College of Computing and Information Science. He is a member of the Seneca Nation of Indians and is passionate about ensuring the survival of the Seneca language.

"There's a big effort by the leaders of the tribe to preserve and promote our language," said Jimerson. "I was looking for an opportunity to contribute."

Using GANs to Create More Language Samples

Now in its third year, the project has faced challenges in accumulating language data. Jimerson said the Seneca community can be guarded about what it shares with other people, so there wasn't an abundance of recordings of the language being spoken. He set out to change that.

He started by recording friends and elders who speak the language and asking them to record their friends. He sought out occasions when someone was speaking Seneca in public. He asked for family recordings of elders telling stories handed down from previous generations. And he gathered any publicly available videos or recordings he could find online.

The team has fine-tuned an ASR model for Seneca, using generative adversarial networks (GANs) to create additional samples from the limited number of recordings. The model converts WAV audio files of spoken Seneca into streams of characters, computing probabilities and making corrections along the way.

The resulting data is fed into a deep learning model that in turn improves the ASR model's accuracy.
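The article does not say which decoding method the Seneca model uses. As a hedged illustration of how an end-to-end ASR model can turn per-frame character probabilities into a stream of characters, here is a minimal sketch of greedy connectionist temporal classification (CTC) decoding, a common technique in character-level speech recognition. The blank symbol `_`, the function name `ctc_greedy_decode`, and the toy alphabet are assumptions for illustration, not details from the project.

```python
from typing import List, Sequence

BLANK = "_"  # assumed CTC "blank" symbol; real models define their own


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Collapse per-frame character probabilities into a character string.

    frame_probs[t][k] is the (hypothetical) model's probability that audio
    frame t corresponds to alphabet[k]. alphabet must include BLANK.
    """
    # Step 1: take the single most probable symbol for every frame.
    best = [alphabet[max(range(len(row)), key=lambda k: row[k])] for row in frame_probs]

    # Step 2: merge runs of the same symbol, then discard blanks.
    # A blank between two identical symbols keeps them as separate characters.
    out: List[str] = []
    prev = None
    for ch in best:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)
```

For example, with the toy alphabet `"_ab"` and five frames whose most likely symbols are `a a _ a b`, the function returns `"aab"`: the two adjacent `a` frames merge into one character, while the blank keeps the third `a` separate. Greedy decoding is the simplest option; many systems instead use beam search, sometimes combined with a language model.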

The team's networks run in two compute settings: on a nine-server machine learning lab running a variety of NVIDIA Tesla GPUs, and on a university cluster of large servers, each running 10 NVIDIA Tesla P4 GPUs. Each cluster runs a range of deep learning frameworks such as TensorFlow and Caffe.

"The computer engineering cluster is for all students in the computer engineering department, and so they have to 'compete' for these resources," said Ray Ptucha, assistant professor of computer engineering at RIT and another collaborator on the project.

With access to these clusters at a premium, Jimerson tests code and checks the stability of models on a local machine running an NVIDIA TITAN X, rather than risk inconveniencing other students by running a model that might crash.

Achieving Better Accuracy

So far, the team's efforts have brought the word error rate of its ASR model down from 70 percent to 56 percent. The goal, said Prud'hommeaux, is to reach 25 percent, roughly where English ASR systems stood several years ago.

The more samples of spoken and written Seneca the team can accumulate, the lower the error rate is expected to fall. (Today, English ASR models can achieve word error rates as low as 5 percent.)
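Word error rate (WER), the metric behind the figures above, is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the system's output, divided by the number of words in the reference. The sketch below is a minimal Python version of that standard definition, not the team's evaluation code; the function name and the English test sentences are illustrative assumptions.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                cur[j - 1] + 1,      # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = cur
    return prev[len(hyp)] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` is 2/6, about 33 percent (one substitution, one deletion). Because insertions count as errors, WER can exceed 100 percent. Applied to the article's figures, the drop from 70 to 56 percent is 14 percentage points, a 20 percent relative reduction (14/70); reaching the 25 percent goal would take a further 31 points.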

The team's work is expected to help with language preservation efforts around the world.

As a condition of a grant the project received from the National Science Foundation, the team has an agreement with an archiving institution, Prud'hommeaux said. The resulting language archive will be made available as a resource for other efforts seeking to document threatened languages.

Additionally, Prud'hommeaux said the team's work could prove helpful for any deep learning effort that has to make do with limited amounts of data.

Read more about the team's work in their research papers here and here.

Feature image: The Haudenosaunee (Iroquois Confederacy) flag, via Wikimedia Commons.
LINK: https://blogs.nvidia.com/blog/2019/01/02/deep-learning-preserves-senec...