How Deep Learning Is Aiding Preservation of Seneca and Other Endangered Languages

03/01/2019

Linguists estimate that at least half of the world's estimated 7,000 spoken languages will become extinct by the century's end, due to forces ranging from globalization to cultural assimilation.

Part of the challenge of documenting and revitalizing endangered languages is a lack of texts and speech recordings to work with. Seneca, a language of one of the six Iroquois Nations in North America, has only about 100 first-language speakers and several hundred more second-language learners.

Automatic speech recognition (ASR) technology is widely used to transcribe languages with millions or billions of speakers, like English and Mandarin. But it has only scratched the surface with languages like Seneca, which have vastly fewer speakers and significantly less data to work with.

Now a team of researchers at the Rochester Institute of Technology in New York, along with colleagues from the University at Buffalo, is tapping deep learning to bolster the ability of ASR. And while its focus is on Seneca, the project's vision encompasses the preservation of languages globally as well as an important part of our shared cultural history.

"Knowing about different languages teaches us a lot about how our brain works," said Emily Prud'hommeaux, an assistant professor of computer science at Boston College and a research faculty member at RIT. "When you document a language, you're preserving information not only about that language but also about how humans use language in general."

It's no coincidence that Prud'hommeaux and her team started with the Seneca language. Three members of the Seneca nation are part of the effort - a direct connection that is rare in research of this type, she said.

Leading the charge is Robbie Jimerson, a Ph.D. student in RIT's Golisano College of Computing and Information Science. He is a member of the Seneca Nation of Indians and is passionate about ensuring the survival of the Seneca language.

"There's a big effort by the leaders of the tribe to preserve and promote our language," said Jimerson. "I was looking for an opportunity to contribute."

Using GANs to Create More Language Samples

Now in its third year, the project has had challenges when it comes to accumulating language data. Jimerson said the Seneca community can be guarded about what it shares with other people, so there wasn't an abundance of recordings of the language being spoken. He set out to change that.

He started by recording friends and elders who speak the language and asking them to record their friends. He found out whenever someone was speaking Seneca in public. He asked for family recordings of elders telling stories handed down from previous generations. And he grabbed any publicly available videos or recordings he could find online.

The team has fine-tuned an ASR model for Seneca, running it through generative adversarial networks to create more samples out of the limited number of recordings. The model turns wave files of the spoken language into streams of characters, while computing probability and making corrections.

The resulting data is fed into a deep learning model that in turn expands upon the ASR model's accuracy.
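The article does not say how the team's model converts audio into characters. As a rough illustration only, the sketch below assumes a CTC-style acoustic model, a common design for character-level ASR, and shows greedy decoding: pick the most probable symbol in each audio frame, collapse consecutive repeats, and drop a blank symbol. The function name, the `BLANK` marker, and the toy alphabet are all hypothetical and are not taken from the project.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical "no character" symbol used by CTC-style models


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Turn per-frame symbol probabilities into a character string.

    frame_probs[t][k] is the (assumed) probability of alphabet[k] at frame t.
    """
    decoded: List[str] = []
    previous = None
    for frame in frame_probs:
        # Most probable symbol for this frame.
        best_index = max(range(len(frame)), key=frame.__getitem__)
        symbol = alphabet[best_index]
        # Keep a symbol only when it differs from the previous frame's
        # symbol and is not the blank marker.
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


# Toy input: five frames over a three-symbol alphabet (made-up numbers).
toy_alphabet = [BLANK, "a", "b"]
toy_frames = [
    [0.1, 0.8, 0.1],  # a
    [0.2, 0.7, 0.1],  # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.1, 0.8],  # b
]
decoded_text = greedy_ctc_decode(toy_frames, toy_alphabet)
```

Production systems typically replace this greedy step with a beam search guided by a language model, which is one way "making corrections" can be realized, but that is likewise an assumption rather than a description of the team's system.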

The team's networks run in two compute settings: on a nine-server machine learning lab running a variety of NVIDIA Tesla GPUs, and on a university cluster of large servers, each running 10 NVIDIA Tesla P4 GPUs. Each cluster runs a range of deep learning frameworks such as TensorFlow and Caffe.

"The computer engineering cluster is for all students in the computer engineering department, and so they have to 'compete' for these resources," said Ray Ptucha, an assistant professor of computer engineering at RIT and another collaborator on this project.

With access to these clusters at a premium, Jimerson tests code and checks the stability of models on a local machine running an NVIDIA TITAN X rather than inconvenience other students by running a model that might crash.

Achieving Better Accuracy

So far, the team's efforts have brought the word error rate of its ASR model from 70 percent down to 56 percent. The goal, said Prud'hommeaux, is to get that rate down to 25 percent, which is where ASR systems were in processing English several years ago.

The more samples of spoken and written Seneca the team can accumulate, the more the error rate will decrease. (Today, English ASR models can achieve word error rates as low as 5 percent.)
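For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard calculation, not the team's evaluation code. The function name and the placeholder sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference
    # words seen so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# Placeholder transcripts: one substituted word and one dropped word
# out of four reference words.
example_wer = word_error_rate("one two three four", "one too three")
```

Because insertions count as errors, the rate can exceed 100 percent when the hypothesis is much longer than the reference.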

The team's work is expected to help with language preservation efforts around the world.

Prud'hommeaux said the team has an agreement with an archiving institution that's a condition of a grant the project received from the National Science Foundation. The resulting language archiving database will be made available as a resource for other efforts seeking to document threatened languages.

Additionally, Prud'hommeaux said the team's work could prove helpful for any deep learning effort that has to make do with limited amounts of data.

Read more about the team's work in their research papers here and here.

Feature image: The Haudenosaunee (Iroquois Confederacy) flag, via Wikimedia Commons.
LINK: https://blogs.nvidia.com/blog/2019/01/02/deep-learning-preserves-senec...