
Māori Speech AI Model Helps Preserve and Promote New Zealand Indigenous Language


Indigenous languages are under threat. Some 3,000 - three-quarters of the total - could disappear before the end of the century, or one every two weeks, according to UNESCO.

As part of a movement to protect such languages, New Zealand's Te Hiku Media, a broadcaster focused on the Māori people's indigenous language known as te reo, is using trustworthy AI to help preserve and revitalize the tongue.

Using ethical, transparent methods of speech data collection and analysis to maintain data sovereignty for the Māori people, Te Hiku Media is developing automatic speech recognition (ASR) models for te reo, which is a Polynesian language.

Built using the open-source NVIDIA NeMo toolkit for ASR and NVIDIA A100 Tensor Core GPUs, the speech-to-text models transcribe te reo with 92% accuracy. They can also transcribe bilingual speech using English and te reo with 82% accuracy. They're pivotal tools, made by and for the Māori people, that are helping preserve and amplify their stories.
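The article does not say how these accuracy figures were measured. One common convention for speech-to-text is to report word accuracy as one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below illustrates that convention only; the function names and sample strings are hypothetical and are not taken from Te Hiku Media's pipeline or from NeMo.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy under the assumed convention: 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    # Placeholder transcripts, not real te reo data.
    reference = "one two three four"
    hypothesis = "one two tree four"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
    print(f"Accuracy: {word_accuracy(reference, hypothesis):.2f}")
```

Under this convention, a single wrong word in a four-word reference gives a WER of 0.25. Because insertions count as errors, WER can exceed 1, which is why the accuracy helper floors the result at zero.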

"There's immense value in using NVIDIA's open-source technologies to build the tools we need to ultimately achieve our mission, which is the preservation, promotion and revitalization of te reo Māori," said Keoni Mahelona, chief technology officer at Te Hiku Media, who leads a team of data scientists and developers, as well as Māori language experts and data curators, working on the project.

"We're also helping guide the industry on ethical ways of using data and technologies to ensure they're used for the empowerment of marginalized communities," added Mahelona, a Native Hawaiian now living in New Zealand.

Building a 'House of Speech'

Te Hiku Media began more than three decades ago as a radio station aiming to ensure te reo had space on the airwaves. Over the years, the organization incorporated television broadcasting and, with the rise of the internet, it convened a meeting in 2013 with the community's elders to form a strategy for sharing content in the digital era.

"The elders agreed that we should make the stories accessible online for our community members - rather than just keeping our archives on cassettes in boxes - but once we had that objective, the challenge was how to do this correctly, in alignment with our strong roots in valuing sovereignty," Mahelona said.

Instead of uploading its video and audio sources to popular, global platforms - which, in their terms and conditions of use, require signing over certain rights related to the content - Te Hiku Media decided to build its own content distribution platform.

Called Whare Kōrero - meaning "house of speech" - the platform now holds more than 30 years' worth of digitized, archival material featuring about 1,000 hours of te reo native speakers, some of whom were born in the late 19th century, as well as more recent content from second-language learners and bilingual Māori people.

Now, around 20 Māori radio stations use and upload their content to Whare Kōrero. Community members can access the content through an app.

"It's an invaluable resource of acoustic data," Mahelona said.

Turning to Trustworthy AI

Such a trove held incredible value for those working to revitalize the language, the Te Hiku Media team quickly realized, but manual transcription required pulling lots of time and effort from limited resources. So began the organization's trustworthy AI efforts, in 2016, to accelerate its work using ASR.

"No one would have a clue that there are eight NVIDIA A100 GPUs in our derelict, rundown, musky-smelling building in the far north of New Zealand - training and building Māori language models," Mahelona said. "But the work has been game-changing for us."

To collect speech data in a transparent, ethically compliant, community-oriented way, Te Hiku Media began by explaining its cause to elders, garnering their support and asking them to come to the station to read phrases aloud.

"It was really important that we had the support of the elders and that we recorded their voices, because that's the sort of content we want to transcribe," Mahelona said. "But eventually these efforts didn't scale - we needed second-language learners, kids, middle-aged people and a lot more speech data in general."

So, the organization ran a crowdsourcing campaign, Kōrero Māori, to collect highly labeled speech samples according to the Kaitiakitanga license, which ensures Te Hiku Media uses the data only for the benefit of the Māori people.

In just 10 days, more than 2,500 people signed up to read 200,000+ phrases, providing over 300 hours of labeled speech data, which was used to build and train the te reo Māori ASR models.

In addition to other open-source trustworthy AI tools, Te Hiku Media now uses the NVIDIA NeMo toolkit's ASR module for speech AI throughout its entire pipeline. The NeMo toolkit comprises building blocks called neural modules and includes pretrained models for language model development.

"It's been absolutely amazing - NVIDIA's open-source NeMo enabled our ASR models to be bilingual and added automatic punctuation to our transcriptions," Mahelona said.

Te Hiku Media's ASR models are the engines running behind Kaituhi, a te reo Māori transcription service now available online.

The efforts have spurred similar ASR projects now underway by Native Hawaiians and the Mohawk people in southeastern Canada.

"It's indigenous-led work in trustworthy AI that's inspiring other indigenous groups to think: 'If they can do it, we can do it, too,'" Mahelona said.

Learn more about NVIDIA-powered trustworthy AI, the NVIDIA NeMo toolkit and how it enabled a Telugu language speech AI breakthrough.