All the Feels: NVIDIA Shares Expressive Speech Synthesis Research at Interspeech

31/08/2021

AI has transformed synthesized speech from the monotone of robocalls and decades-old GPS navigation systems to the polished tone of virtual assistants in smartphones and smart speakers.

But there's still a gap between AI-synthesized speech and the human speech we hear in daily conversation and in the media. That's because people speak with complex rhythm, intonation and timbre that's challenging for AI to emulate.

The gap is closing fast: NVIDIA researchers are building models and tools for high-quality, controllable speech synthesis that capture the richness of human speech, without audio artifacts. Their latest projects are now on display in sessions at the Interspeech 2021 conference, which runs through Sept. 3.

These models can help voice automated customer service lines for banks and retailers, bring video-game or book characters to life, and provide real-time speech synthesis for digital avatars.

NVIDIA's in-house creative team even uses the technology to produce expressive narration for a video series on the power of AI.

Expressive speech synthesis is just one element of NVIDIA Research's work in conversational AI - a field that also encompasses natural language processing, automated speech recognition, keyword detection, audio enhancement and more.

Optimized to run efficiently on NVIDIA GPUs, some of this cutting-edge work has been made open source through the NVIDIA NeMo toolkit, available on our NGC hub of containers and other software.

Behind the Scenes of I AM AI

NVIDIA researchers and creative professionals don't just talk the conversational AI talk. They walk the walk, putting groundbreaking speech synthesis models to work in our I AM AI video series, which features global AI innovators reshaping just about every industry imaginable.

But until recently, these videos were narrated by a human. Previous speech synthesis models offered limited control over a synthesized voice's pacing and pitch, so attempts at AI narration didn't evoke the emotional response in viewers that a talented human speaker could.

That changed over the past year when NVIDIA's text-to-speech research team developed more powerful, controllable speech synthesis models like RAD-TTS, used in our winning demo at the SIGGRAPH Real-Time Live competition. By training the text-to-speech model with audio of an individual's speech, RAD-TTS can convert any text prompt into the speaker's voice.

Another of its features is voice conversion, where one speaker's words (or even singing) are delivered in another speaker's voice. Inspired by the idea of the human voice as a musical instrument, the RAD-TTS interface gives users fine-grained, frame-level control over the synthesized voice's pitch, duration and energy.

With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor - tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video's tone.
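To picture what frame-level control over pitch, duration and energy can mean in practice, here is a minimal Python sketch. It is not the RAD-TTS interface or any NVIDIA API: the `FrameControls` container and the `shift_pitch`, `scale_energy` and `stretch` helpers are hypothetical names invented for illustration, and the sketch assumes a synthesizer that accepts one pitch value and one energy value per audio frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameControls:
    """Per-frame prosody values for one utterance (hypothetical container)."""
    pitch_hz: List[float]  # fundamental frequency per frame; 0.0 marks an unvoiced frame
    energy: List[float]    # relative loudness per frame


def shift_pitch(c: FrameControls, start: int, end: int, semitones: float) -> FrameControls:
    """Raise or lower the pitch of frames [start, end) by a number of semitones."""
    factor = 2.0 ** (semitones / 12.0)  # 12 semitones doubles the frequency
    pitch = [
        p * factor if start <= i < end and p > 0.0 else p  # leave unvoiced frames alone
        for i, p in enumerate(c.pitch_hz)
    ]
    return FrameControls(pitch, list(c.energy))


def scale_energy(c: FrameControls, start: int, end: int, gain: float) -> FrameControls:
    """Multiply the energy of frames [start, end) by gain."""
    energy = [e * gain if start <= i < end else e for i, e in enumerate(c.energy)]
    return FrameControls(list(c.pitch_hz), energy)


def stretch(c: FrameControls, start: int, end: int, rate: float) -> FrameControls:
    """Change the duration of frames [start, end) by nearest-frame resampling.

    rate > 1 slows the span down (more frames); rate < 1 speeds it up.
    Assumes 0 <= start < end <= number of frames.
    """
    n = end - start
    new_n = max(1, round(n * rate))
    # Map each output frame back to the source frame it copies.
    idx = [start + min(n - 1, int(j * n / new_n)) for j in range(new_n)]

    def resample(values: List[float]) -> List[float]:
        return values[:start] + [values[i] for i in idx] + values[end:]

    return FrameControls(resample(c.pitch_hz), resample(c.energy))


# Usage: emphasize a word assumed to span frames 2 to 5 of a 7-frame utterance
# by raising its pitch two semitones, making it louder and holding it longer.
controls = FrameControls(
    pitch_hz=[110.0, 112.0, 120.0, 125.0, 122.0, 0.0, 108.0],
    energy=[0.5, 0.6, 0.7, 0.8, 0.7, 0.1, 0.5],
)
emphasized = shift_pitch(controls, 2, 5, semitones=2.0)
emphasized = scale_energy(emphasized, 2, 5, gain=1.3)
emphasized = stretch(emphasized, 2, 5, rate=1.5)
```

Each helper returns a new object rather than editing in place, so a producer-style workflow could keep the baseline narration and compare several takes side by side.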

The AI model's capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice. It can even recreate the performances of iconic singers, matching not only the melody of a song, but also the emotional expression behind the vocals.

Giving Voice to AI Developers, Researchers

With NVIDIA NeMo - an open-source Python toolkit for GPU-accelerated conversational AI - researchers, developers and creators gain a head start in experimenting with, and fine-tuning, speech models for their own applications.

Easy-to-use APIs and models pretrained in NeMo help researchers develop and customize models for text-to-speech, natural language processing and real-time automated speech recognition. Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine-tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs.

Through NGC, NVIDIA NeMo also offers models trained on Mozilla Common Voice, a dataset with nearly 14,000 hours of crowd-sourced speech data in 76 languages. Supported by NVIDIA, the project aims to democratize voice technology with the world's largest open data voice dataset.

Voice Box: NVIDIA Researchers Unpack AI Speech

Interspeech brings together more than 1,000 researchers to showcase groundbreaking work in speech technology. At this week's conference, NVIDIA Research is presenting conversational AI model architectures as well as fully formatted speech datasets for developers.

Catch the following sessions led by NVIDIA speakers:

Scene-Agnostic Multi-Microphone Speech Dereverberation - Tues., Aug. 31

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition - Weds., Sept. 1

Hi-Fi Multi-Speaker English TTS Dataset - Weds., Sept. 1

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction - Thurs., Sept. 2

Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices - Friday, Sept. 3

NeMo Inverse Text Normalization: From Development to Production - Friday, Sept. 3

Find NVIDIA NeMo models in the NGC catalog, and tune into talks by NVIDIA researchers at Interspeech.

LINK:	https://blogs.nvidia.com/blog/2021/08/31/conversational-ai-research-sp...