Sony Pixel Power calrec Sony

All the Feels: NVIDIA Shares Expressive Speech Synthesis Research at Interspeech

31/08/2021

AI has transformed synthesized speech from the monotone of robocalls and decades-old GPS navigation systems to the polished tone of virtual assistants in smartphones and smart speakers.

But there's still a gap between AI-synthesized speech and the human speech we hear in daily conversation and in the media. That's because people speak with complex rhythm, intonation and timbre that's challenging for AI to emulate.

The gap is closing fast: NVIDIA researchers are building models and tools for high-quality, controllable speech synthesis that capture the richness of human speech, without audio artifacts. Their latest projects are now on display in sessions at the Interspeech 2021 conference, which runs through Sept. 3.

These models can help voice automated customer service lines for banks and retailers, bring video-game or book characters to life, and provide real-time speech synthesis for digital avatars.

NVIDIA's in-house creative team even uses the technology to produce expressive narration for a video series on the power of AI.

Expressive speech synthesis is just one element of NVIDIA Research's work in conversational AI - a field that also encompasses natural language processing, automated speech recognition, keyword detection, audio enhancement and more.

Optimized to run efficiently on NVIDIA GPUs, some of this cutting-edge work has been made open source through the NVIDIA NeMo toolkit, available on our NGC hub of containers and other software.

Behind the Scenes of I AM AI NVIDIA researchers and creative professionals don't just talk the conversational AI talk. They walk the walk, putting groundbreaking speech synthesis models to work in our I AM AI video series, which features global AI innovators reshaping just about every industry imaginable.

But until recently, these videos were narrated by a human. Previous speech synthesis models offered limited control over a synthesized voice's pacing and pitch, so attempts at AI narration didn't evoke the emotional response in viewers that a talented human speaker could.

That changed over the past year when NVIDIA's text-to-speech research team developed more powerful, controllable speech synthesis models like RAD-TTS, used in our winning demo at the SIGGRAPH Real-Time Live competition. By training the text-to-speech model with audio of an individual's speech, RAD-TTS can convert any text prompt into the speaker's voice.

Another of its features is voice conversion, where one speaker's words (or even singing) is delivered in another speaker's voice. Inspired by the idea of the human voice as a musical instrument, the RAD-TTS interface gives users fine-grained, frame-level control over the synthesized voice's pitch, duration and energy.

With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor - tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video's tone.

The AI model's capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice. It can even recreate the performances of iconic singers, matching not only the melody of a song, but also the emotional expression behind the vocals.

Giving Voice to AI Developers, Researchers With NVIDIA NeMo - an open-source Python toolkit for GPU-accelerated conversational AI - researchers, developers and creators gain a head start in experimenting with, and fine-tuning, speech models for their own applications.

Easy-to-use APIs and models pretrained in NeMo help researchers develop and customize models for text-to-speech, natural language processing and real-time automated speech recognition. Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs.

Through NGC, NVIDIA NeMo also offers models trained on Mozilla Common Voice, a dataset with nearly 14,000 hours of crowd-sourced speech data in 76 languages. Supported by NVIDIA, the project aims to democratize voice technology with the world's largest open data voice dataset.

Voice Box: NVIDIA Researchers Unpack AI Speech Interspeech brings together more than 1,000 researchers to showcase groundbreaking work in speech technology. At this week's conference, NVIDIA Research is presenting conversational AI model architectures as well as fully formatted speech datasets for developers.

Catch the following sessions led by NVIDIA speakers:

Scene-Agnostic Multi-Microphone Speech Dereverberation - Tues., Aug. 31

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition - Weds., Sept. 1

Hi-Fi Multi-Speaker English TTS Dataset - Weds., Sept 1

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction - Thurs., Sept. 2

Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices - Friday, Sept. 3

NeMo Inverse Text Normalization: From Development to Production - Friday, Sept. 3

Find NVIDIA NeMo models in the NGC catalog, and tune into talks by NVIDIA researchers at Interspeech.
LINK: https://blogs.nvidia.com/blog/2021/08/31/conversational-ai-research-sp...
See more stories from nvidia

Most recent headlines

04/08/2024

Dalet Appoints Santiago Solanas as CEO to Lead Next Era of Growth and Innovation

Dalet, a leading technology and service provider for media-rich organizations, is excited to announce Santiago Solanas as its new Chief Executive Officer (CEO)....

03/06/2024

Dalet and Veritone Reach Agreement to Distribute, Transact and Monetize Media Archives

Dalet, a leading technology and service provider for media-rich organizations, a...

19/05/2024

TV Techs Weekly Product Wrap-Up

Missed any of our product coverage during your busy week? The TV Tech weekly product and services news wrap-up provides links to all of our coverage from May 13...

19/05/2024

The CW Shares Its 2024-2025 Lineup

LeVar Burton will host the new game show Trivial Pursuit on The CW, while Raven-Symon will host Scrabble....

19/05/2024

Biden, Trump Agree to Debates on CNN, ABC

President Joe Biden and former President Donald Trump have agreed to debates, set for June 27 on CNN and September 10 on ABC....

19/05/2024

End of the Line for Young Sheldon' on May 16

Young Sheldon signs off after seven seasons on CBS, when the series finale airs Thursday, May 18. Jim Parsons and Mayim Bialik reprise their roles as Sheldon Co...

18/05/2024

If Bundling Is Back, What's the Ideal Bundle?

PORTSMOUTH, N.H. Bundling is back in a big way, with all the major streaming companies and many pay TV operators exploring ways to simplify the consumer experie...

18/05/2024

FCC to Vote on LPTV Rules during June Open Meeting

WASHINGTON, D.C. Federal Communications Commission Chairwoman Jessica Rosenworcel has announced a tentative agenda for the June Open Commission Meeting schedule...

18/05/2024

Matthews Launches New Multipurpose Grip Rail Telescopic Grid Pipe Solution

Matthews Studio Equipment has introduced Grip Rail, which the company said offers a better way to mount equipment on location, in the studio, or on the fly....

18/05/2024

IAB Tech Labs, Google Partner on New First Party Data Solution

In a notable development in the industry-wide effort to address privacy concerns while improving efficacy of marketing efforts in a cookieless ad landscape, IAB...

18/05/2024

TV Tech Weekly Product Wrap-Up

Missed any of our product coverage during your busy week? The TV Tech weekly product and services news wrap-up provides links to all of our coverage from May 13...

18/05/2024

DHD Elevates the Art of Podcast Production

DHD Elevates the Art of Podcast Production Brie Clayton May 17, 2024 0 Comments Hero image: the DHD DX2 base and expansion modules Latest-generation ...

17/05/2024

Aerojet Rocketdyne's Camden Site Leverages Modernization Investments to Accelerate Solid Rocket Motor Production

Aerojet Rocketdyne has worked to modernize facilities at its Camden, Arkansas, l...

17/05/2024

FCC Plans to Revise LPTV Rules

The FCC has issued a Notice of Proposed Rulemaking (NPRM) that would revise rules governing low power TV stations (LPTV) in a number of areas, including online ...

17/05/2024

Demystifying Post-Production: Introducing Cinema 4D Particles Week 4

Demystifying Post-Production: Introducing Cinema 4D Particles Week 4 Brie Clayton May 17, 2024 0 Comments With the spring release of Maxon One, we&#...

17/05/2024

Takashi Yamazaki Film Godzilla Minus One Graded with DaVinci Resolve Studio

Takashi Yamazaki Film Godzilla Minus One Graded with DaVinci Resolve Studio Brie Clayton May 17, 2024 0 Comments Hero image credit: 2023 TOHO CO., LT...

17/05/2024

Sterling Event Group Streamlines Live Event Productions with AJA

Sterling Event Group Streamlines Live Event Productions with AJA Brie Clayton May 17, 2024 0 Comments Live event productions only happen once, which ...

17/05/2024

Meet the product manager

Muster Ngobi, product manager at LYNX Technik tells TVBEurope how the ever-evolving media industry provides a truly dynamic working environment By Matthew Corr...

17/05/2024

TV, Streaming Schedule for 2024 NFL Regular Season Is Released

NEW YORK As declines in linear TV viewing make the ongoing popularity of live sports, particularly football, central to financial success of the TV industry, th...

17/05/2024

Netflix Ad Tier Hits 40M Monthly Active Users

During Netflixs second Upfront presentation to advertisers, Amy Reinhard, Netflix's president of advertising, walked advertisers through the continued growt...

17/05/2024

Scripps Promotes Jeff Kiernan to VP, Local News

CINCINNATI The E.W. Scripps Company has added to its leadership team for news by promoting Jeff Kiernan a veteran journalist and general manager of Scripps'...

17/05/2024

Survey: New Disney-Fox-WBD Sports Streamer May Hurt Pay TV Sub Counts

Top executives from Disney, Fox and Warner Bros. Discovery have consistently insisted that their joint venture to launch the Venu Sports streaming bundle in the...

17/05/2024

Caitlin Clark's WNBA Debut Set Viewing Records

ESPN has announced that its coverage of Caitlin Clark's WNBA debut in the Indiana Fever versus the Connecticut Sun season opener was the most-watched WNBA g...

17/05/2024

ATEM Mini Extreme ISO switcher and Blackmagic Pocket Cinema Camera 4K

ATEM Mini Extreme ISO switcher and Blackmagic Pocket Cinema Camera 4K Brie Clayton May 16, 2024 0 Comments Blackmagic Design announced today that Yoic...

17/05/2024

Pixomondo's Virtual Production Academy Expands with Programs at Sony PCL, Vook, and Vancouver Film School

Pixomondo's Virtual Production Academy Expands with Programs at Sony PCL, Vo...

17/05/2024

WBD Upfront Show Offers Peeks at House of the Dragon,' White Lotus,' Biden-Trump Debate

The Warner Bros. Discovery upfront presentation took place Wednesday, May 15 at ...

17/05/2024

The Black Keys, Jelly Roll, Kate Hudson Set To Perform on The Voice' Finale

Season 25 of The Voice wraps on NBC Tuesday, May 21, with performances from The Black Keys, Jelly Roll, Kate Hudson, Lainey Wilson, Muni Long, Thomas Rhett and ...

17/05/2024

CNN Boss Mark Thompson's Plan Includes More News in More Categories on More Devices (Upfronts)

New CNN CEO Mark Thompson spelled out his plan for the struggling news network d...

17/05/2024

Netflix To Launch In-House Advertising Tech Platform

Netflix, a newcomer to the advertising business, said it plans to launch an in-house advertising technology platform....

17/05/2024

Netflix Plots TV Takeover at Upfront Presentation

Netflix shared some programming projects at an upfront presentation in New York. Those include the basketball-themed comedy series Running Point, a Mindy Kaling...

17/05/2024

Plex Geek Week Sale Offers 20% Off Plex Lifetime Pass

Plex is offering movie and music collectors a 20% discount off its Lifetime Plex Pass as part of its Geek Week sale....

17/05/2024

GroupM Names Toby Jenner as President, GroupM Clients

Giant media buyer GroupM said it named Toby Jenner as global president, Group M Clients, a new position at the company....

17/05/2024

Clients of Independent Agencies Boost Programmatic Buying

Smaller advertisers are increasingly buying connected TV programmatically, according to a new report from FreeWheel, Comcast's ad-tech unit....

17/05/2024

TCLtvPlus Adds Streaming Music Channels From Vevo

TCLtvPlus, the streaming app on smart TVs made by TCL, has added live linear channel from music-video programmer Vevo....

17/05/2024

StackAdapt Adopts Data From Samba TV for Programmatic Campaigns

StackAdapt said it made a deal to integrate data from Samba TV into its programmatic advertising platform....

17/05/2024

Tonight on Skeem Saam: Lehasa gets a rude awakening when Kgosi blackmails him

Tonight on Skeem Saam: Lehasa gets a rude awakening when Kgosi blackmails himDon't miss Friday, 17 May's riveting episode of South African soapie Skeem ...

17/05/2024

Tonight on House of Zwide: Dorothy is blown away by Ona's sketches for her wedding dress

Tonight on House of Zwide: Dorothy is blown away by Ona's sketches for her w...

17/05/2024

Tonight on Scandal: Dintle has a visit from her past that leaves her very unsettled

Tonight on Scandal: Dintle has a visit from her past that leaves her very unsett...

17/05/2024

Save Time and Money with WO Traffic v24.0

WO Traffic provides a solid foundation from which stations can manage, execute, and scale end-to-end ad trafficking and sales, both today and into the future. W...

17/05/2024

Broadcast Innovation in India: How AI and Automated Production Helps Smaller Sports Grow

Broadcast Innovation in India: How AI and Automated Production Helps Smaller Spo...

17/05/2024

SVG Sports Cloud Production Forum Gives Refresher Course on Cloud-Based Tools, Ecosystem

SVG Sports Cloud Production Forum Gives Refresher Course on Cloud-Based Tools, E...

17/05/2024

WNBA Tip-Off 2024: Scripps Sports Constructs New Studio for Second Season of WNBA Friday Night Spotlight on ION

WNBA Tip-Off 2024: Scripps Sports Constructs New Studio for Second Season of WNB...

17/05/2024

SVG College Summit 2024: Auburn's War Eagle Productions Breaks Down How They Produce Live Gymnastics Broadcasts

SVG College Summit 2024: Auburn's War Eagle Productions Breaks Down How They...

17/05/2024

Netflix & Shondaland Announce the Song List and Soundtrack for 'Bridgerton' Season 3: Part 1

Back to All News Netflix & Shondaland Announce the Song List and Soundtrack for...

17/05/2024

Skeem Saam: Thursday's episode, 16 May 2024 [video]

Skeem Saam: Thursday's episode, 16 May 2024 [video]Missed an episode of Skeem Saam? No problem! Watch the latest episode of your favourite South African soa...

17/05/2024

Prison Journalism: Letter to my mothers

Prison Journalism: Letter to my mothersThabo Mthembu was incarcerated in Pollsmoor Prison from 2014 to 2019. Read Thabo's story by Thabo Mthembu 17-05-20...

17/05/2024

Paul McCartney becomes UK's first billionaire musician

Paul McCartney becomes UK's first billionaire musicianMusic icon Paul McCartney has become the UK's first billionaire musician, according to the Sunday ...

17/05/2024

Tonight on Smoke and Mirrors: Sakhile advises Tiny against sabotaging Petunia

Tonight on Smoke and Mirrors: Sakhile advises Tiny against sabotaging PetuniaDon't miss Friday, 17 May's riveting episode of South African soapie Smoke ...

17/05/2024

RT'S Operation Transformation comes to a close after 17 seasons

RT has today announced that Operation Transformation (OT) is to end after 17 seasons. As series come to an end each year, RT undertakes an editorial review to...

17/05/2024

Studio One: Your Binaural Beats Lab

By Craig Anderton When I heard about binaural beats, I was interested-I like beats, and I'm into binaural audio. But this has nothing to do with either o...