All the Feels: NVIDIA Shares Expressive Speech Synthesis Research at Interspeech

31/08/2021

AI has transformed synthesized speech from the monotone of robocalls and decades-old GPS navigation systems to the polished tone of virtual assistants in smartphones and smart speakers.

But there's still a gap between AI-synthesized speech and the human speech we hear in daily conversation and in the media. That's because people speak with complex rhythm, intonation and timbre that's challenging for AI to emulate.

The gap is closing fast: NVIDIA researchers are building models and tools for high-quality, controllable speech synthesis that capture the richness of human speech, without audio artifacts. Their latest projects are now on display in sessions at the Interspeech 2021 conference, which runs through Sept. 3.

These models can help voice automated customer service lines for banks and retailers, bring video-game or book characters to life, and provide real-time speech synthesis for digital avatars.

NVIDIA's in-house creative team even uses the technology to produce expressive narration for a video series on the power of AI.

Expressive speech synthesis is just one element of NVIDIA Research's work in conversational AI - a field that also encompasses natural language processing, automated speech recognition, keyword detection, audio enhancement and more.

Optimized to run efficiently on NVIDIA GPUs, some of this cutting-edge work has been made open source through the NVIDIA NeMo toolkit, available on our NGC hub of containers and other software.

Behind the Scenes of I AM AI

NVIDIA researchers and creative professionals don't just talk the conversational AI talk. They walk the walk, putting groundbreaking speech synthesis models to work in our I AM AI video series, which features global AI innovators reshaping just about every industry imaginable.

But until recently, these videos were narrated by a human. Previous speech synthesis models offered limited control over a synthesized voice's pacing and pitch, so attempts at AI narration didn't evoke the emotional response in viewers that a talented human speaker could.

That changed over the past year when NVIDIA's text-to-speech research team developed more powerful, controllable speech synthesis models like RAD-TTS, used in our winning demo at the SIGGRAPH Real-Time Live competition. By training the text-to-speech model with audio of an individual's speech, RAD-TTS can convert any text prompt into the speaker's voice.

Another of its features is voice conversion, where one speaker's words (or even singing) are delivered in another speaker's voice. Inspired by the idea of the human voice as a musical instrument, the RAD-TTS interface gives users fine-grained, frame-level control over the synthesized voice's pitch, duration and energy.

With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor - tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video's tone.
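To make "frame-level control" concrete, here is a minimal Python sketch of the kind of edit described above: raising pitch and energy over the frames of one word, then stretching those frames to slow the pacing. It is an illustration only, not the RAD-TTS interface. The `Frame` type and the `emphasize` and `stretch` helpers are hypothetical names introduced for this example, and the nearest-frame resampling is a deliberately simple stand-in for whatever a real model does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """One short slice of a synthesized voice (hypothetical representation)."""
    pitch_hz: float  # fundamental frequency; 0.0 marks an unvoiced frame
    energy: float    # relative loudness, arbitrary units


def emphasize(frames: List[Frame], start: int, end: int,
              pitch_scale: float = 1.0, energy_scale: float = 1.0) -> List[Frame]:
    """Return a new list with pitch and energy scaled on frames[start:end]."""
    out = []
    for i, frame in enumerate(frames):
        if start <= i < end:
            # Unvoiced frames stay unvoiced, because 0.0 * scale is still 0.0.
            out.append(Frame(frame.pitch_hz * pitch_scale,
                             frame.energy * energy_scale))
        else:
            out.append(frame)  # frames are immutable, so sharing them is safe
    return out


def stretch(frames: List[Frame], start: int, end: int, factor: float) -> List[Frame]:
    """Return a new list in which frames[start:end] last `factor` times as long.

    Uses nearest-frame resampling, an assumption made to keep the sketch short.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    segment = frames[start:end]
    if not segment:
        return list(frames)
    new_len = max(1, round(len(segment) * factor))
    resampled = [segment[min(len(segment) - 1, int(j / factor))]
                 for j in range(new_len)]
    return frames[:start] + resampled + frames[end:]


# Usage: emphasize the first two frames (one "word"), then slow them down.
narration = [Frame(200.0, 1.0), Frame(210.0, 1.0), Frame(0.0, 0.2), Frame(190.0, 0.9)]
louder = emphasize(narration, 0, 2, pitch_scale=1.1, energy_scale=1.5)
slower = stretch(louder, 0, 2, factor=2.0)
```

In this toy version the pitch, energy and duration edits are independent of one another, which mirrors the idea of directing each property separately, the way a producer might coach a voice actor on one aspect of a line at a time.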

The AI model's capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice. It can even recreate the performances of iconic singers, matching not only the melody of a song, but also the emotional expression behind the vocals.

Giving Voice to AI Developers, Researchers

With NVIDIA NeMo - an open-source Python toolkit for GPU-accelerated conversational AI - researchers, developers and creators gain a head start in experimenting with, and fine-tuning, speech models for their own applications.

Easy-to-use APIs and models pretrained in NeMo help researchers develop and customize models for text-to-speech, natural language processing and real-time automated speech recognition. Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine-tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs.

Through NGC, NVIDIA NeMo also offers models trained on Mozilla Common Voice, a dataset with nearly 14,000 hours of crowd-sourced speech data in 76 languages. Supported by NVIDIA, the project aims to democratize voice technology with the world's largest open data voice dataset.

Voice Box: NVIDIA Researchers Unpack AI Speech

Interspeech brings together more than 1,000 researchers to showcase groundbreaking work in speech technology. At this week's conference, NVIDIA Research is presenting conversational AI model architectures as well as fully formatted speech datasets for developers.

Catch the following sessions led by NVIDIA speakers:

Scene-Agnostic Multi-Microphone Speech Dereverberation - Tues., Aug. 31

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition - Weds., Sept. 1

Hi-Fi Multi-Speaker English TTS Dataset - Weds., Sept. 1

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction - Thurs., Sept. 2

Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices - Friday, Sept. 3

NeMo Inverse Text Normalization: From Development to Production - Friday, Sept. 3

Find NVIDIA NeMo models in the NGC catalog, and tune into talks by NVIDIA researchers at Interspeech.
LINK: https://blogs.nvidia.com/blog/2021/08/31/conversational-ai-research-sp...