All the Feels: NVIDIA Shares Expressive Speech Synthesis Research at Interspeech
31/08/2021
There's still a gap between AI-synthesized speech and the human speech we hear in daily conversation and in the media. That's because people speak with complex rhythm, intonation and timbre that's challenging for AI to emulate.
The gap is closing fast: NVIDIA researchers are building models and tools for high-quality, controllable speech synthesis that capture the richness of human speech, without audio artifacts. Their latest projects are now on display in sessions at the Interspeech 2021 conference, which runs through Sept. 3.
These models can help voice automated customer service lines for banks and retailers, bring video-game or book characters to life, and provide real-time speech synthesis for digital avatars.
NVIDIA's in-house creative team even uses the technology to produce expressive narration for a video series on the power of AI.
Expressive speech synthesis is just one element of NVIDIA Research's work in conversational AI - a field that also encompasses natural language processing, automated speech recognition, keyword detection, audio enhancement and more.
Optimized to run efficiently on NVIDIA GPUs, some of this cutting-edge work has been made open source through the NVIDIA NeMo toolkit, available on our NGC hub of containers and other software.
Behind the Scenes of I AM AI
NVIDIA researchers and creative professionals don't just talk the conversational AI talk. They walk the walk, putting groundbreaking speech synthesis models to work in our I AM AI video series, which features global AI innovators reshaping just about every industry imaginable.
But until recently, these videos were narrated by a human. Previous speech synthesis models offered limited control over a synthesized voice's pacing and pitch, so attempts at AI narration didn't evoke the emotional response in viewers that a talented human speaker could.
That changed over the past year when NVIDIA's text-to-speech research team developed more powerful, controllable speech synthesis models like RAD-TTS, used in our winning demo at the SIGGRAPH Real-Time Live competition. By training the text-to-speech model with audio of an individual's speech, RAD-TTS can convert any text prompt into the speaker's voice.
Another of its features is voice conversion, where one speaker's words (or even singing) are delivered in another speaker's voice. Inspired by the idea of the human voice as a musical instrument, the RAD-TTS interface gives users fine-grained, frame-level control over the synthesized voice's pitch, duration and energy.
With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor - tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video's tone.
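As a rough illustration of what frame-level control over pitch, duration and energy can mean, the sketch below edits a span of frames to emphasize a word. This is not the RAD-TTS interface or any NVIDIA API: the `Frame` type, the `emphasize` function and all parameter names are hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One short slice of synthesized speech (hypothetical representation)."""
    pitch_hz: float  # fundamental frequency of the frame
    energy: float    # relative loudness of the frame


def emphasize(frames: List[Frame], start: int, end: int,
              pitch_scale: float = 1.2, energy_scale: float = 1.3,
              stretch: int = 2) -> List[Frame]:
    """Return a new frame list with frames[start:end] emphasized.

    Frames in the span get their pitch and energy scaled, and each one is
    repeated `stretch` times to lengthen its duration. Frames outside the
    span are copied unchanged; the input list is not modified.
    """
    out: List[Frame] = []
    for i, frame in enumerate(frames):
        if start <= i < end:
            for _ in range(stretch):
                out.append(Frame(frame.pitch_hz * pitch_scale,
                                 frame.energy * energy_scale))
        else:
            out.append(Frame(frame.pitch_hz, frame.energy))
    return out


# Example: emphasize the middle two of four flat frames.
flat = [Frame(pitch_hz=100.0, energy=1.0) for _ in range(4)]
edited = emphasize(flat, start=1, end=3)
```

In a real system the edited pitch, duration and energy values would condition the speech model rather than being played back directly; the sketch only shows the shape of the edit a producer might make.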
The AI model's capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice. It can even recreate the performances of iconic singers, matching not only the melody of a song, but also the emotional expression behind the vocals.
Giving Voice to AI Developers, Researchers
With NVIDIA NeMo - an open-source Python toolkit for GPU-accelerated conversational AI - researchers, developers and creators gain a head start in experimenting with, and fine-tuning, speech models for their own applications.
Easy-to-use APIs and models pretrained in NeMo help researchers develop and customize models for text-to-speech, natural language processing and real-time automated speech recognition. Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine-tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs.
Through NGC, NVIDIA NeMo also offers models trained on Mozilla Common Voice, a dataset with nearly 14,000 hours of crowd-sourced speech data in 76 languages. Supported by NVIDIA, the project aims to democratize voice technology with the world's largest open data voice dataset.
Voice Box: NVIDIA Researchers Unpack AI Speech
Interspeech brings together more than 1,000 researchers to showcase groundbreaking work in speech technology. At this week's conference, NVIDIA Research is presenting conversational AI model architectures as well as fully formatted speech datasets for developers.
Catch the following sessions led by NVIDIA speakers:
Scene-Agnostic Multi-Microphone Speech Dereverberation - Tues., Aug. 31
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition - Weds., Sept. 1
Hi-Fi Multi-Speaker English TTS Dataset - Weds., Sept. 1
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction - Thurs., Sept. 2
Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices - Friday, Sept. 3
NeMo Inverse Text Normalization: From Development to Production - Friday, Sept. 3
Find NVIDIA NeMo models in the NGC catalog, and tune into talks by NVIDIA researchers at Interspeech.
Link: https://blogs.nvidia.com/blog/2021/08/31/conversational-ai-research-sp...