
AI has transformed synthesized speech from the monotone of robocalls and decades-old GPS navigation systems to the polished tone of virtual assistants in smartphones and smart speakers.
But there's still a gap between AI-synthesized speech and the human speech we hear in daily conversation and in the media. That's because people speak with complex rhythm, intonation and timbre that's challenging for AI to emulate.
The gap is closing fast: NVIDIA researchers are building models and tools for high-quality, controllable speech synthesis that capture the richness of human speech, without audio artifacts. Their latest projects are now on display in sessions at the Interspeech 2021 conference, which runs through Sept. 3.
These models can help voice automated customer service lines for banks and retailers, bring video-game or book characters to life, and provide real-time speech synthesis for digital avatars.
NVIDIA's in-house creative team even uses the technology to produce expressive narration for a video series on the power of AI.
Expressive speech synthesis is just one element of NVIDIA Research's work in conversational AI - a field that also encompasses natural language processing, automated speech recognition, keyword detection, audio enhancement and more.
Optimized to run efficiently on NVIDIA GPUs, some of this cutting-edge work has been made open source through the NVIDIA NeMo toolkit, available on our NGC hub of containers and other software.
Behind the Scenes of I AM AI
NVIDIA researchers and creative professionals don't just talk the conversational AI talk. They walk the walk, putting groundbreaking speech synthesis models to work in our I AM AI video series, which features global AI innovators reshaping just about every industry imaginable.
But until recently, these videos were narrated by a human. Previous speech synthesis models offered limited control over a synthesized voice's pacing and pitch, so attempts at AI narration didn't evoke the emotional response in viewers that a talented human speaker could.
That changed over the past year when NVIDIA's text-to-speech research team developed more powerful, controllable speech synthesis models like RAD-TTS, used in our winning demo at the SIGGRAPH Real-Time Live competition. Once trained on audio of an individual's speech, RAD-TTS can convert any text prompt into that speaker's voice.
Another of its features is voice conversion, where one speaker's words (or even singing) are delivered in another speaker's voice. Inspired by the idea of the human voice as a musical instrument, the RAD-TTS interface gives users fine-grained, frame-level control over the synthesized voice's pitch, duration and energy.
With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor - tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video's tone.
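To make the idea of frame-level direction concrete, here is a minimal Python sketch of what adjusting pitch, energy and pacing over a span of frames could look like. It is not the RAD-TTS interface: the `Frame` type and the `emphasize` and `stretch` helpers are hypothetical names invented for illustration, and the pacing change uses simple nearest-frame resampling rather than anything the researchers describe.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """One synthesis frame (hypothetical container, not a RAD-TTS type)."""
    pitch_hz: float  # fundamental frequency of the frame
    energy: float    # relative loudness of the frame


def emphasize(frames: List[Frame], start: int, end: int,
              pitch_scale: float = 1.0, energy_scale: float = 1.0) -> List[Frame]:
    """Return a copy with pitch and energy scaled on frames[start:end]."""
    out = []
    for i, frame in enumerate(frames):
        if start <= i < end:
            out.append(Frame(frame.pitch_hz * pitch_scale,
                             frame.energy * energy_scale))
        else:
            out.append(frame)  # frames are immutable, so sharing is safe
    return out


def stretch(frames: List[Frame], start: int, end: int, factor: float) -> List[Frame]:
    """Slow down (factor > 1) or speed up (factor < 1) frames[start:end].

    Uses nearest-frame resampling, an assumption made to keep the sketch short.
    """
    segment = frames[start:end]
    if not segment:
        return list(frames)
    new_len = max(1, round(len(segment) * factor))
    resampled = [segment[min(len(segment) - 1, int(i / factor))]
                 for i in range(new_len)]
    return frames[:start] + resampled + frames[end:]


# Usage: raise pitch and loudness on the middle two frames, then slow them down.
narration = [Frame(180.0, 1.0), Frame(190.0, 1.0),
             Frame(200.0, 1.0), Frame(185.0, 1.0)]
directed = emphasize(narration, 1, 3, pitch_scale=1.1, energy_scale=1.3)
directed = stretch(directed, 1, 3, factor=1.5)
print(len(directed))  # 5: the two emphasized frames now span three frames
```

In a real system the edited contours would condition the synthesis model rather than being played back directly; the sketch only shows the shape of the edit, in which a word is emphasized by changing a contiguous range of frames.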
The AI model's capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice. It can even recreate the performances of iconic singers, matching not only the melody of a song, but also the emotional expression behind the vocals.
Giving Voice to AI Developers, Researchers
With NVIDIA NeMo - an open-source Python toolkit for GPU-accelerated conversational AI - researchers, developers and creators gain a head start in experimenting with, and fine-tuning, speech models for their own applications.
Easy-to-use APIs and models pretrained in NeMo help researchers develop and customize models for text-to-speech, natural language processing and real-time automated speech recognition. Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine-tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs.
Through NGC, NVIDIA NeMo also offers models trained on Mozilla Common Voice, a dataset with nearly 14,000 hours of crowd-sourced speech data in 76 languages. Supported by NVIDIA, the project aims to democratize voice technology with the world's largest open data voice dataset.
Voice Box: NVIDIA Researchers Unpack AI Speech
Interspeech brings together more than 1,000 researchers to showcase groundbreaking work in speech technology. At this week's conference, NVIDIA Research is presenting conversational AI model architectures as well as fully formatted speech datasets for developers.
Catch the following sessions led by NVIDIA speakers:
Scene-Agnostic Multi-Microphone Speech Dereverberation - Tues., Aug. 31
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition - Weds., Sept. 1
Hi-Fi Multi-Speaker English TTS Dataset - Weds., Sept. 1
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction - Thurs., Sept. 2
Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices - Friday, Sept. 3
NeMo Inverse Text Normalization: From Development to Production - Friday, Sept. 3
Find NVIDIA NeMo models in the NGC catalog, and tune into talks by NVIDIA researchers at Interspeech.