
University of Washington researchers have developed new algorithms that can turn audio clips into a realistic, lip-synced video of the person speaking those words.
As detailed in a paper to be presented August 2 at SIGGRAPH 2017 in L.A., the team successfully generated realistic video of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics using audio clips of those speeches and existing weekly video addresses that were originally on a different topic.
"Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio," said Ira Kemelmacher-Shlizerman, an assistant professor at the UW's Paul G. Allen School of Computer Science & Engineering.
In a visual form of lip-syncing, the system converts audio files of an individual's speech into realistic mouth shapes, which are then grafted onto and blended with the head of that person from another existing video.
In the future, video chat tools like Skype or Messenger will enable anyone to collect videos that could be used to train computer models, Kemelmacher-Shlizerman said.
Because streaming audio over the internet takes up far less bandwidth than video, the new system has the potential to end video chats that are constantly timing out from poor connections.
"When you watch Skype or Google Hangouts, often the connection is stuttery and low-resolution and really unpleasant, but often the audio is pretty good," said co-author and Allen School professor Steve Seitz. "So if you could use the audio to produce much higher-quality video, that would be terrific."
By reversing the process, feeding video into the network instead of just audio, the team could also potentially develop algorithms that detect whether a video is real or manufactured.
The new machine learning tool makes significant progress in overcoming what's known as the "uncanny valley" problem, which has dogged efforts to create realistic video from audio. When synthesised human likenesses appear to be almost real but still manage to somehow miss the mark, people find them creepy or off-putting.
"People are particularly sensitive to any areas of your mouth that don't look realistic," said lead author Supasorn Suwajanakorn, a recent doctoral graduate in the Allen School. "If you don't render teeth right or the chin moves at the wrong time, people can spot it right away and it's going to look fake. So you have to render the mouth region perfectly to get beyond the uncanny valley."
A neural network first converts the sounds from an audio file into basic mouth shapes. Then the system grafts and blends those mouth shapes onto an existing target video and adjusts the timing to create a new realistic, lip-synced video.
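The article does not specify how the grafting and blending are done, so the following is only an illustrative sketch of the general idea, using simple per-pixel alpha blending as a stand-in rather than the team's actual method. The function name `blend_patch`, the grayscale list-of-lists image format and the tiny sample values are all assumptions made for this example.

```python
def blend_patch(frame, patch, mask, top, left):
    """Return a copy of `frame` with `patch` alpha-blended in at (top, left).

    frame, patch: 2-D lists of grayscale values (hypothetical image format).
    mask: 2-D list of weights in [0, 1] with the same shape as `patch`;
          1.0 keeps the patch pixel, 0.0 keeps the original frame pixel.
    """
    out = [row[:] for row in frame]  # copy so the target frame is not modified
    for r, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        for c, (p, a) in enumerate(zip(patch_row, mask_row)):
            y, x = top + r, left + c
            # Weighted average of the synthesised mouth pixel and the target pixel.
            out[y][x] = a * p + (1.0 - a) * frame[y][x]
    return out


# Hypothetical data: a 4x4 target frame and a 2x2 synthesised mouth patch.
frame = [[100.0] * 4 for _ in range(4)]
patch = [[200.0, 200.0], [200.0, 200.0]]
# Weights fall off towards one corner so the patch fades into the frame.
mask = [[1.0, 0.5], [0.5, 0.0]]

blended = blend_patch(frame, patch, mask, top=1, left=1)
for row in blended:
    print(row)
```

A soft mask of this kind avoids a hard seam at the patch border, which is the sort of artefact that would make a composited mouth region look fake.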
Previously, audio-to-video conversion processes have involved filming multiple people in a studio saying the same sentences over and over to try to capture how a particular sound correlates to different mouth shapes, which is expensive, tedious and time-consuming. By contrast, Suwajanakorn developed algorithms that can learn from videos that exist in the wild on the internet or elsewhere.
"There are millions of hours of video that already exist from interviews, video chats, movies, television programs and other sources. And these deep learning algorithms are very data hungry, so it's a good match to do it this way," Suwajanakorn said.
Rather than synthesising the final video directly from audio, the team tackled the problem in two steps. The first involved training a neural network to watch videos of an individual and translate different audio sounds into basic mouth shapes.
By combining previous research from the UW Graphics and Image Laboratory team with a new mouth synthesis technique, they were then able to realistically superimpose and blend those mouth shapes and textures on an existing reference video of that person. Another key insight was to allow a small time shift to enable the neural network to anticipate what the speaker is going to say next.
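One way to picture the small time shift is as an offset between the audio the network has heard and the mouth shape it is asked to produce. The sketch below is a minimal illustration under that assumption, not the researchers' code: the helper `delayed_pairs`, the `delay` parameter and the placeholder frame labels are all hypothetical.

```python
def delayed_pairs(audio_features, mouth_shapes, delay):
    """Pair the audio frame at index t + delay with the mouth shape at index t.

    When a sequence model is trained on these pairs, it has already consumed
    `delay` frames of "future" audio by the time it must output the mouth
    shape for frame t.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(audio_features) != len(mouth_shapes):
        raise ValueError("audio and mouth sequences must have the same length")
    usable = max(len(mouth_shapes) - delay, 0)  # trailing frames have no future audio
    return [(audio_features[t + delay], mouth_shapes[t]) for t in range(usable)]


# Hypothetical per-frame labels standing in for real feature vectors.
audio = ["a0", "a1", "a2", "a3", "a4"]
mouth = ["m0", "m1", "m2", "m3", "m4"]

pairs = delayed_pairs(audio, mouth, delay=2)
print(pairs)  # [('a2', 'm0'), ('a3', 'm1'), ('a4', 'm2')]
```

With a delay of zero the pairs line up frame for frame; a larger delay trades a little latency for more context about what the speaker says next.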
The new lip-syncing process enabled the researchers to create realistic videos of Obama speaking in the White House, using words he spoke on a television talk show or during an interview decades ago.
Currently, the neural network is designed to learn on one individual at a time, meaning that Obama's voice, speaking words he actually uttered, is the only information used to drive the synthesised video. Future steps, however, include helping the algorithms generalise across situations to recognise a person's voice and speech patterns with less data: with only an hour of video to learn from, for instance, instead of 14 hours.
The research was funded by Samsung, Google, Facebook, Intel and the UW Animation Research Labs.