
University of Washington researchers have developed new algorithms that can turn audio clips into a realistic, lip-synced video of the person speaking those words.
As detailed in a paper to be presented August 2 at SIGGRAPH 2017 in L.A., the team successfully generated realistic video of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics using audio clips of those speeches and existing weekly video addresses that were originally on a different topic.
Ira Kemelmacher-Shlizerman, an assistant professor at the UW's Paul G. Allen School of Computer Science & Engineering, said, "Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio."
In a visual form of lip-syncing, the system converts audio files of an individual's speech into realistic mouth shapes, which are then grafted onto and blended with the head of that person from another existing video.
In the future, video chat tools like Skype or Messenger will enable anyone to collect videos that could be used to train computer models, Kemelmacher-Shlizerman said.
Because streaming audio over the internet takes up far less bandwidth than video, the new system has the potential to end video chats that are constantly timing out from poor connections.
"When you watch Skype or Google Hangouts, often the connection is stuttery and low-resolution and really unpleasant, but often the audio is pretty good," said co-author and Allen School professor Steve Seitz. "So if you could use the audio to produce much higher-quality video, that would be terrific."
By reversing the process (feeding video into the network instead of just audio), the team could also potentially develop algorithms that could detect whether a video is real or manufactured.
The new machine learning tool makes significant progress in overcoming what's known as the "uncanny valley" problem, which has dogged efforts to create realistic video from audio. When synthesised human likenesses appear to be almost real but still manage to somehow miss the mark, people find them creepy or off-putting.
"People are particularly sensitive to any areas of your mouth that don't look realistic," said lead author Supasorn Suwajanakorn, a recent doctoral graduate in the Allen School. "If you don't render teeth right or the chin moves at the wrong time, people can spot it right away and it's going to look fake. So you have to render the mouth region perfectly to get beyond the uncanny valley."
A neural network first converts the sounds from an audio file into basic mouth shapes. Then the system grafts and blends those mouth shapes onto an existing target video and adjusts the timing to create a new realistic, lip-synced video.
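To make the "grafts and blends" step concrete, here is a minimal sketch of plain alpha blending, in which a soft mask decides how much of a synthesised mouth patch replaces each pixel of the target frame. This is a generic compositing technique used for illustration, not the team's actual method; the function name `blend_region` and the representation of images as nested lists of grayscale values are assumptions.

```python
def blend_region(target, patch, mask):
    """Alpha-blend `patch` over `target`, pixel by pixel.

    All three arguments are equally sized 2D lists of floats.
    Mask values lie in [0, 1]: 1 keeps the patch pixel, 0 keeps
    the target pixel, and values in between mix the two.
    (Hypothetical helper, not taken from the research.)
    """
    if not (len(target) == len(patch) == len(mask)):
        raise ValueError("inputs must have the same number of rows")

    blended = []
    for t_row, p_row, m_row in zip(target, patch, mask):
        if not (len(t_row) == len(p_row) == len(m_row)):
            raise ValueError("inputs must have the same number of columns")
        # Weighted average of patch and target, weighted by the mask.
        blended.append(
            [m * p + (1.0 - m) * t for t, p, m in zip(t_row, p_row, m_row)]
        )
    return blended
```

A mask that fades from 1 at the centre of the mouth to 0 at its edge avoids a visible seam, which matters given how sensitive viewers are to the mouth region.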
Previously, audio-to-video conversion processes have involved filming multiple people in a studio saying the same sentences over and over to try to capture how a particular sound correlates to different mouth shapes, which is expensive, tedious and time-consuming. By contrast, Suwajanakorn developed algorithms that can learn from videos that exist in the wild on the internet or elsewhere.
"There are millions of hours of video that already exist from interviews, video chats, movies, television programs and other sources. And these deep learning algorithms are very data hungry, so it's a good match to do it this way," Suwajanakorn said.
Rather than synthesising the final video directly from audio, the team tackled the problem in two steps. The first involved training a neural network to watch videos of an individual and translate different audio sounds into basic mouth shapes.
By combining previous research from the UW Graphics and Image Laboratory team with a new mouth synthesis technique, they were then able to realistically superimpose and blend those mouth shapes and textures on an existing reference video of that person. Another key insight was to allow a small time shift to enable the neural network to anticipate what the speaker is going to say next.
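The time shift can be illustrated with a small sketch of how training pairs might be aligned. It assumes per-frame audio features and per-frame mouth shapes of equal length; the function name `delayed_training_pairs` and the `delay` parameter are hypothetical, and this is one plausible reading of the idea rather than the team's implementation.

```python
def delayed_training_pairs(audio_features, mouth_shapes, delay):
    """Pair audio frame t + delay with the mouth shape at frame t.

    A recurrent model trained on these pairs has already consumed
    `delay` frames of "future" audio by the time it must output the
    mouth shape for frame t. (Hypothetical helper for illustration.)
    """
    if len(audio_features) != len(mouth_shapes):
        raise ValueError("audio and mouth sequences must be the same length")
    if delay < 0:
        raise ValueError("delay must be non-negative")

    usable = len(mouth_shapes) - delay  # the last `delay` targets lack future audio
    return [(audio_features[t + delay], mouth_shapes[t]) for t in range(usable)]


# Usage: with a one-frame delay, audio frame 1 is paired with mouth frame 0.
pairs = delayed_training_pairs(["a0", "a1", "a2", "a3"],
                               ["m0", "m1", "m2", "m3"],
                               delay=1)
```

The trade-off is latency: a larger delay gives the model more context about upcoming sounds, but the output lags the audio by that many frames.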
The new lip-syncing process enabled the researchers to create realistic videos of Obama speaking in the White House, using words he spoke on a television talk show or during an interview decades ago.
Currently, the neural network is designed to learn on one individual at a time, meaning that Obama's voice speaking words he actually uttered is the only information used to drive the synthesised video. Future steps, however, include helping the algorithms generalise across situations to recognise a person's voice and speech patterns with less data: with only an hour of video to learn from, for instance, instead of 14 hours.
The research was funded by Samsung, Google, Facebook, Intel and the UW Animation Research Labs.