
Why captioning can't be fully automated

31/10/2017

Author: Contributor

Publish date: Aug 3, 2017

You may have heard the headlines: 'We've reached human parity' (Microsoft, 16th October), as they reach an accuracy of over 94 per cent; Google openly planning to compete with Dragon developers Nuance; Amazon attempting to revolutionise access to the internet via Echo and Alexa. It seems like everyone's at the speech recognition game, so surely the end is nigh for traditional methods of creating captions?

I think captioners can rest easy for a good while yet - for a few simple reasons. The first is simply the scale of the task that regulators and audiences set the captioner; typically a pre-recorded programme must be captioned 100 per cent accurately, and a live show should hit at least 98 per cent. Taking the pre-recorded example, how hard can that be for a machine? Surely there's all the time in the world to get it right?
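To put those percentages in context, here is a minimal sketch of one common way to score a transcript: count the word-level edits needed to turn the machine output into the reference text, then express accuracy as one minus the word error rate. The article does not say how regulators measure accuracy, so this metric and the function names `word_errors` and `word_accuracy` are assumptions made purely for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as a percentage: 100 * (1 - errors / reference words)."""
    total = len(reference.split())
    if total == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (1 - word_errors(reference, hypothesis) / total)
```

On this measure, 94 per cent means roughly six wrong words in every hundred, while the 98 per cent live target allows only two, and the pre-recorded target allows none.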

Consider what 100 per cent actually means. Not only does every word have to be identified and spelled correctly (no mean feat on a show such as Mastermind, where deliberately obscure questions can trigger equally obscure and possibly wrong answers); the result also has to read well as a caption. Imagine writing down every word you utter during any given day; would you go for something akin to the dialogue in a play, or something accurate with all its 'disfluencies' (those crutch-like 'Ums' and 'Errs' that let your brain change gear whilst letting your mouth free-wheel)? Do you talk in nice, tidy grammatical sentences? Do you pause neatly for mental punctuation? I guessed as much. If you simply transcribe such speech verbatim you'll get a very accurate representation of the words uttered, but that won't make for comprehensible captions and it could well be illegibly fast.
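As a rough illustration of the gap between a verbatim transcript and a readable caption, the sketch below strips a small list of filler words and estimates the reading rate a caption would demand. The filler list, the function names and the idea of measuring speed in words per minute are all assumptions for the sake of the example, not a description of any broadcaster's house style.

```python
import re

# Hypothetical filler list; a real style guide would define its own.
FILLERS = {"um", "umm", "er", "erm", "err", "uh"}


def strip_fillers(text: str) -> str:
    """Drop words that are fillers once punctuation and case are ignored."""
    kept = [
        word for word in text.split()
        if re.sub(r"\W", "", word).lower() not in FILLERS
    ]
    return " ".join(kept)


def words_per_minute(text: str, duration_seconds: float) -> float:
    """Reading rate implied by showing this text for the given duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(text.split()) * 60.0 / duration_seconds
```

Even this tiny rule shows why editing is a judgement call: 'err' is also a perfectly good verb, so a blind word list would mangle 'to err is human'.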

Speech recognition also thrives on good quality audio; not just a clear voice, but an absence of echo, background noise, music and so forth. It is possible, with care and a complex workflow, to ensure that the music and the speech remain separate in a recording, but that doesn't help with poor acoustics or a duff recording. Much more research is needed to improve automatic speech recognition (ASR) in complex audio environments, and we're helping a PhD student at the University of Edinburgh to research precisely this.

The automatic insertion of punctuation is in its infancy; some inroads have been made by our research partners at Edinburgh, using techniques more commonly found in Machine Translation. Whilst ASR uses a largely probability-based approach to working out what's been said, punctuation needs something more rule-based. Questions are another matter entirely; cadence can be a good indicator for some speakers (as most languages will let you ignore the formalities of question words) but that's not a universal rule.

Identifying speaker changes is another area that needs more research; for many of our clients we need to be able to accurately identify either a change of speaker (denoted by chevrons or a change in text colour) or the speaker themselves. Whilst automated 'diarisation' reaches good levels of accuracy, it doesn't yet reach the level required for broadcast.

Does this mean we can't use ASR at all? I think not. Not all content is the same; it's not all shouty gameshows, talkshows where each guest cuts across everyone else, and sports output captured in the open, with the roar of the crowd and the rumble of the music bed. Some material is recorded cleanly, with professional speakers speaking at a moderate pace on a subject matter with plenty of background data to assist with the more tricky terms. If we have enough of this kind of data we can train ASR engines to make a pretty good job of transcription. We can utilise the vast archives of media with matching captions to create speech recognition engines, punctuation models and 'caption translation' systems to replicate the kind of output that a human could produce. We can then use 'audio alignment' tools to break this transcription up into readable blocks and time-align them to the original speaker's voice, leading to fully automated captions.
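The final step, breaking a transcription into readable, timed blocks, can be sketched in a few lines. This assumes an alignment tool has already supplied a start and end time for every word; the `Word` and `Caption` types, the `build_captions` function and the 37-character default are hypothetical choices for illustration, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the programme
    end: float


@dataclass
class Caption:
    text: str
    start: float
    end: float


def _close(block: list[Word]) -> Caption:
    """Turn a run of words into one caption spanning first start to last end."""
    return Caption(" ".join(w.text for w in block), block[0].start, block[-1].end)


def build_captions(words: list[Word], max_chars: int = 37) -> list[Caption]:
    """Greedily pack words into captions no longer than max_chars.

    A single word longer than max_chars still gets a caption of its own.
    """
    captions: list[Caption] = []
    current: list[Word] = []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            captions.append(_close(current))
            current = [word]
        else:
            current.append(word)
    if current:
        captions.append(_close(current))
    return captions
```

A greedy split like this ignores sentence boundaries and speaker changes, which is exactly where the punctuation and diarisation work described earlier would have to feed in.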

No doubt if I review this article in ten years' time I'll cringe at the bold assertions made about the progress of automated captioning, but I feel confident that genres such as comedy will remain a bastion of human-generated captioning even in 2027. Comedy is typically based around word play, incongruity and surprise. Speech engines are most comfortable with the opposite of this: they know what they've been trained on, and a new comic turn of phrase will almost certainly bring about an unintentionally comic transcription. I'm pretty sure that a human captioner will be wrestling with the likes of Have I Got News For You for many years to come.

By Matt Simpson, head of product management, access services, Broadcast and Media Services, Ericsson
LINK: https://www.tvbeurope.com/features-2/captioning-cant-fully-automated...