
Author: Contributor
Publish date: Aug 3, 2017
You may have heard the headlines: "We've reached human parity" (Microsoft, 16th October), as they reach an accuracy of over 94 per cent; Google openly planning to compete with Dragon developers Nuance; Amazon attempting to revolutionise access to the internet via Echo and Alexa. It seems like everyone's at the speech recognition game; surely the end is nigh for traditional methods of creating captions?
I think captioners can rest easy for a good while yet, for a few simple reasons. The first is the scale of the task that regulators and audiences set the captioner: typically a pre-recorded programme must be captioned 100 per cent accurately, and a live show should hit at least 98 per cent. Taking the pre-recorded example, how hard can that be for a machine? Surely there's all the time in the world to get it right?
Consider what 100 per cent actually means. Every word has to be identified and spelled correctly (no mean feat on a show such as Mastermind, where deliberately obscure questions can trigger equally obscure and possibly wrong answers), but that is only the start. Imagine writing down every word you utter during any given day; would you go for something akin to the dialogue in a play, or for something accurate with all its 'disfluencies' (those crutch-like 'Ums' and 'Errs' that let your brain change gear whilst letting your mouth free-wheel)? Do you talk in nice, tidy grammatical sentences? Do you pause neatly for mental punctuation? I guessed as much. If you simply transcribe such speech verbatim you'll get a very accurate representation of the words uttered, but that won't make for comprehensible captions, and the result could well be illegibly fast.
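To make those percentages concrete, here is a minimal sketch of one way word-level accuracy could be scored. It rests on an assumption: the article does not say how regulators measure accuracy, so the sketch uses a plain word-level edit distance against a trusted reference transcript. The function names `word_errors` and `word_accuracy` are hypothetical, chosen for illustration only.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (case-insensitive)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed scoring rule: 1 minus (word errors / reference length)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


if __name__ == "__main__":
    score = word_accuracy("the cat sat on the mat", "the cat sat on a mat")
    print(f"{score:.1%}")
```

Under this assumed measure, a 98 per cent target leaves room for only two wrong, missing or extra words in every hundred, and it says nothing about punctuation, speaker labels or reading speed.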
Speech recognition also thrives on good quality audio: not just a clear voice, but an absence of echo, background noise, music and so forth. It is possible, with care and a complex workflow, to ensure that the music and the speech remain separate in a recording, but that doesn't help with poor acoustics or a duff recording. Much more research is needed to improve automatic speech recognition (ASR) in complex audio environments, and we're helping a PhD student at the University of Edinburgh to research precisely this.
The automatic insertion of punctuation is in its infancy; some inroads have been made by our research partners at Edinburgh, using techniques more commonly found in Machine Translation. Whilst ASR uses a largely probability-based approach to working out what's been said, punctuation needs something more rule-based. Questions are another matter entirely; cadence can be a good indicator for some speakers (as most languages will let you ignore the formalities of question words) but that's not a universal rule.
Identifying speaker changes is another area that needs more research; for many of our clients we need to accurately mark either a change of speaker (denoted by chevrons or a change in text colour) or the identity of the speaker themselves. Whilst automated 'diarisation' reaches good levels of accuracy, it doesn't yet reach the level required for broadcast.
Does this mean we can't use ASR at all? I think not. Not all content is the same; it's not all shouty gameshows, talkshows where each guest cuts across everyone else, and sports output captured in the open, with the roar of the crowd and the rumble of the music bed. Some material is recorded cleanly, with professional speakers speaking at a moderate pace on a subject matter with plenty of background data to assist with the more tricky terms. If we have enough of this kind of data we can train ASR engines to make a pretty good job of transcription. We can utilise the vast archives of media with matching captions to create speech recognition engines, punctuation models and 'caption translation' systems to replicate the kind of output that a human could produce. We can then use 'audio alignment' tools to break this transcription up into readable blocks and time-align them to the original speaker's voice, leading to fully automated captions.
No doubt if I review this article in ten years' time I'll cringe at the bold assertions made about the progress of automated captioning, but I feel confident that genres such as comedy will remain a bastion of human-generated captioning even in 2027. Comedy is typically based around word play, incongruity and surprise. Speech engines are most comfortable with the opposite of this: they know what they've been trained on, and a new comic turn of phrase will almost certainly bring about an unintentionally comic transcription. I'm pretty sure that a human captioner will be wrestling with the likes of Have I Got News For You for many years to come.
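The last step, breaking a transcript into readable, time-aligned blocks, can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any particular product: it assumes an alignment tool has already supplied a start and end time for every word, and the names `TimedWord`, `Caption` and `build_captions`, along with the default limits of 37 characters and a one-second pause, are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the programme
    end: float


@dataclass
class Caption:
    text: str
    start: float
    end: float


def _close(block: List[TimedWord]) -> Caption:
    """Join a run of words into one caption spanning first start to last end."""
    return Caption(" ".join(w.text for w in block), block[0].start, block[-1].end)


def build_captions(words: List[TimedWord],
                   max_chars: int = 37,
                   max_gap: float = 1.0) -> List[Caption]:
    """Group aligned words into caption blocks.

    A new block starts when adding the next word would exceed max_chars,
    or when the speaker pauses for longer than max_gap seconds.
    """
    captions: List[Caption] = []
    current: List[TimedWord] = []
    for word in words:
        if current:
            candidate = " ".join(w.text for w in current) + " " + word.text
            pause = word.start - current[-1].end
            if len(candidate) > max_chars or pause > max_gap:
                captions.append(_close(current))
                current = []
        current.append(word)
    if current:
        captions.append(_close(current))
    return captions


if __name__ == "__main__":
    words = [
        TimedWord("Hello", 0.0, 0.4),
        TimedWord("there", 0.5, 0.9),
        TimedWord("everyone", 2.5, 3.0),
    ]
    for caption in build_captions(words):
        print(f"{caption.start:.1f}-{caption.end:.1f}: {caption.text}")
```

A production system would also need to respect sentence boundaries, reading speed and speaker changes; the sketch only shows the basic idea of splitting on length and pauses.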
By Matt Simpson, head of product management, access services, Broadcast and Media Services, Ericsson