Transcribing the future

11/10/2018

SCREEN AFRICA EXCLUSIVE: A while back I was handed a bunch of very long audio files and tasked with cutting about 60 hours of interviews, sound bites and voice-over down to a 30-minute radio piece. Easy, I thought, as long as I could get the material transcribed quickly and at a reasonable cost. And there my journey of discovery began.

Voice-to-text transcription has long been used in the media, medical and legal industries, traditionally done by teams of human transcribers. It's big business, but turnarounds can be slow and files often need a second check to make sure the content is accurate. The cost of transcription actually wasn't a factor in my case; it was speed that I needed. I needed a machine to plough through my audio files and spit out a transcript so that I could search for keywords and edit my story together.

Trawling the internet, I quickly found a few options and uploaded the same test file to all of them (as part of my free trials), but the results were really disappointing, ranging from 25 to about 59 per cent accuracy. Simple words and phrases were interpreted as something completely different, and more complex words like Fakarava Atoll came back as expletives! At first I thought the problem might be that the majority of the interviews featured heavy New Zealand accents, but a snippet from the guest with the most clearly British accent gave me similar results.
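The article does not say how those accuracy percentages were calculated. One common way to score a machine transcript, assumed here purely for illustration, is to compare it word by word against a hand-checked reference and count the substitutions, insertions and deletions needed to turn one into the other (the word error rate). The function names and sample sentences below are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero for very poor transcripts."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    # Hypothetical example: a hand-checked sentence vs. a machine transcript.
    truth = "we dived at fakarava atoll"
    machine = "we dived at far away atoll"
    print(f"accuracy: {word_accuracy(truth, machine):.0%}")
```

Scoring every service against the same reference snippet in this way would make trial results directly comparable.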

Through my work in the video world, I am aware that there is a lot of research and development in the transcription arena utilising Artificial Intelligence (AI). Machine learning works best when it is processing large, analysable data sets like text. But most of the data being produced in the world right now is not text; it is the spoken word embedded in video and audio recordings. The push among AI developers to produce a reliable voice transcription process has therefore intensified.

Tech companies like Apple, Google, Microsoft and Amazon are all actively involved in this space and have been researching voice recognition since the 90s. That research has only accelerated with the emergence of virtual assistants like Alexa, Cortana, Google Voice and Siri. However, most people who use Siri or Alexa would agree that, while those tools do a reasonable job of understanding you, most of us wouldn't trust them with our lives. I asked Alexa where the Fakarava Atoll was and her response was, "I would rather not answer that question." (Out of interest, it is in French Polynesia and is not a swear word!) A voice assistant like Alexa only needs to work out which of a predetermined list of vocal commands is being asked, whereas a transcription programme needs to listen for and capture every spoken word. This far wider variety of possible inputs and outputs makes transcription a more difficult task for AI.

Whilst stumbling around for my transcription answers, I came across an article published by a team of data scientists and enthusiastic entrepreneurs, Ashutosh Trivedi and Anup Gosavi, who recently founded a company called Spext. Trivedi, based in Bangalore, India, has a deep interest and postgraduate expertise in AI and has published his research in many IEEE journals. Gosavi is based in San Francisco and specialises in Design Thinking and Information Visualisation.

Spext describes its company name as a fusion of the words speech and text, and from the outset it looked like the company could offer me exactly what I wanted and more. The service can best be described as a combined voice transcriber and media editor. You upload your audio files and the system automatically converts voice to text, displaying the result in an edit window where the audio content is accurately aligned with the text. That alignment means you can do some amazing things with the resulting files. Not only do you get a full transcription of your work, but you can edit the transcript as you would in a word processor and then export the result as a new audio file. Obviously you can't create new sentences, but the ability to edit and output the existing material as an audio file is a huge plus. It looks like a normal text editor and has familiar actions like copy-paste and cut-paste, and I found editing by transcription on the fly extremely easy. When you are done, you can export your work as a Word document, a PDF and/or a new MP3 or WAV file, or even export your project to professional editing tools such as Adobe Audition and Final Cut Pro for fine-tuning.
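The article does not describe how Spext works internally. As a rough illustration of the general idea of editing audio through its transcript, the sketch below assumes each transcribed word carries a start and end time; deleting words from the text then reduces to working out which time ranges of the original audio to keep. The Word class, the keep_ranges function and the timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the beginning of the audio file
    end: float


def keep_ranges(words: list[Word], deleted: set[int]) -> list[tuple[float, float]]:
    """Return the (start, end) audio ranges that survive after deleting words.

    Consecutive surviving words are merged into one range so that the
    exported audio contains a cut only where text was actually removed.
    """
    ranges: list[tuple[float, float]] = []
    previous_kept = None
    for index, word in enumerate(words):
        if index in deleted:
            continue
        if ranges and previous_kept == index - 1:
            # Directly follows the last kept word: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
        previous_kept = index
    return ranges


if __name__ == "__main__":
    # Hypothetical aligned transcript; the filler word "um" is deleted.
    transcript = [
        Word("the", 0.0, 0.2),
        Word("um", 0.2, 0.5),
        Word("atoll", 0.5, 1.0),
        Word("is", 1.0, 1.2),
        Word("remote", 1.2, 1.8),
    ]
    print(keep_ranges(transcript, deleted={1}))
```

An audio tool could then copy just those ranges from the source file and join them to produce the edited recording.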

The most important result is that the files I used to test other systems uploaded into the Spext system quickly and came back blazingly fast, with an accuracy of 96 per cent in my case. The system had even punctuated the transcription correctly and coped well with proper nouns like the names of fish species and fishing techniques. It too battled with transcribing Fakarava (expletive), but at least it recognised the word Atoll. It took no time at all to make the corrections manually. What could have taken weeks in production will now easily get done in a matter of days. Artificial intelligence seems to have finally reached the point where transcription of audio by a machine works efficiently enough to be viable. As researchers and companies improve and refine their algorithms, it seems evident that transcriptions will become even more accurate, and the potential productivity savings of automated transcription will be hard to ignore. Someday soon, we might even be headed towards a world where audio files and text are thought of not as two distinct media types but as two formats for the same content, as interchangeable and as convertible as an .mp3 and a .wav file, or a text file and a Word document.

The guys at Spext have used a unique combination of intuitive user experience design, to make it easy for the user, and advanced machine learning tha
LINK: http://www.screenafrica.com/2018/10/11/technology/ai-artificial-intell...