
SCREEN AFRICA EXCLUSIVE: A while back I was handed a bunch of very long audio files and tasked with cutting about 60 hours of interviews, sound bites and voice-over down to a 30-minute radio piece. Easy, I thought, as long as I could get the material transcribed quickly and at a reasonable cost. There my journey of discovery began.
Voice-to-text transcription has long been used in the media, medical and legal industries, traditionally done by human transcription teams. It's big business, but turnarounds can be slow and files often need a second error check to make sure the content is accurate. The cost of transcription actually wasn't a factor in my case - it was speed that I needed. I needed a machine to plough through my audio files and spit out a transcript so that I could search for key words and edit my story together.
Trawling the internet, I quickly found a few options and uploaded the same test file to all of them (as part of my free trials), but the results were really disappointing, ranging from 25 to about 59 per cent accuracy. Simple words and phrases were being interpreted as something completely different, and more complex words like Fakarava Atoll came back as expletives! At first I thought the problem might be that the majority of the interviews were in heavy New Zealand accents, but a snippet from the guest with the clearest British accent gave me similar results.
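How each of these accuracy figures was scored is not something the services spelled out. One common way to put a number on a transcript is word accuracy: one minus the word error rate, where errors are the substituted, inserted and deleted words counted against a hand-checked reference. The sketch below assumes that definition; the function names and the sample sentences are made up for illustration.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance counted in whole words, not characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from the output
                current[j - 1] + 1,      # extra word in the output
                previous[j - 1] + cost,  # matching or substituted word
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Return 1 - word error rate, floored at 0.0."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Illustrative sentences only, not taken from the real interviews.
checked_by_hand = "we dived the pass at fakarava atoll"
machine_output = "we dived the past at atoll"
print(f"{word_accuracy(checked_by_hand, machine_output):.0%}")
```

Scoring this way needs a short passage transcribed by hand first, so in practice it is done on a sample rather than on all 60 hours.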
Through my work in the video world, I am aware that there is a lot of research and development in the transcription arena utilising artificial intelligence (AI). Machine learning works best when it is processing large, analysable data sets like text. But most of the data being produced in the world right now is not text; it's the spoken word embedded in video and audio recordings, and so the drive among AI developers to produce a reliable voice transcription process has intensified.
Tech companies like Apple, Google, Microsoft and Amazon are all actively involved in this space and have been researching voice recognition since the 90s. That research has only accelerated with the emergence of virtual assistants like Alexa, Cortana, Google Assistant and Siri. However, most people who use Siri or Alexa would agree that, while those tools do a reasonable job of understanding you, most of us wouldn't trust them with our lives. I asked Alexa where the Fakarava Atoll was and her response was, "I would rather not answer that question." (Out of interest, it is in French Polynesia and is not a swear word!) A voice assistant like Alexa only needs to work out which of a predetermined list of vocal commands is being asked, whereas a transcription programme needs to listen for and capture every spoken word. This far wider variety of possible inputs and outputs makes transcription a more difficult task for AI.
Whilst stumbling around for my transcription answers I came across an article published by a team of data scientists and enthusiastic entrepreneurs, Ashutosh Trivedi and Anup Gosavi, who recently founded a company called Spext. Trivedi, based in Bangalore, India, has a deep interest and postgraduate expertise in AI and has published his research in many IEEE journals. Gosavi is based in San Francisco and specialises in design thinking and information visualisation.
Spext describes its company name as a fusion of the words speech and text, and from the outset it looked like the company could offer me exactly what I wanted and more. The service can best be described as a combined voice transcriber and media editor. You upload your audio files and the system automatically converts voice to text and displays the result in an edit window, where it accurately aligns the audio content with the text. That means you can now do some amazing things with the resultant files. Not only do you get a full transcription of your work, but you can edit the transcript as you would in a word processor and then export the result as a new audio file. Obviously you can't create new sentences, but the ability to edit and output the existing data as an audio file is a huge plus. It looks like a normal text editor and has familiar actions like copy-paste and cut-paste, and I found editing by transcription on the fly extremely easy. When you are done, you export your work as a Word document, a PDF and/or a new MP3 or WAV file, or even export your project to professional editing tools such as Adobe Audition and Final Cut Pro for fine tuning.
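Spext has not published how its editor works under the hood, so the following is only a sketch of the general idea behind editing audio through its transcript. It assumes the transcriber returns a start and end time for every word; deleting words from the text then leaves a list of time ranges to keep, and the new audio is those ranges joined together. The `Word` class, `keep_ranges` and `cut_samples` are hypothetical names, and the timings are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def keep_ranges(words, deleted_indexes):
    """Return (start, end) time ranges covering the words left in the transcript.

    Neighbouring words that both survive share one range, so the natural
    pause between them is preserved.
    """
    ranges = []
    previous_kept = False
    for index, word in enumerate(words):
        if index in deleted_indexes:
            previous_kept = False
            continue
        if previous_kept:
            start, _ = ranges[-1]
            ranges[-1] = (start, word.end)
        else:
            ranges.append((word.start, word.end))
        previous_kept = True
    return ranges


def cut_samples(samples, sample_rate, ranges):
    """Join the kept time ranges of a mono sample list into a new recording."""
    edited = []
    for start, end in ranges:
        edited.extend(samples[round(start * sample_rate):round(end * sample_rate)])
    return edited


# Invented example: strike the filler word "um" from the transcript.
words = [
    Word("I", 0.0, 0.2),
    Word("um", 0.3, 0.6),
    Word("love", 0.7, 1.0),
    Word("diving", 1.0, 1.5),
]
ranges = keep_ranges(words, deleted_indexes={1})
print(ranges)
```

A real editor would also need to handle reordered text and smooth the joins with short crossfades, which this sketch leaves out.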
The most important result is that the files I had used to test the other systems uploaded into the Spext system quickly and came back blazingly fast, with an accuracy of 96 per cent in my case. The system had even punctuated the transcription correctly and coped well with proper nouns like the names of fish species and fishing techniques. It too battled with Fakarava (expletive), but at least it recognised the word Atoll. It took no time at all to make the corrections manually. What could have taken weeks in production will now easily get done in a matter of days. Artificial intelligence seems to have finally reached the point where machine transcription of audio works efficiently enough to be viable. As researchers and companies improve and refine their algorithms, it seems evident that transcriptions will become even more accurate, and the potential productivity savings of automated transcription will be hard to ignore. Someday soon, we might even be headed towards a world where audio files and text are thought of not as two distinct media types, but as two formats for the same content - as interchangeable and as convertible as an .mp3 and a .wav file, or a text file and a Word document.
The guys at Spext have used a unique combination of intuitive user experience design, to make it easy for the user, and advanced machine learning tha