
SCREEN AFRICA EXCLUSIVE: A while back I was handed a batch of very long audio files and tasked with cutting about 60 hours of interviews, sound bites and voice-over down to a 30-minute radio piece. Easy, I thought, as long as I could get the material transcribed quickly and at a reasonable cost. And there my journey of discovery began.
Voice-to-text transcription has long been used in the media, medical and legal industries, traditionally done by human transcription teams. It's big business, but turnarounds can be slow and files often need a second error check to make sure the content is accurate. The cost of transcription wasn't actually a factor in my case; it was speed that I needed. I needed a machine to plough through my audio files and spit out a transcript so that I could search for key words and edit my story together.
Trawling the internet, I quickly found a few options and uploaded the same test file to all of them (as part of my free trials), but had really disappointing results, ranging from 25 to about 59 per cent accuracy. Simple words and phrases were being interpreted as something completely different, and more complex words like Fakarava Atoll came back as expletives! At first I thought the problem might be that the majority of the interviews featured heavy New Zealand accents, but a snippet from the most clearly British-accented guest gave me similar results.
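For readers wondering what an accuracy figure like these can mean in practice, here is a minimal sketch of one common way to score a transcript: compare it word by word against a hand-checked reference and count the substitutions, insertions and deletions needed to turn one into the other. This is an assumption for illustration only; it is not necessarily how the figures quoted in this article were arrived at, and the function name `word_accuracy` is hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (word-level edit distance / reference length), floored at 0."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0

    # Classic Levenshtein distance, computed over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the transcript
                            curr[j - 1] + 1,      # extra word in the transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr

    errors = prev[-1]
    return max(0.0, 1.0 - errors / len(ref))


# One wrong word out of five in this made-up sentence.
score = word_accuracy("we dived at fakarava atoll", "we dived at fake atoll")
print(f"{score:.0%}")
```

Scored this way, a single misheard place name in a short sentence costs a lot, which is why long test files give a fairer picture than short clips.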
Through my work in the video world, I am aware that there is a lot of research and development in the transcription arena utilising artificial intelligence (AI). Machine learning works best when it is processing large, analysable data sets like text. But most of the data being produced in the world right now is not text; it's the spoken word embedded in video and audio recordings, and so the push among AI developers to produce a reliable voice transcription process has intensified.
Tech companies like Apple, Google, Microsoft and Amazon are all actively involved in this space and have been researching voice recognition since the 90s. That research has only accelerated with the emergence of virtual assistants like Alexa, Cortana, Google Voice and Siri. However, most people who use Siri or Alexa would agree that, while those tools do a reasonable job of understanding you, most of us wouldn't trust them with our lives. I asked Alexa where the Fakarava Atoll was and her response was, "I would rather not answer that question." (Out of interest, it is in French Polynesia and is not a swear word!) A voice assistant like Alexa only needs to work out which of a predetermined list of vocal commands is being asked, whereas a transcription programme needs to listen for and capture all spoken words. This wider variety of possible inputs and outputs makes it a more difficult task for AI.
Whilst stumbling around for my transcription answers, I came across an article published by a pair of data scientists and enthusiastic entrepreneurs, Ashutosh Trivedi and Anup Gosavi, who recently founded a company called Spext. Trivedi, based in Bangalore, India, has a deep interest and postgraduate expertise in AI and has published his research in many IEEE journals. Gosavi is based in San Francisco and specialises in design thinking and information visualisation.
Spext describes its company name as a fusion of the words speech and text, and from the outset it looked like it could offer me exactly what I wanted and more. The service can best be described as a combined voice transcriber and media editor. You upload your audio files and the system automatically converts voice to text and displays the result in an edit window, where it accurately aligns the audio content with the text. That means you can do some amazing things with the resultant files. Not only do you get a full transcription of your work, but you can edit the transcript as you would in a word processor and then export the result as a new audio file. Obviously you can't create new sentences, but the ability to edit and output the existing data as an audio file is a huge plus. It looks like a normal text editor and has familiar actions like copy-paste and cut-paste, and I found editing by transcription on the fly extremely easy. When you are done, you can export your work as a Word document, a PDF and/or a new MP3 or WAV file, or even export your project to professional editing tools such as Adobe Audition and Final Cut Pro for fine-tuning.
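To make the idea of editing audio through its transcript more concrete, here is a minimal sketch of how such an edit could work in principle, assuming each transcribed word carries a start and end time in the audio. This is not Spext's actual implementation, which the company has not described here; the `Word` class, the `cut_list` function and the timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds into the original audio
    end: float


def cut_list(words: list[Word], kept_indices: list[int]) -> list[tuple[float, float]]:
    """Turn the words that survive a text edit into audio segments to keep.

    Neighbouring kept words are merged into one continuous segment, so the
    audio is only cut where text was actually deleted.
    """
    segments: list[tuple[float, float]] = []
    previous = None
    for i in sorted(kept_indices):
        word = words[i]
        if previous is not None and i == previous + 1:
            # Extend the current segment to the end of this word.
            segments[-1] = (segments[-1][0], word.end)
        else:
            segments.append((word.start, word.end))
        previous = i
    return segments


# Made-up timings: deleting the "um" leaves two stretches of audio to keep.
words = [
    Word("we", 0.0, 0.3),
    Word("um", 0.3, 0.6),
    Word("dived", 0.6, 1.0),
    Word("today", 1.0, 1.5),
]
print(cut_list(words, [0, 2, 3]))
```

An exporter would then copy just those time ranges from the original recording into a new file, which also shows why you can remove or reorder existing words but cannot create new sentences.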
The most important result is that the files I used to test other systems uploaded into the Spext system quickly and came back blazingly fast, with a resultant accuracy of 96 per cent in my case. The system had even punctuated the transcription correctly and coped well with proper nouns like the names of fish species and fishing techniques. It too battled with transcribing Fakarava (expletive), but at least it recognised the word Atoll. It took no time at all to make the manual corrections. What could have taken weeks in production will now easily get done in a matter of days. Artificial intelligence seems to have finally reached the point where transcription of audio by a machine works efficiently enough to make it viable. As researchers and companies improve and refine their algorithms, it seems evident that transcriptions will become even more accurate, and the potential productivity savings of automated transcription will be hard to ignore. Someday soon, we might even be headed towards a world where audio files and text are thought of not as two distinct media types, but as two formats for the same content - as interchangeable and as convertible as an .mp3 and a .wav file, or a text file and a Word document.
The guys at Spext have used a unique combination of intuitive user experience design, to make it easy for the user, and advanced machine learning tha