
Monday, September 20, 2021 - 10:41
Salsa Sound co-founder and CEO, Rob Oldfield
Regular readers of SVG Europe will be well aware of how innovative technology companies working in sports broadcasting can be, but MIXaiR from Salsa Sound looks likely to set a new bar for live audio and the craft of sound mixing.
When the pandemic caused football games to be played in silent stadiums, Salsa Sound came to the rescue of broadcasters with its vCrowd real-time virtual crowd atmosphere system, which rightly won plaudits. However, MIXaiR is based on technology that the company has been working on for several years, through academia, R&D, beta versions and a soft launch of the system a year ago.
"Our company was founded in 2017, but prior to that we were part of the University of Salford, where we were working on a lot of innovative audio techniques," says co-founder and CEO, Rob Oldfield. "In particular we were looking at how to leverage artificial intelligence (AI) to recognise sound events and then create the best possible mix, not based on tracking but just based on what is actually [captured by] those microphones."
Fast forward to the future, next month in fact, and the new version of MIXaiR, an AI-driven system that automatically creates and enhances audio mixes for live broadcast, is set to be released.
"MIXaiR 2.0 is jam-packed with new features and a much more intuitive, easy-to-use interface," says Oldfield. "The idea for v2.0 is you put all your microphone feeds in and then MIXaiR creates different submixes. So you might have a crowd, a commentary, or a pitch mix or aux in, and MIXaiR will automatically balance the levels between them, apply some processing, then create the best possible mix out of it without any human interaction, other than setting it up in the first place."
Hard tackle
According to Oldfield, the hardest mix, by a country mile, is the pitch mix. "Historically, it's such a dynamic process by engineers and difficult to replicate," he explains. "To make sure you've always got the nearest microphone active in the mix at any one point requires constant attention and raising and lowering of faders. It's a really clever balancing act."
Rather like the players whose kicks it tracks, MIXaiR's AI has gone through an extensive training regime. Using machine learning technology, Salsa Sound has been training MIXaiR with many hours of content, microphone recordings and mixes across leagues.
"We know what makes a great mix," says Oldfield. "The AI is constantly analysing all of the sounds it is hearing. We tell it what to make of these sound events and it learns what's a kick, what's a whistle, what's a ball bounce, so that when it sees or hears sound in the wild, it can make an intelligent decision based on it."
Oldfield says MIXaiR delivers an even pitch mix, without any slightly awkward transitions and unbalanced crowd noise.
"Our approach is to have the AI listen to those live microphone feeds, and when it hears a sound that's interesting, like a kick, whistle, ball bounce or hitting the crossbar, you name it, it will automatically add that microphone feed into the mix. It tracks the game around, always choosing the microphone that's closest to the action, and [performs] a seamless transition between them."
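The article does not describe how MIXaiR implements this internally. As a rough illustration only, the idea of opening the microphone nearest the action and moving smoothly between microphones can be sketched in Python. The sketch assumes a hypothetical per-microphone "event score" between 0 and 1 for each audio frame (for example, from a sound-event classifier). The function names, the threshold and the ramp step are all invented for the example.

```python
from typing import List, Sequence


def target_gains(scores: Sequence[float], threshold: float = 0.5) -> List[float]:
    """Return 1.0 for the mic with the strongest detected event, 0.0 for the rest.

    `scores` holds one hypothetical event score per microphone and must be
    non-empty. If no score reaches `threshold`, every target is 0.0.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return [0.0] * len(scores)
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


def smooth_gains(current: Sequence[float], target: Sequence[float],
                 step: float = 0.25) -> List[float]:
    """Move each gain at most `step` toward its target (a linear ramp per frame)."""
    out = []
    for g, t in zip(current, target):
        delta = max(-step, min(step, t - g))
        out.append(g + delta)
    return out


def run(frames: Sequence[Sequence[float]], n_mics: int,
        threshold: float = 0.5, step: float = 0.25) -> List[List[float]]:
    """Process a sequence of per-frame score lists and return the gain history."""
    gains = [0.0] * n_mics
    history = []
    for scores in frames:
        gains = smooth_gains(gains, target_gains(scores, threshold), step)
        history.append(list(gains))
    return history


if __name__ == "__main__":
    # Two mics: the action is near mic 0 for two frames, then moves to mic 1.
    frames = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.8]]
    for gains in run(frames, n_mics=2, step=0.5):
        print(gains)
```

Limiting how far each gain can move per frame is what turns a hard switch into a crossfade: the outgoing microphone fades down while the incoming one fades up.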
"People don't realise how hard it is to mix really well," he adds. "That's why we've gone through pains to get an AI solution that takes the strain from some of the really difficult grunt work of mixing. It can enable these guys to explore creative avenues and to create more innovative content. When you're not completely locked into your screen trying to create the best mix, you can actually lean in to MIXaiR a little bit more, let that create the stems and then you can have a bit of cognitive space to craft a mix, rather than chasing a mix."
Salsa Sound's MIXaiR GUI
The commentary and crowd noise are dynamically processed to provide mix components. "These submixes are basically ingredients that you can throw into the output mixes," explains Oldfield. "Then it's just a drag-and-drop process. You can create whatever output mixes you want, as many as you want, in different formats. You can have international sound, a French mix, a German mix, a Spanish mix, or whatever, and within that you can have different flavours: the stereo mix, the 5.1 mix, the mono mix. The only limitation is the number of channels that you've got available on your output board. You can also apply VST plug-ins to add your own compression or EQ or effects to the submixes, so it becomes a creative tool, allowing you to craft your mixes."
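One way to picture the "ingredients" idea is as a set of recipes, each listing which submixes go into an output and at what gain. The Python sketch below is purely illustrative and is not based on MIXaiR's software. The submix names, the recipe names and the gain values are hypothetical, and the audio is reduced to short mono lists of samples to keep the example small.

```python
from typing import Dict, List


def mix_output(submixes: Dict[str, List[float]],
               recipe: Dict[str, float]) -> List[float]:
    """Sum the submixes named in `recipe`, each scaled by its gain.

    All submixes are assumed to be mono and of equal length.
    """
    length = len(next(iter(submixes.values())))
    out = [0.0] * length
    for name, gain in recipe.items():
        for i, sample in enumerate(submixes[name]):
            out[i] += gain * sample
    return out


# Hypothetical submixes (two samples each) and output recipes.
submixes = {
    "crowd": [0.5, 0.5],
    "pitch": [0.25, 0.0],
    "commentary_fr": [0.0, 0.25],
}
recipes = {
    "international": {"crowd": 1.0, "pitch": 1.0},
    "french": {"crowd": 0.5, "pitch": 1.0, "commentary_fr": 1.0},
}

if __name__ == "__main__":
    for name, recipe in recipes.items():
        print(name, mix_output(submixes, recipe))
```

Because every output is built from the same submixes, adding another language version only means adding another recipe.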
"Importantly, [all mixes] are loudness normalised," he adds. "If you're streaming on YouTube, or your own video on demand platform, or when it is for broadcast, they all have different loudness requirements. So within MIXaiR, you just decide which requirement you want, click on it and it will ensure that the mix adheres to those standards, so you don't end up with a fine from the regulator for making too much noise, or making not enough noise."
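The arithmetic behind hitting a loudness target is simple once the programme loudness has been measured: the correction in decibels is the target minus the measured value, and that figure converts to a linear multiplier for the samples. The Python sketch below shows only that calculation and is not a description of how MIXaiR does it. It assumes the measured loudness comes from an external meter. The preset names are hypothetical. The -23 LUFS figure is the EBU R 128 broadcast target, while the streaming value is an illustrative placeholder rather than any platform's published requirement.

```python
from typing import Dict, List

# Target programme loudness per delivery preset, in LUFS.
# "streaming_example" is an illustrative value, not an official specification.
PRESETS_LUFS: Dict[str, float] = {
    "ebu_r128_broadcast": -23.0,
    "streaming_example": -14.0,
}


def normalisation_gain_db(measured_lufs: float, preset: str) -> float:
    """Gain in dB needed to move the measured loudness to the preset's target."""
    return PRESETS_LUFS[preset] - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale every sample by the linear equivalent of `gain_db`."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


if __name__ == "__main__":
    measured = -20.0  # assumed reading from an external loudness meter
    gain = normalisation_gain_db(measured, "ebu_r128_broadcast")
    print(f"apply {gain:+.1f} dB, linear factor {db_to_linear(gain):.3f}")
```

A real-time system would also need to measure loudness continuously and respect true-peak limits, which this sketch leaves out.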
MIXaiR users also have an option to enhance the pitch mix with pre-recorded sounds. "When somebody whacks the ball in the middle of the park, the [sound supervisor] can't even hear it because it's a long way away from the microphones," says Oldfield. "However the AI can pick up a kick in the middle of the pitch. The algorithm can detect a kick, go to a bank of pre-recorded kicks and pick out the most appropriate one, and then