How the BBC's R&D team used the open source machine-learning network YOLO (You Only Look Once) for the BBC's Autumnwatch
By Philip Stevens
Published: October 22, 2020
One of the major problems with making natural history programmes is that the 'stars' will not take direction from the production team. Try getting a badger to appear on cue, or a bird to feed its young when the programme goes live, and you will appreciate the difficulty. And yet it is these principal characters that the audience wants to see - programmes such as the BBC's Springwatch and Autumnwatch prove the point.
So-called camera traps have, of course, been around for some time. These are fitted with motion sensors that start the camera shooting when action is detected. However, that action might be a tree moving in the background - which could mean a memory card filled with shots of branches swaying to and fro, but little sign of wildlife.
Beyond that, many hours are needed to check through the footage from upwards of 30 Springwatch cameras to determine what could and could not be used.
Using technology
To make the work of natural history production teams a little easier, the BBC's Research and Development team has been working on a solution. And it involves Artificial Intelligence (AI) and Machine Learning (ML). Trials took place during the 2019 Springwatch and Autumnwatch productions, with the technology being used more fully for this year's programming.
"We became involved in this project when a natural history producer approached us some time ago," explains Robert Dawes, senior research engineer, BBC R&D. "He asked us to look into AI technology to see how we could improve the performance of those sensors and use these resources more effectively."
Dawes continues, "We did some initial Artificial Intelligence work using computer vision processing techniques. This involved building a rig that had a camera attached to a small Raspberry Pi computer. This computer continuously monitored the camera's output and enabled us to determine when birds or animals appeared in shot. By using computer vision techniques, we were able to filter out unwanted triggers such as moving trees. But because ML was not involved, it still did not tell us what kind of animal or bird was there."
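The article does not describe the exact filtering pipeline, but background subtraction with a minimum-size test is a common way to build this kind of trigger. A minimal sketch, assuming OpenCV, a camera at index 0, and an illustrative area threshold:

```python
# Sketch of motion-triggered capture with basic noise filtering. The BBC R&D
# pipeline is not published; this only illustrates the general technique.
import cv2

MIN_AREA = 5000  # assumed threshold: ignore small regions (leaves, sensor noise)

cap = cv2.VideoCapture(0)  # camera index is an assumption
bg = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)  # foreground (moving) pixels
    # Morphological opening removes speckle so small motion doesn't trigger
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Fire only when one moving region is large enough to matter, which
    # filters out scattered motion such as branches swaying in the wind.
    if any(cv2.contourArea(c) > MIN_AREA for c in contours):
        print("motion event: worth recording")
```

Small, scattered foreground regions fall below the area threshold, so only a sizeable moving object fires the trigger - though, as Dawes notes, this alone still cannot say what the object is.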
Obviously, more was needed, and when the R&D team became involved with the Springwatch operation it provided the opportunity they had been looking for. By cabling the local remote wildlife cameras to the OB unit, power was immediately available. This opened up the possibility of using higher-powered computer technology to help with the monitoring.
"We wanted to create a system of tools that would keep an eye on multiple cameras at the same time," says Dawes. "This needed to be quite sophisticated and required a method to trigger recordings. This was not only helpful for the live programme production teams, but also for the digital output that allows viewers at home to watch the cameras 24 hours a day."
As it turned out, the solution that Dawes and his team devised not only worked for the BBC cameras, but also for cameras operated by third-party wildlife teams across the United Kingdom - such as the RSPB (Royal Society for the Protection of Birds) - to which the broadcaster has access. "Our answer removes the need for someone to be monitoring all these cameras on a continual basis."
So, how does it work? Dawes explains that an open source machine-learning network called YOLO (You Only Look Once) was employed. This technology enables the system to recognise objects. "For example, if it is used in an office, it can be taught to identify a chair, a monitor, a fridge or a person. In the natural history application, we can teach it to recognise different types of creatures and then put a box around that object. Once that box is in place, it is possible to track where that animal, bird or whatever wanders around the screen."
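In outline, the detection step looks something like the following. This is a generic sketch using OpenCV's DNN module with off-the-shelf Darknet-format YOLO weights, not the BBC's actual code; the file names and wildlife class list are placeholders:

```python
# Illustrative YOLO-style detection on a single frame via OpenCV's DNN module.
# File names are placeholders; the class list must match whatever the loaded
# model was actually trained on.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_layers = net.getUnconnectedOutLayersNames()
classes = ["badger", "fox", "owl"]  # hypothetical wildlife classes

frame = cv2.imread("camera_frame.jpg")
h, w = frame.shape[:2]

# YOLO looks at the whole image once: a single forward pass yields every
# candidate detection, which is what makes it fast enough for live video.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_layers)

for output in outputs:
    for det in output:  # det = [cx, cy, bw, bh, objectness, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:  # confidence threshold (assumed)
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            # Draw the box Dawes describes around the detected creature
            cv2.rectangle(frame, (x, y), (x + int(bw), y + int(bh)),
                          (0, 255, 0), 2)
            cv2.putText(frame, classes[class_id], (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
# A production system would also apply non-max suppression to merge
# overlapping boxes; that step is omitted here for brevity.
```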
To enable the system to 'learn' about the animals, multiple stills of the creatures in question are fed into the computer. These stills can run into many thousands, as it is important for the system to recognise the subject from many different angles. The computer uses the images to train a system known as a neural network, loosely modelled on the structure of the brain, to recognise what those kinds of objects look like. This is a good example of Machine Learning. When the system 'sees' an object that it recognises as being an animal, it tracks that creature in real time on live video. The set-up means that it will be a creature that is being tracked rather than something else that is moving within the camera's view. In other words, producers can advance from knowing 'something has happened in the scene' to 'a creature has moved in the scene'.
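The article does not say how the tracking itself is implemented; the simplest common approach is to match each new detection box to the nearest previously tracked box. A self-contained illustrative sketch (all names hypothetical):

```python
# Minimal nearest-centroid tracker: associates each new detection box with
# the closest previously tracked animal. Purely illustrative; the production
# system's tracking method is not described in the article.
import math

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def update_tracks(tracks, detections, max_dist=75):
    """tracks: {id: box}; detections: list of (x, y, w, h) boxes."""
    new_tracks = {}
    unmatched = list(detections)
    for track_id, old_box in tracks.items():
        if not unmatched:
            break
        ox, oy = centroid(old_box)
        # Pick the detection closest to this track's last known position
        best = min(unmatched, key=lambda b: math.dist((ox, oy), centroid(b)))
        if math.dist((ox, oy), centroid(best)) <= max_dist:
            new_tracks[track_id] = best
            unmatched.remove(best)
    # Any leftover detections become new tracks (new animals entering shot)
    next_id = max(tracks, default=-1) + 1
    for box in unmatched:
        new_tracks[next_id] = box
        next_id += 1
    return new_tracks
```

Running this once per frame over the YOLO detections is enough to follow a creature as it wanders around the screen, which is what turns 'something has happened' into 'a creature has moved'.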
In each case, it takes up to three days to import all this basic data to train the neural network. "One benefit is that all this can be carried out on a powerful domestic PC - it doesn't require a million-dollar system," emphasises Dawes. It also means it is easy to change the images the system needs to recognise as and when the circumstances change. Clearly, if that process took six months, the system would not be viable.
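To give a flavour of what such retraining involves, here is a hedged sketch using the open source ultralytics package as a stand-in - the article does not name the BBC's training tools, and "wildlife.yaml" is a hypothetical dataset configuration pointing at the folders of labelled stills:

```python
# Hedged sketch of retraining a detector on new wildlife stills. Uses the
# ultralytics package purely as an illustrative stand-in for whatever
# training pipeline the BBC actually ran; all file names are hypothetical.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")           # start from pretrained weights
model.train(data="wildlife.yaml",    # hypothetical dataset of labelled stills
            epochs=100, imgsz=640)   # feasible in days on a consumer-grade PC
model.export(format="onnx")          # package the result for deployment
```

Starting from pretrained weights rather than from scratch is what keeps the turnaround to days on a domestic PC, and swapping in a new set of stills when the species of interest changes only means rerunning the same training step.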
Creating the data
The technology also generates data about the actions so t