Combining artificial intelligence and citizen science to improve wildlife surveys

Migratory species play a key role in the health of the Serengeti ecosystem in East Africa, but monitoring their populations is a time- and labor-intensive task.
Scientists studying these wildebeest populations compared expert observer counts of aerial imagery to corresponding counts by both volunteer citizen scientists and deep learning algorithms.
Both novel methods were able to produce accurate wildebeest counts from the images with minor modifications, the algorithms doing so faster than humans.
Use of automated object detection algorithms requires prior “training” with specific data sets, which in this case came from the volunteer counts, suggesting that the two methods are both useful and complementary.

A research team testing the capacity of both citizen scientists and machine learning algorithms to help survey the annual wildebeest migration in Serengeti National Park in Tanzania found that both methods could produce accurate animal counts, a boon for park managers.

The iconic migration of 1.3 million blue wildebeest (Connochaetes taurinus) and 250,000 common zebra (Equus quagga) between Serengeti and the Maasai Mara National Reserve in Kenya is the largest terrestrial animal migration on Earth.

Over one million wildebeest (*Connochaetes taurinus*), plus tens of thousands of common zebra (*Equus quagga*) and other grazing antelope migrate seasonally across Serengeti National Park in Tanzania to find fresh grasses. Image by Sue Palminteri/Mongabay.

The migration of so many herbivorous animals drives other biological processes in the grassland ecosystem, including soil nutrient cycles, the balance of trees and grasses, and the abundance of insects, birds and carnivores. The population trend of the wildebeest in particular reflects levels of bushmeat poaching, disease, and other human disturbance. Understanding the health and dynamics of the migration is thus of key interest to both researchers and park managers, yet the sheer numbers of animals have challenged monitoring efforts.

“The major driving force in the Serengeti’s ecosystem is the abundance of wildebeest,” said senior author Grant Hopcraft of the University of Glasgow’s Institute of Biodiversity, Animal Health & Comparative Medicine, in a statement. “[The wildebeest] influence almost every variable in the ecosystem – everything from the return rate of fires, since they eat the grass, to the amount of insects that are available to migrating birds. Without wildebeest, the ecosystem would shift into a completely different state, and therefore it’s important to know how many there are.”

Scientists have most commonly estimated the population size of these migrating species by flying aerial transects, taking thousands of photographs, and counting the animals seen in the images. From these counts, they statistically estimate the density of animals in the region to come up with an overall population size.
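The extrapolation from image counts to a population figure can be illustrated with a minimal sketch. It assumes the simplest possible approach, a mean density multiplied by the size of the region, which is not necessarily the statistical method the survey teams use. The function name, the ground area covered by each image, and the region size are all hypothetical values chosen for illustration.

```python
def estimate_population(image_counts, image_area_km2, region_area_km2):
    """Extrapolate a population size from per-image counts via mean density.

    Assumes every image covers the same ground area (a simplification).
    """
    if not image_counts:
        raise ValueError("need at least one image count")
    sampled_area_km2 = len(image_counts) * image_area_km2
    density = sum(image_counts) / sampled_area_km2  # animals per square km
    return density * region_area_km2


# Hypothetical inputs, for illustration only (not survey data).
counts = [12, 0, 30, 18]
estimate = estimate_population(counts, image_area_km2=0.05, region_area_km2=100.0)
print(round(estimate))
```

Because every image count feeds directly into the density, any delay or error in counting carries straight through to the population estimate.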

However, the labor-intensive task of manually counting the aerial images can take three or four skilled counters several weeks to complete, limiting the team’s ability to make a timely, accurate population estimate.

Where the wildebeests are

The research team, composed of researchers from the University of Glasgow in Scotland, the University of Cape Town in South Africa, the Field Museum of Natural History in the USA and the Tanzania Wildlife Research Institute, tested whether two newer strategies for analyzing images – citizen scientists and automated object detection algorithms – could speed the counting process while maintaining accurate wildebeest population estimates.

During the migration, wildebeest herds stretch as far as the eye can see, yet individuals also rest under trees, complicating aerial counts. Image by Sue Palminteri/Mongabay.

The researchers applied each method to the task of counting the animals in images from the 2015 Serengeti wildebeest count. A camera mounted in the floor of a small survey airplane took images every 10 seconds for over 2,000 kilometers (1,243 miles) of transects, producing nearly 1,600 images.

The team first engaged more than 2,200 volunteers through the online Zooniverse platform, which pairs remote volunteers with research projects. Some 1.6 million people from around the world have registered to help collect, assess and verify information for studies that might not be possible or practical otherwise, such as counting penguins or detecting rainforest trees photographed from drone-mounted cameras. The platform thus educates volunteers while providing crowd-sourced data to research projects in the form of thousands of citizens willing to perform basic data processing tasks en masse.

Zooniverse’s Serengeti wildebeest count project provided an information page, a field guide to help volunteers identify wildebeest and other animals from above, and a group of images. Each volunteer could click on the images to indicate where they thought a wildebeest was present and the platform recorded the pixel location of each click. Fifteen volunteers counted each image: aggregating multiple estimates of a quantity, even by non-experts, has been found to converge on the true value.

Volunteers could view the overall project’s progress on a statistics bar on its home page, as well as how many wildebeest they counted, and the pixel locations of each identification.

How to mark a wildebeest – the research project’s site on the Zooniverse platform provides volunteers with guidelines for reviewing and processing the image data. Image courtesy of Zooniverse.

Availability of training data sets, such as those from citizen science projects, has helped researchers apply automated computing methods to count objects. This study used a subset of images to “train” a pair of deep convolutional neural networks (DCNNs), which extract relevant features in the images directly from training data, to identify potential wildebeest locations.

The researchers also used the citizen science data from Zooniverse as a filter, discarding any objects considered by the algorithms to be a wildebeest that were not also identified by the volunteers. After several modifications to the algorithm code, they ran the algorithm to count the animals in 1,000 randomly selected survey images.
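The filtering step described above might look something like the following sketch. The article says only that detections not also identified by volunteers were discarded; the matching rule used here (a detection is kept if it lies within a fixed pixel distance of at least one volunteer click) and the 20-pixel radius are assumptions made for illustration, as are the function and parameter names.

```python
import math


def filter_detections(detections, volunteer_clicks, max_dist_px=20.0):
    """Keep detections lying within max_dist_px of at least one volunteer click.

    detections and volunteer_clicks are lists of (x, y) pixel coordinates.
    The distance threshold is a hypothetical value, not taken from the study.
    """
    kept = []
    for dx, dy in detections:
        near_click = any(
            math.hypot(dx - cx, dy - cy) <= max_dist_px
            for cx, cy in volunteer_clicks
        )
        if near_click:
            kept.append((dx, dy))
    return kept


# Hypothetical coordinates: the second detection has no nearby click.
detections = [(100, 100), (500, 500)]
clicks = [(105, 98), (300, 300)]
print(filter_detections(detections, clicks))  # [(100, 100)]
```

A filter of this kind removes false positives from the algorithm's output, but it cannot recover animals that both the algorithm and the volunteers missed.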

They compared the results of the two methods for 1,000 images to counts made by an expert wildebeest counter, considered the “gold standard” for population estimation. Despite recognizing potential bias in a single expert’s count, the research team assumed it to be the true number of wildebeest in each image.

Without a clear understanding of population abundance, said lead author Colin Torney of the University of Glasgow’s School of Mathematics and Statistics in the statement, “it’s much harder for [wildlife managers] to see the early warning signs of a decline caused by changing habitats or increased levels of poaching and start to take the proper steps to redress the imbalance.”

Producing “highly accurate” counts

The test showed that both citizen science and deep learning algorithms can produce accurate image counts.

As a group, the Zooniverse volunteers showed a systematic tendency to undercount the wildebeest in the images, indicating that volunteers were more likely to miss a wildebeest than incorrectly identify some other animal or object as a wildebeest.

In their paper, the researchers suggested that providing volunteers with a field guide to identifying wildebeest helped prevent overcounting, but that distraction or losses in concentration that lead to undercounting were more difficult to prevent. The researchers were able to address the undercounts by excluding (filtering) the lowest five of the 15 counts of each image. The average of just the 10 highest counts closely approximated the expert’s count.

They cautioned, however, that this same count bias and the approach that eliminated it might not apply to other studies and “there will need to be a rigorous process of validation before a citizen science count could be used as the sole basis for a population estimate.”
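The correction the researchers applied, dropping the lowest five of the 15 volunteer counts for an image and averaging the remaining 10, is simple to express in code. The sketch below uses a hypothetical function name and made-up counts; only the drop-five-of-fifteen rule comes from the article.

```python
def filtered_mean(counts, n_drop_lowest=5):
    """Average the counts that remain after discarding the n lowest values."""
    if len(counts) <= n_drop_lowest:
        raise ValueError("need more counts than the number being dropped")
    kept = sorted(counts)[n_drop_lowest:]
    return sum(kept) / len(kept)


# Hypothetical counts from 15 volunteers for one image (not study data).
# Five volunteers undercount; the rest agree on 20 animals.
volunteer_counts = [10, 11, 12, 13, 14] + [20] * 10

plain_mean = sum(volunteer_counts) / len(volunteer_counts)
print(plain_mean)                       # 17.333..., pulled down by undercounts
print(filtered_mean(volunteer_counts))  # 20.0
```

Discarding the low end only helps when the errors lean in one direction, which is why the same filter could make things worse for a data set where volunteers tend to overcount.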

The cumulative image counts for three wildebeest counting methods. (a) The mean, median and filtered mean (just the top 10 counts) for the Zooniverse count data compared to the expert count. The shaded region indicates the cumulative count that would have been recorded if the highest or lowest counts for each image were used – in other words, substantial error. (b) The DCNN count compared with the expert count; its average miscount of 1.7 wildebeest per image was not systematic in either direction. Image is Figure 3 in Torney et al. (2019), “A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images,” published in Methods in Ecology and Evolution.

With the minor modifications and training, the computer algorithm produced a “highly accurate” wildebeest count, recording 20,631 animals compared to the expert’s count of 20,489. The algorithm miscounted by an average of 1.7 wildebeest per image, but because its errors showed no systematic bias, they largely cancelled out, leaving a total within one percent of the expert count.

Moreover, the researchers wrote in their paper, it worked far faster than humans can review so many images. “The 1,000 images can be processed in under 2 hours, meaning every future census could be counted within 24 hours. Hence, a process that currently takes 3–6 weeks, involving 3–4 wildlife professionals and countless cups of tea, can potentially be replaced with an automated system that runs overnight.”
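A short calculation shows how a per-image miscount can coexist with a near-perfect total. The two totals below are the ones reported in the article; the per-image numbers are hypothetical, chosen so that overcounts and undercounts balance, and the function names are illustrative.

```python
def percent_difference(estimate, reference):
    """Signed difference between estimate and reference, as a percentage."""
    return 100.0 * (estimate - reference) / reference


def error_summary(predicted, truth):
    """Return (mean absolute error per image, net error across all images)."""
    errors = [p - t for p, t in zip(predicted, truth)]
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    return mean_abs_error, sum(errors)


# Totals reported in the article: algorithm vs. expert.
print(f"{percent_difference(20631, 20489):.2f}%")  # 0.69%

# Hypothetical per-image counts: errors of +1, -2, +1 and 0.
predicted = [10, 8, 12, 9]
expert = [9, 10, 11, 9]
print(error_summary(predicted, expert))  # (1.0, 0)
```

In the toy example each image is off by one animal on average, yet the summed count matches exactly, which is the pattern an unbiased counter produces at scale.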

Complementary methods

Nevertheless, DCNNs, like other machine learning algorithms, need to be “trained” to understand the task at hand. Finding sufficiently large training data sets is, the authors write, the “greatest challenge” for implementing these algorithms for specific conservation tasks, such as counting wildebeest, penguins, or flowering trees. In this case, the Zooniverse citizen scientists provided the necessary wildebeest training data that made the use of the algorithm possible.

Wildebeest migrating through Serengeti National Park move at various densities. Counting the individuals in dense groups or reviewing seemingly empty images can be a challenge for non-expert reviewers. Image by Sue Palminteri/Mongabay.

The authors say in their paper that new data collection technologies, such as camera traps and drone-borne cameras, might also be able to help scientists build training data sets, expanding the niche of automated image data processing. “Our results show that deep learning algorithms are now at a state where they can legitimately replace manual counters and remove a large burden from conservation organisations.”

Citation

Torney, C. J., Lloyd‐Jones, D. J., Chevallier, M., Moyer, D. C., Maliti, H. T., Mwita, M., … & Hopcraft, G. C. (2019). A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13165