Machine learning tool helps prioritize plants for conservation

  • In a first global plant conservation assessment, a multi-institutional research team used the power of open-access databases and machine learning to predict the conservation status of more than 150,000 plants.
  • They paired geographic, environmental, climatic, and morphological trait information of plant species of known risk of extinction from the IUCN Red List with information on plants of unknown risk in a machine learning model. The model calculated the likelihood that a given unassessed plant species was actually at risk of extinction and identified the variables that best predicted conservation risk.
  • More than 15,000 of the species, roughly 10 percent of the total assessed by the team, had characteristics similar to those already categorized as at least near-threatened by IUCN and were thus likely at risk of extinction.
  • The protocol could provide a first cut in identifying unassessed species likely at risk of extinction and suggest how to allocate scarce conservation resources.

If you know the animals in your neighborhood but not the plants, you’re not alone.

Scientists have documented nearly 400,000 plant species and expect to identify many more. But unlike well-known endangered animals, such as elephants, tigers, and parrots, we don’t currently understand the conservation status of more than 90 percent of the world’s plant species. Plant growth and communities drive the ecosystems, food chains, and agriculture on every continent, yet we don’t know what conditions cause them to thrive or disappear.

Unique desert plants, as well as giant redwood trees, help make California a region of globally high plant diversity. Image by Sue Palminteri/Mongabay.

Understanding how threatened a specific plant species is requires broad information on where it lives and what it looks like. But finding plants in the wild to determine where they are and where they aren’t requires time, money, and expertise.

A multi-institutional research team used the power of open-access databases and machine learning to predict the conservation status of more than 150,000 plants. In their study published last month in the Proceedings of the National Academy of Sciences, the team tested whether their machine learning algorithm could track patterns in plant locations, climatic patterns, habitat features, and morphologies – their form and structure – and use that information to identify species that were likely at risk of extinction.

“There is an urgent need for more efficient methods of identifying at-risk species,” the authors said in their paper. “To meet this need, we developed and evaluated a predictive protocol that permits a rapid initial assessment of conservation status for understudied plant taxa.”

“The basic habitat that all species rely on”

As humans convert grasslands, forests, and even deserts to food crops, plantations, and grasses for livestock, populations of native plants diminish, and species disappear. Scientists estimate that more than 20 percent of all (land) plant species are likely threatened with extinction. But this figure is still a guess.

The Red List of Threatened Species, maintained by the International Union for Conservation of Nature (IUCN), inventories threatened species and serves as a tool to prioritize species for conservation action. It has so far assessed fewer than 10 percent of the more than 390,000 recognized plant species. By comparison, it has assessed all recognized bird and mammal species.

“Plants form the basic habitat that all species rely on, so it made sense to start with plants,” co-author Bryan Carstens, a professor of evolution, ecology and organismal biology at Ohio State University, said in a statement. “A lot of times in conservation, people focus on big, charismatic animals, but it’s actually habitat that matters. We can protect all the lions, tigers and elephants we want, but they have to have a place to live in.”

Flowers from southwestern Australia, a plant diversity and endemism hotspot. Image by Sue Palminteri/Mongabay.

The Red List assesses species’ likelihood of extinction and uses specific criteria to categorize them as of Least Concern, Near Threatened, Vulnerable, Endangered, Critically Endangered, Extinct in the Wild, and Extinct. Data Deficient species are potentially at risk but are too poorly studied to be categorized.

Scientists consider these IUCN Red List categories in setting conservation priorities and conducting environmental impact assessments used by industry. However, evaluating the conservation status of each additional species takes time and money, as well as expertise, resulting in many species with a high risk of extinction not being listed.

Plants typically receive less attention and support than charismatic big cats and colorful birds, despite their importance to agriculture and ecosystems and their particular sensitivity to loss of habitat.

“Not having plants in those analyses means that people are working with incomplete datasets,” Anne Frances, a botanist who coordinates Red List efforts in North America, told Wired. “We’re determining key biodiversity areas without a big chunk of the biodiversity being taken into account.”

Scientists expect the loss of plant species, due to direct elimination or climate change, will lead to bottom-up cascading losses of the animals that depend on them.

Scientists describe roughly 2,000 new plant species every year, according to the “State of the World’s Plants” report, which only adds to the backlog of species whose conservation status has yet to be assessed.

Random forests for plants

The researchers wanted to find a way to use new data processing technology to speed up the assessment process and make it more cost-effective.

They compiled open-access data, collected by scientists over decades, from the Global Biodiversity Information Facility (GBIF) and the TRY Plant Trait Database on the location, climate, environment, and appearance of 150,000 plants from across the world. These represent the nearly 95 percent of plant species with data in the GBIF that have never been evaluated for the IUCN Red List.

The huge database included observations of plants for which location coordinates were available in the GBIF but which had not yet been assessed for the IUCN Red List.

Many plants make their home in this prairie and the adjacent tree hammock (background) in Kissimmee, Florida. Image by Sue Palminteri/Mongabay.

The researchers built a machine learning model to determine the traits associated with different levels (categories) of extinction risk, using the relatively small number of plant species already categorized by the IUCN Red List (so their conservation status was known) to “train” it. They then applied the model to predict the extinction risk of the 150,000 unassessed species, for which risk was unknown.

The researchers built their model using the Random Forest (RF) technique (perfectly named for plant assessments!). RF is a supervised machine learning algorithm, meaning it must be taught relationships between object attributes and outcomes; once it understands the relationships, it can predict outcomes using new input data.

In the case of plants, the model tested whether a given attribute — such as the plant’s latitude or longitude; elevation; soil type; rainfall; temperature; or distance from a road, town, or protected area — was associated with plants that were endangered. Based on the outcome (how endangered is the plant?), the researchers could decide which characteristics best predicted a plant’s risk of disappearing.

By comparing characteristics of the IUCN Red Listed plants having a known risk of extinction with characteristics of plants of unknown risk, the model calculated the likelihood that a given uncategorized plant species was actually at risk of extinction and thus in need of more in-depth evaluation. It also identified the variables that are the most important in predicting conservation risk.
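The train-then-predict idea can be illustrated with a toy example. What follows is a minimal sketch, not the researchers' model: it uses only the Python standard library and builds a deliberately simplified forest in which every "tree" is a single split (a stump) fitted to a bootstrap sample and a random subset of features. The feature names, the eight made-up "assessed" species, and the helper functions (fit_stump, fit_forest, predict_risk) are all hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(rows, labels, features):
    """Find the one (feature, threshold) split with the lowest weighted impurity."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # No usable split: fall back to the base rate on both sides.
        rate = sum(labels) / len(labels)
        return (None, 0.0, rate, rate)
    return best[1:]  # (feature, threshold, at-risk rate left, at-risk rate right)

def fit_forest(rows, labels, n_trees=200, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of features."""
    rng = random.Random(seed)
    names = list(rows[0].keys())
    k = max(1, round(len(names) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(names, k)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_risk(forest, row):
    """Average the stumps' leaf rates: an estimated probability of being at risk."""
    total = 0.0
    for f, t, p_left, p_right in forest:
        total += p_left if f is None or row[f] <= t else p_right
    return total / len(forest)

# Made-up "assessed" species: 1 = at risk, 0 = not at risk (illustrative values only).
assessed = [
    {"range_km2": 120, "lat_extent_deg": 0.5, "height_m": 2.0, "rain_mm": 800},
    {"range_km2": 300, "lat_extent_deg": 1.0, "height_m": 15.0, "rain_mm": 1200},
    {"range_km2": 90, "lat_extent_deg": 0.3, "height_m": 0.4, "rain_mm": 400},
    {"range_km2": 450, "lat_extent_deg": 1.5, "height_m": 8.0, "rain_mm": 2000},
    {"range_km2": 50000, "lat_extent_deg": 12.0, "height_m": 1.5, "rain_mm": 900},
    {"range_km2": 120000, "lat_extent_deg": 20.0, "height_m": 20.0, "rain_mm": 1100},
    {"range_km2": 80000, "lat_extent_deg": 15.0, "height_m": 0.6, "rain_mm": 500},
    {"range_km2": 30000, "lat_extent_deg": 9.0, "height_m": 10.0, "rain_mm": 1800},
]
at_risk = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(assessed, at_risk)

# Two hypothetical unassessed species that differ only in range size and extent.
narrow = {"range_km2": 60, "lat_extent_deg": 0.2, "height_m": 5.0, "rain_mm": 1000}
wide = {"range_km2": 90000, "lat_extent_deg": 18.0, "height_m": 5.0, "rain_mm": 1000}
risk_narrow = predict_risk(forest, narrow)
risk_wide = predict_risk(forest, wide)
print(f"narrow-range species: {risk_narrow:.2f}, wide-range species: {risk_wide:.2f}")
```

Because the made-up training labels depend only on range, the narrow-range species comes out with the higher predicted risk. A real Random Forest grows much deeper trees, and a real analysis would use far more species and variables than this toy does.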

Thousands of species at risk

The model indicated that more than 15,000 unlisted species, roughly 10 percent of those in the analysis, were at some risk of extinction and thus of conservation concern.

Globally, spatial characteristics, such as the size and latitudinal extent of a species’ range, were better predictors of extinction risk than climatic or morphological ones, such as height or woodiness. Species with smaller ranges, for example, typically have smaller populations, which are more likely to go extinct than larger ones. Nevertheless, no single global variable predicted conservation status.

Southwestern Madagascar’s spiny forest supports a variety of endemic plants. Image by Sue Palminteri/Mongabay.

The researchers used the results to identify areas with large numbers of at-risk plant species and suggested tools to help conserve these areas. For each observation in the analysis, they related the plant’s probability of being at some risk of extinction with its GPS coordinates. They calculated the average probability of risk for all GPS coordinates within each cell of a 1° × 1° grid covering the world and gave each cell a risk value.
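The gridding step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the team's code: the function name grid_risk, the (latitude, longitude, probability) tuple layout, and the three sample observations are hypothetical, and each cell is identified by its south-west corner.

```python
import math
from collections import defaultdict

def grid_risk(observations):
    """Average predicted risk within each 1-degree by 1-degree cell.

    observations: iterable of (latitude, longitude, probability) tuples.
    Returns {(lat_floor, lon_floor): mean probability}, keyed by the
    south-west corner of each cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, prob in observations:
        cell = (math.floor(lat), math.floor(lon))  # floor also handles negative coordinates
        sums[cell] += prob
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Made-up observations: two in one cell in southwestern Madagascar, one in California.
observations = [
    (-23.4, 43.7, 0.9),
    (-23.9, 43.1, 0.7),
    (36.5, -119.2, 0.2),
]
cells = grid_risk(observations)
for cell, mean_prob in sorted(cells.items()):
    print(cell, round(mean_prob, 2))
```

The two Madagascar points fall in the same cell and are averaged together, while the California point gets a cell of its own. Coloring each cell by its mean value produces a map of the kind the researchers describe.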

Mapping the plant data revealed several major geographical trends in their model’s predictions. At-risk plant species tended to cluster in regions already known for their high plant diversity, including California, Central America, Madagascar, the southeastern U.S. and southwestern Australia. Several of these also harbor many endemic species, those that naturally occur nowhere else.

It also identified a few lesser-known areas for biodiversity, such as Tasmania and the coastal fog desert of the southern Arabian Peninsula. According to co-author Anahí Espíndola, an assistant professor at the University of Maryland, some of the most imperiled regions have received very limited attention, but the new method could help identify regions and species in need of further study.

The map shows predicted levels of extinction risk to more than 150,000 plant species. Warmer colors denote areas with larger numbers of potentially at-risk species, while cooler colors denote areas with low overall predicted risk. To inform conservation on a global scale, the researchers associated the probability of plants in the analysis being at some risk of extinction with their GPS coordinates. They calculated the average probability of risk for all GPS coordinates within each cell of a 1° × 1° grid covering the world. Image by Tara Pelletier and Anahí Espíndola.

In a statement, Espíndola said that the machine learning predictions couldn’t replace formal assessments from on-the-ground observations but could help identify at-risk species and regions for more in-depth study.

“This isn’t a substitute for more-detailed assessments,” Carstens echoed in a statement, “but it’s a first pass that might help identify species that should be prioritized and where people should focus their attention.”

“When I first started thinking about this project, I suspected that many regions with high diversity would be well-studied and protected,” Espíndola said. “But we found the opposite to be true.”

The researchers hope the model will help target limited resources for habitat protection. The map can also help future researchers locate regions needing conservation efforts by pairing GPS coordinates with the map’s risk probabilities.

“The model can be adapted for use at any geographic scale,” Espíndola said. “Everything we’ve done is 100 percent open access, highlighting the power of publicly-available data. We hope people will use our model – and we hope they point out errors and help us fix them, to make it better.”


Pelletier, T. A., Carstens, B. C., Tank, D. C., Sullivan, J., & Espíndola, A. (2018). Predicting plant conservation priorities on a global scale. Proceedings of the National Academy of Sciences, 115(51), 13027-13032.
