
Response to critique on Conservation Effectiveness series (commentary)

  • The team behind Mongabay’s Conservation Effectiveness series appreciates the feedback on our series offered by Madeleine McKinnon and her colleagues. We believe that we and the authors of the commentary share the common goals of encouraging and enabling conservation actions based on the available scientific evidence, and increasing the standard of scientific studies that evaluate the impact of conservation.
  • Importantly, our goal was not to carry out a systematic review — an intensive, sometimes years-long process beyond the scope of our resources. We believe that systematic reviews are invaluable and crucial for answering specific, relatively narrow research questions. At the same time, they are not suitable for providing an overview of evidence of a wide range of outcomes, across a broad spectrum of evidence types, as we have tried to do with this series.
  • We cannot identify an example of our series challenging the findings of existing systematic reviews, as McKinnon and co-authors imply it does. We strongly agree that there are opportunities for improvement. One of the main improvements we hope to make next is turning our database into a dynamic, growing, open contribution platform.

The other members of the team behind Mongabay’s Conservation Effectiveness series and I appreciate the feedback on our series offered by Madeleine McKinnon and her colleagues. We believe that we and the authors of the commentary share the common goals of encouraging and enabling conservation actions based on the available scientific evidence, and increasing the standard of scientific studies that evaluate the impact of conservation.

Before addressing the specific points made by McKinnon and her co-authors, we would also like to emphasize that our series and visualizations have additional goals:

• To make scientific evidence accessible to non-scientists.

• To make it easier for practitioners to orient themselves in and interact with scientific evidence, so they can form informed opinions in the limited time they have.

• To demonstrate to a broad audience the complexity of scientific evidence and the different ways in which conservation success can be viewed.

• To inspire discussion about what conservation success means for different stakeholders, beyond scientists.

Importantly, our goal was not to carry out a systematic review — an intensive, sometimes years-long process beyond the scope of our resources. We believe that systematic reviews are invaluable and crucial for answering specific, relatively narrow research questions. At the same time, they are not suitable for providing an overview of evidence of a wide range of outcomes, across a broad spectrum of evidence types, as we have tried to do with this series.

Bias

First, we disagree that the alternative to a systematic review is “cherry-picking results to fit a desired narrative.” There are many known and unknown biases in scientific research and publication; some can be addressed, others cannot. Reviews, including systematic ones, can suffer from different degrees of bias. Our series is absolutely not a collection of studies cherry-picked to fit a certain narrative. When we did go beyond our review methodology and subjectively selected specific studies to include, such as in the story on Environmental Advocacy, we acknowledged it openly and clearly conveyed our reasons for doing so. One of the main conclusions of all pieces in the series was that “more evidence is needed.”

The authors of the commentary highlight the non-exhaustiveness of our database as a bias. Our approach of sampling the literature rather than attempting to gather every last relevant study — that is, our non-exhaustiveness — does not in itself equal bias, although it, like any other sampling, can introduce biases. For example, as we acknowledge on our methods page, we may have introduced bias by only including English-language and peer-reviewed publications. Smaller samples are more prone to biases than larger ones, and we believe that our target of 1,000 search results was a reasonable sample size that would lead to an acceptable level of bias. Moreover, our goal was not to carry out systematic, exhaustive reviews, and we have clearly stated in all of our stories that our databases are not exhaustive.

Amazon rainforest tree in Peru. Photo by Rhett Butler.

It’s worth pointing out that even an exhaustive review of all literature is still likely to suffer from biases. For example, publication bias — where journals tend to publish studies with highly significant results rather than ones that, equally importantly, find no substantial change — can be quantified, but not truly eliminated.

Our criteria for inclusion of individual studies are described on the methods page. They included things like the study being peer-reviewed; the methodology being clearly described so that the study can be classified as one of the seven types of evidence; the study containing information on the country it examined, the outcomes it measured, what the outcomes were, what it compared the intervention in question to, etc.; the study fitting within our geographic scope; and others.

We have read the systematic review on decentralized forest management (DFM) that the McKinnon et al. commentary suggests our methodological bias may have led us to overlook (Samii et al. 2015). However, in many parts of the text this systematic review appears to be a word-for-word copy of an earlier systematic review on payments for ecosystem services, apart from the acronym PES being replaced by DFM. The authors even failed to correct the number of studies found, leaving incorrect numbers in their abstract that did not correspond to the main text. We appreciate the work that went into this review, but these copy-and-paste warning signs made us worry about its rigor. Nevertheless, we did go through the study in detail and included the relevant individual studies that fit our inclusion criteria.

Transparency

We detail our methodology and criteria for including studies here. When two researchers reached different conclusions about whether to include a study, we mentioned it in the infographic within the squares corresponding to the study in question.

We acknowledge that Google Scholar is not an ideal search platform, due to the lack of transparency in its search algorithm and the recent change with regard to the use of Boolean operators. Until Google Scholar clarifies its search processes, we would recommend that researchers, scholars, and journalists use additional databases, provided they have access to them. If they use Google Scholar, we recommend using "private" or "incognito" browser settings to avoid personalization biases in the results.

We are hoping to open our platform to contributions by researchers, so that our database can be dynamic and grow at the same pace as the evidence base. We have already tested the platform's documentation for making contributions with several scientists, and will continue to improve it so that anyone can contribute transparently.

Subjectivity

We agree that research synthesis is useful for translating large bodies of data into broad insights. Before we respond to the comments on the infographic, we want to emphasize that an important capability of our infographic is the ability to convey specific, geographically local insights. For example, for an NGO in Indonesia hoping to implement a PES project, it’s useful to consult a systematic review to see whether PES has worked overall. But it’s also important to be able to quickly access regional evidence, for example just from Malaysia and Indonesia, or evidence on a particular outcome of PES projects, for example the effect on biodiversity. Our infographic allows both of these functions.

We thank the authors for their comments on the visualization. It is difficult to represent conservation evidence and there are numerous pitfalls to avoid. The commentary raises two important points that we will address separately, one about interpreting outcomes as positive, neutral, or negative; the other about evidence types.

Our visualization discourages "vote counting" — adding up the number of green/positive, yellow/neutral, and red/negative squares. At no point in the series do we engage in vote counting; the unequal weights of individual studies are emphasized in the caveats section of the methods; and we specifically warn against vote counting in the summary PDF documents:

“The majority of extracted data points do not imply causation, only correlation. Studies vary in the rigor of design, sample size, methodology, and scope. Therefore, data points (individual squares) cannot be summed or used to calculate overall effect! One red square does NOT cancel out one green square. Please use as a non-exhaustive map of existing scientific evidence rather than as a final verdict on whether PES is effective.”

That is indeed why we chose to portray each outcome as an individual square, rather than something like a bar chart or percentage, which would imply that vote counting or averaging were possible. We hope to encourage readers to explore individual results by clicking on squares, which should further bring home the message that not all squares are equal.

Additionally, the authors imply that we ignore “the wide array of impacts occurring within and between populations and time frames within a single study.” We do not. Where a single study examined different populations or different time frames, it is represented as multiple outcomes in our database and visualization. We emphasize again that this leads to individual squares in the visualization not being independent and underscores the inappropriateness of vote counting.

Finally, the authors argue that we are “giving equal weight to studies whether poorly designed or rigorously executed.” We do not. An important function of our visualization is to communicate that there are different types of evidence, and that these need to be treated and interpreted differently.

Rainforest in Borneo. Photo by Rhett Butler.

One level of distinction is represented by the light and dark shades of the squares (see legend). At a finer level, the drop-down menu “Select type of evidence” lets users separate different types of evidence within the visualization based on the rigor of the study design. However, our “types of evidence” categories capture only two aspects of the study design variability (the ability to show correlation versus causation, and the ability to generalize). It was beyond our scope to also distinguish between different sample sizes, durations, geographic areas, funding sources, etc.

Our visualization is not perfect. However, we hope that it is a step toward better communication of science to a broad audience, and we will continue improving and testing it.

False confidence

One of the main concerns the commentary raises is that we are overly and falsely confident in our results. In all our articles, we conclude that there is either not enough rigorous evidence, or not enough evidence overall, while acknowledging the different levels of rigor with which individual studies have been designed. Even so, we believe that any potential overconfidence in this particular conclusion could productively channel funding and research effort toward filling these knowledge gaps.

Again, we do not engage in vote counting anywhere in the series, and we discourage readers from vote counting using our database. Regarding the forest certification example, we disagree that the article or visualization draws conclusions that are substantially different from those of the individual, rigorously designed studies mentioned in the critique. The handful of quasi-experimental studies (which can be displayed separately by selecting "Study III" in the "Select types of evidence" drop-down menu) find some positive and some neutral environmental and social outcomes. This reflects very well the findings of less-rigorous studies, which find some positive, some neutral, and very few negative environmental and social outcomes of certification and reduced-impact logging.

Opportunities for improvement

We cannot identify an example of our series challenging the findings of existing systematic reviews, as McKinnon and co-authors imply it does. We strongly agree that there are opportunities for improvement. One of the main improvements we hope to make next is turning our database into a dynamic, growing, open contribution platform. We will certainly keep in mind the potential biases and pitfalls that such approaches present and will continue informing users about the limitations, with the vision of further narrowing the gap between science and practice in conservation in a rigorous and transparent way.