New toolkit identifies multiple species from environmental DNA

Researchers have developed a DNA analysis toolkit designed to speed the identification of the multiple species in a biological community by analyzing environmental DNA from a sample of water or soil.
To confirm the presence of a species at a site, the tool compares its genetic barcode (short DNA sequence) to barcodes of known species in one of several reference databases.
The toolkit’s advantage is its ability to quickly process many barcode sequences at multiple analysis locations on the gene, which enables it to identify the species behind the DNA sequences of many organisms at the same time.

Researchers from the University of California, Los Angeles (UCLA) and CALeDNA have developed a toolkit designed to quickly identify the species in a biological community by analyzing the environmental DNA (eDNA) of many species at once from a single sample of water or soil. Their aim is to eliminate the need for researchers to sort and process multiple eDNA sequences independently, thus saving time and money.

They published a description of the open-source software tool, called the Anacapa Toolkit, as well as results of a field test in the kelp forests off southern California.

Kelp forest at Anacapa Island off southern California. Image by Dana Roeber Murray via Flickr. CC-BY-NC-SA 2.0.

eDNA is the genetic material shed by animals and plants into the surrounding ecosystem, usually water or soil, through their skin, scales, feces or pollen. eDNA has proved increasingly useful for identifying the species, particularly aquatic ones, found at a given site. A single one-liter (roughly one-quart) sample of water can contain eDNA from many species, and collecting it is non-invasive.

The software toolkit is a series of modules that can analyze DNA sequences from multiple locations (loci) on the genes extracted from the eDNA in the sample and compare them to a customized reference database of sequences of known species. It produces a spreadsheet of all the species found in the sample for which it has a known reference sequence.

Lead author Emily Curd said the Anacapa Toolkit standardizes the outputs of eDNA research and eliminates many of the manual steps, and potential missteps, that previous tools involve. “When you compare our results against previous studies, we do a lot better capturing the biodiversity that’s out there,” Curd said in a statement.

Scientists use genetic barcoding, the analysis of short DNA sequences from a specific point on the gene, to identify a species by comparing its barcode to a database of known barcodes. Research teams have since developed metabarcoding analyses that allow them to analyze the barcodes of many species at the same time and determine which species are present in the sample.

A treefish, a California native, at Anacapa Island. eDNA from water samples allows researchers to detect the presence of individual species from the scales or skin they leave behind, even if the animal is no longer in the area. Image by Dana Roeber Murray via Flickr. CC-BY-NC-SA 2.0.

In developing the new tool, the researchers recognized three main challenges to accurately and reliably identifying species using eDNA:

eDNA studies often sequence multiple loci on the genes of a given sample because plants, fungi, and various animal species are each best detected using different loci, but researchers currently must process each of these independently;
a lack of curated reference databases for all of the loci the researchers want to analyze for all the potential species in a water or soil sample of a given site hinders identification;
current metabarcode pipelines (a series of steps, or workflow) often discard large portions of sequence data that are potentially useful for identifying the taxon (e.g. species) of a sequence that doesn’t fully align with reference sequences.

To use the toolkit, a research team first collects a water or soil sample and extracts DNA from it using standard techniques that produce genetic sequences of the various life forms in the sample.

Marine invertebrates of California’s Channel Islands. The toolkit analyzes the eDNA collected in water samples of the multiple organisms that form a biological community. Image by Ed Bierman, CC 2.0.

“It’s amazing how sensitive this technique is,” said co-author Zack Gold, referring to the team’s experience that DNA from fewer than a few dozen cells is enough to detect an organism’s presence in a sample.

The users upload these genetic sequences of yet-unknown species to the toolkit, which compares them to a genetic reference library of sequences with known identities. This comparison allows the tool to process the barcode sequences from the eDNA in the sample and identify the species associated with each barcode.
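The comparison step can be pictured with a toy example. The sketch below is not the Anacapa Toolkit's classifier; it is a minimal illustration that assumes equal-length sequences, a made-up two-entry reference library (`REFERENCE`) and an arbitrary 90 percent identity cutoff. The function names `identity` and `classify` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("this toy example only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify(query: str, reference: Dict[str, str],
             min_identity: float = 0.9) -> Tuple[Optional[str], float]:
    """Return the best-matching reference name, or None if no match is close enough."""
    best_name, best_score = None, 0.0
    for name, seq in reference.items():
        if len(seq) != len(query):
            continue  # skip references this simple comparison cannot handle
        score = identity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_identity:
        return best_name, best_score
    return None, best_score


# Hypothetical reference library: invented barcodes, not real species data.
REFERENCE = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "TTGACCGTAAGCTTGACCGT",
}

# A query that differs from species_A at a single position.
name, score = classify("ACGTACGTACGTACGTACGA", REFERENCE)
print(name, score)
```

A query that resembles nothing in the library comes back with no name, which mirrors the point that a species can only be reported if a reference sequence for it exists.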

The tool customizes the reference database for each analysis using information that the research team provides on the organisms that might be in their sample.

The researchers do this by inputting primers for species or higher taxa of interest. A primer is a short nucleic acid (DNA or RNA) sequence that matches a particular location on the gene and provides a starting point for DNA amplification and synthesis. Once bound to matching DNA in the sample, the primer primes, or provides a foundation for, copying the DNA collected in the study sample.

The toolkit’s Creating Reference libraries Using eXisting tools (CRUX) module generates custom reference databases based on these user-defined primers by querying public databases, such as GenBank and the European Molecular Biology Lab (EMBL) nucleotide database, to find known sequences for the organisms associated with the user’s selected primers.
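One way to picture building a reference library from primers is to search each known sequence for the primer pair and keep the stretch between them. The sketch below is an assumption-laden toy, not the CRUX module's implementation: it requires exact primer matches, uses invented sequences and primers, and the names `extract_amplicon` and `build_reference` are hypothetical.

```python
from typing import Dict, Optional

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_amplicon(sequence: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region between the forward primer and the reverse primer site.

    The reverse primer binds the opposite strand, so on this strand we look
    for its reverse complement. Only exact matches count in this toy version.
    """
    start = sequence.find(forward)
    if start == -1:
        return None
    inner_start = start + len(forward)
    end = sequence.find(reverse_complement(reverse), inner_start)
    if end == -1:
        return None
    return sequence[inner_start:end]


def build_reference(records: Dict[str, str], forward: str, reverse: str) -> Dict[str, str]:
    """Keep only the records in which both primer sites are found."""
    reference = {}
    for name, seq in records.items():
        amplicon = extract_amplicon(seq, forward, reverse)
        if amplicon is not None:
            reference[name] = amplicon
    return reference


# Invented records standing in for entries from a public sequence database.
RECORDS = {
    "species_A": "GGACGTACTTTGGGAAATAACGGCC",  # contains both primer sites
    "species_B": "GGGGGGGGGGGGGGGGGGGGGGGGG",  # contains neither
}

print(build_reference(RECORDS, forward="ACGTAC", reverse="CCGTTA"))
```

Only the record containing both primer sites ends up in the library, which is why the choice of primers determines which organisms the analysis can detect.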

The toolkit, which is freely available, offers access to several reference databases to complement the user-customized reference database, and users can add their own sequences to their database.

The toolkit’s advantage is its ability to quickly process the many barcode sequences at multiple analysis locations on the gene (multiple loci), which allows it to identify the species behind the DNA sequences of many organisms at the same time.

Environmental DNA is often dilute or partially degraded, so the toolkit trims and processes sequences, eliminating poor-quality sections and separating sequence files from the various loci within each sample. It categorizes the sequence files by quality, and its classifier identifies the species associated with each sequence by comparing it to the sequences with known identities in the reference database. It produces a spreadsheet of sequences and species, plus reports on the identification.
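Trimming poor-quality sections can be illustrated with a simple rule: drop bases from the end of a read until one meets a quality threshold. The sketch below is a simplified stand-in, not the toolkit's trimming procedure. It assumes per-base quality scores on the Phred scale and an arbitrary cutoff of 20, and the function name `trim_low_quality_tail` is hypothetical.

```python
from typing import List, Tuple


def trim_low_quality_tail(seq: str, quals: List[int],
                          min_q: int = 20) -> Tuple[str, List[int]]:
    """Remove trailing bases whose quality score falls below min_q."""
    if len(seq) != len(quals):
        raise ValueError("each base needs exactly one quality score")
    end = len(seq)
    # Walk backward from the end of the read past every low-quality base.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


# Invented read whose last two bases have low quality scores.
read = "ACGTACGT"
scores = [38, 37, 36, 35, 30, 25, 12, 8]

trimmed, kept_scores = trim_low_quality_tail(read, scores)
print(trimmed, kept_scores)
```

A read made up entirely of low-quality bases would be trimmed to nothing, which is one reason degraded samples yield fewer usable sequences.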

The researchers tested the toolkit on 30 samples of seawater from southern California’s kelp forests and found it captured a greater diversity of sequences and species than published reference databases.

The taxonomic assignments (identifications) from the research team’s test samples collected from seawater off southern California, highlighting the Anacapa Island kelp forest vertebrate families identified from the 12S metabarcodes (primers). Families in bold are featured in the photographs. Image is Figure 2 of Curd et al (2019).

Although all components of the toolkit are open and available to the public, researchers wanting to use the toolkit must have sufficient DNA analysis experience to select appropriate primers for their research site and to use standard polymerase chain reaction (PCR) techniques to copy and extract DNA to produce the sequences that they input into the toolkit.

Gold called the new tool a “really big game-changer,” though he recognized it has limitations.

Because it relies on eDNA, the toolkit cannot, for example, determine how many individuals of a particular species are in a certain area, just that a species is present. “It’s not going to replace all of the surveys and monitoring efforts,” Gold said, “but doing an eDNA survey is the most sensitive method to find where species are living.”
