A multinational biological information consortium, the Universal Protein Resource (UniProt), has now added a new database repository of DNA sequences obtained from oceanic microbes to its family of protein sequence databases. The data are publicly available. Information accumulated in this database is central to fundamental biological research, because of the functions that proteins carry out in cells.
Proteomics research, the large-scale study of proteins and their interactions, has accelerated in recent years because of technological advances in protein science and the large amounts of genomic data pouring out of the Human Genome Project (HGP). The UniProt consortium aims to support biological research by maintaining a high-quality database that serves as a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledge base, with extensive cross-references and querying interfaces freely accessible to the scientific community.
In a major leap forward for researchers everywhere, UniProt has added the new database repository for metagenomic and environmental data to its existing family of protein sequence databases, the largest in the world. Metagenomics is the large-scale genomic analysis of microbes recovered from environmental samples, as opposed to laboratory-grown organisms, which represent only a small proportion of the microbial world.
Secrets of the deep
The UniProt Metagenomic and Environmental Sequences (UniMES) database contains the data from the Global Ocean Sampling Expedition (GOS), which were originally submitted to the International Nucleotide Sequence Database Collaboration (INSDC). The GOS expedition was led by Dr. J. Craig Venter, a driving force behind the sequencing of the human genome and a leading scientist in the field of synthetic biology, which opens new doors to the bioeconomy (see earlier posts).
The initial GOS dataset is composed of 28 million DNA sequences from oceanic microbes, from which nearly 6 million proteins are predicted.
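How a DNA read gives rise to a predicted protein can be illustrated with a minimal Python sketch. It is not the pipeline used for the GOS data. It assumes the standard genetic code, scans all six reading frames of a read, and treats any stretch running from a start codon (ATG) to the next stop codon, or to the end of the read, as a candidate protein. The function names and the `min_length` threshold are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def predict_proteins(read, min_length=3):
    """Collect candidate proteins from all six reading frames of one read.

    min_length is an illustrative threshold (in amino acids), not a value
    taken from any real annotation pipeline.
    """
    read = read.upper()
    proteins = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            # Stop codons split the translation into separate segments.
            # The last segment may run off the end of the read, which is
            # common for short environmental fragments.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")  # first start codon in the segment
                if start != -1 and len(segment) - start >= min_length:
                    proteins.append(segment[start:])
    return proteins


if __name__ == "__main__":
    print(predict_proteins("ATGGCCAAATAA"))
```

Real gene prediction on metagenomic reads is considerably more involved, since reads are short, often contain only part of a gene, and may use alternative start codons. The sketch only shows the basic step from nucleotide sequence to predicted amino acid sequence.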
By combining the predicted protein sequences with automatic classification by InterPro, the EBI’s integrated resource for protein families, domains and functional sites, UniMES uniquely provides free access to the array of genomic information gathered from sampling expeditions, enhanced by links to further analytical resources. Genomics holds the key to understanding a significant part of the world around us, and the metagenomic and environmental data represent a step forward in further charting genomic diversity.
With the increasing volume and variety of protein sequences and functional information that has become available, UniProt effectively serves as the central database of protein sequence and function. It has become a cornerstone for a wide range of scientists active in modern biological research, especially in the field of proteomics. Researchers at the Protein Information Resource (PIR) have also made great strides in automating the computational analysis of proteins.
As a publicly funded project, UniProt makes its data freely accessible and releases all data in a timely manner, with the UniProt website serving as the point of access.
The UniProt Consortium comprises the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR) hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, D.C., USA.
Image: Sample of oceanic bacteria as seen using epifluorescence microscopy. Credit: Microbiologist Dr. Ed DeLong.
European Research Headlines: Maritime secrets added to biological repository - August 22, 2007.
J. Craig Venter Institute: Global Ocean Sampling Expedition.
Biopact: Investigating life in extreme environments may yield applications in the bioeconomy - July 05, 2007.