100 billion bases of the genetic code sequenced
August 22, 2005
Public Collections of DNA and RNA Sequence Reach 100 Gigabases
These 100,000,000,000 bases, or letters of the genetic
code, represent both individual genes and partial and
complete genomes of over 165,000 organisms. While a
single gene from organisms as diverse as humans, elephants,
earthworms, fruit flies, apple trees, and bacteria can
range from fewer than one hundred to several thousand
bases long, an organism's genome can be longer than
one billion bases. The free access to this information
allows scientists to study and compare the same data
as their colleagues nearly anywhere in the world, and
makes possible collaborative research that will lead
ultimately to cures for diseases and improved health.
Thanks to their data exchange policy, the three members
of the International Nucleotide Sequence Database Collaboration,
GenBank (Bethesda, Maryland, USA), the European Molecular
Biology Laboratory's European Bioinformatics Institute
(EMBL-Bank in Hinxton, UK), and the DNA Data Bank of
Japan (Mishima, Japan), all reached this milestone together.
GenBank is maintained by the National Center for Biotechnology
Information (NCBI), a part of the National Library of
Medicine, National Institutes of Health. Submitters
to GenBank currently contribute over 3 million new DNA
sequences per month to the database. More information
about GenBank may be found on the NCBI Web site at http://www.ncbi.nlm.nih.gov.
David Lipman, Director of the National Center for Biotechnology
Information, commented that "Today's nucleotide sequence
databases allow researchers to share completed genomes,
the genetic make-up of entire ecosystems, and sequences
associated with patents. The International Nucleotide
Sequence Database Collaboration (INSDC) has realized
the vision of the researchers who initiated the sequence
database projects by making the global sharing of nucleotide
sequence information possible."
Graham Cameron, Associate Director of EMBL's European
Bioinformatics Institute, added, "This is an important
milestone in the history of the nucleotide sequence
databases. From the first EMBL Data Library entry made
available in 1982 to today's provision of over 55 million
sequence entries from at least 200,000 different organisms,
these resources have anticipated the needs of molecular
biologists and addressed them, often in the face of a
serious lack of resources." More information about EMBL-Bank
is on the Web at http://www.ebi.ac.uk/embl.
Takashi Gojobori, Director of the Center for Information
Biology and DNA Data Bank of Japan, said: "The INSDC
has laid the foundations for the exchange of many types
of biological information. As we enter the era of systems
biology and researchers begin to exchange complex types
of information such as the results of experiments that
measure the activities of thousands of genes, or computational
models of entire processes, it is important to celebrate
the achievements of the three databases that pioneered
the open exchange of biological information." More information
about the DNA Data Bank of Japan is on the Web at http://www.ddbj.nig.ac.jp/.
Background
In the late 1970s, as researchers started to study organisms
at the level of their genetic code, several groups
began to explore the possibility of developing a public
repository for sequence information. In the early
1980s this led to the launch of two databases: the
first was the EMBL Data Library, based at the European
Molecular Biology Laboratory in Heidelberg, Germany
(the Data Library is now known as EMBL-Bank and is
based at EMBL's European Bioinformatics Institute,
Hinxton, UK). Hot on its heels came GenBank, initially
hosted by the Los Alamos National Laboratory and now
based at the National Center for Biotechnology Information,
Bethesda, Maryland, USA. By the time the International
Nucleotide Sequence Consortium became formalized in
February 1987, a third partner, the DNA Data Bank
of Japan, had been launched at the National Institute
of Genetics in Mishima, and collaborated with its
European and US counterparts right from the start.
Much has changed since the days when sequences were manually
keyed in from the literature or sent on floppy disc and
distributed to users on 9-track magnetic tapes, but the
purpose of the databases — to make every nucleotide sequence
in the public domain freely available to the scientific
community as rapidly as possible — remains as strong now
as it was in the beginning.
About NCBI
The National Center for Biotechnology Information is
part of the National Library of Medicine. Established
in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts
research in computational biology, develops software
tools for analyzing genome data, and disseminates
biomedical information — all for the better understanding
of molecular processes affecting human health and
disease. NCBI is host to the GenBank nucleotide sequence
database.
The National Library of Medicine, the world's largest
library of the health sciences, is a component of
the National Institutes of Health, U.S. Department
of Health and Human Services.
The National Institutes of Health (NIH) — The
Nation’s Medical Research Agency — is comprised
of 27 Institutes and Centers and is a component of
the U.S. Department of Health and Human Services.
It is the primary Federal agency for conducting and
supporting basic, clinical, and translational medical
research, and investigates the causes, treatments,
and cures for both common and rare diseases. For more
information about NIH and its programs, visit http://www.nih.gov.
This is an NIH news release.