Poster Session
The E-MSD relational database and search system
By Dr John Tate
European Bioinformatics Institute
Structural genomics promises to deliver experimentally determined 3-D structures for many thousands of protein domains and will put each protein within comparative-modeling distance of a known protein structure. Since 1971 the Protein Data Bank (PDB) has been the central repository of macromolecular structure data, but the present flat-file archive cannot support the complex tools required for drug discovery, molecular medicine and bioinformatics. To fully exploit the volume of structural data that will soon become available, new technologies must be employed.
The Macromolecular Structure Database (MSD) group has developed a relational database for storing, validating and retrieving the complex structural information in the PDB. A comprehensive cleaning procedure has been carried out to ensure data uniformity across the whole archive, and an extensive set of derived properties and goodness-of-fit indicators has been added. The MSD also includes links to many other bioinformatics databases, including InterPro, GO, SwissProt, SCOP, CATH, PFAM and PROSITE.
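As a rough illustration of the relational approach described above, the sketch below stores entries and chains in linked tables, enforces a simple validation rule inside the database itself, and keeps cross-references to external databases in their own table so they can be joined against. It is a hypothetical, heavily simplified schema, not the actual MSD schema; the table names, column names, the helpers `build_demo_db` and `chains_linked_to`, and all data rows (including the accessions `X0001` and `Y0002`) are made-up placeholders.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only;
# it is NOT the actual MSD schema.
SCHEMA = """
CREATE TABLE entry (
    entry_id   TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    -- Validation inside the database: a resolution, if present, must be positive.
    resolution REAL CHECK (resolution IS NULL OR resolution > 0)
);
CREATE TABLE chain (
    entry_id TEXT NOT NULL REFERENCES entry(entry_id),
    chain_id TEXT NOT NULL,
    sequence TEXT NOT NULL,
    PRIMARY KEY (entry_id, chain_id)
);
-- One row per link from a chain to a record in an external database.
CREATE TABLE xref (
    entry_id  TEXT NOT NULL,
    chain_id  TEXT NOT NULL,
    db_name   TEXT NOT NULL,
    accession TEXT NOT NULL,
    FOREIGN KEY (entry_id, chain_id) REFERENCES chain(entry_id, chain_id)
);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", [
        ("demo1", "Placeholder entry one", 1.8),
        ("demo2", "Placeholder entry two", None),
    ])
    conn.executemany("INSERT INTO chain VALUES (?, ?, ?)", [
        ("demo1", "A", "MKVLA"),
        ("demo1", "B", "GSHMK"),
        ("demo2", "A", "MKVQQ"),
    ])
    conn.executemany("INSERT INTO xref VALUES (?, ?, ?, ?)", [
        ("demo1", "A", "InterPro", "X0001"),
        ("demo2", "A", "InterPro", "X0001"),
        ("demo1", "B", "CATH", "Y0002"),
    ])
    conn.commit()
    return conn

def chains_linked_to(conn, db_name, accession):
    """Return (entry_id, chain_id) pairs cross-referenced to one external record."""
    rows = conn.execute(
        "SELECT c.entry_id, c.chain_id FROM chain c "
        "JOIN xref x ON x.entry_id = c.entry_id AND x.chain_id = c.chain_id "
        "WHERE x.db_name = ? AND x.accession = ? "
        "ORDER BY c.entry_id, c.chain_id",
        (db_name, accession),
    )
    return rows.fetchall()
```

For example, `chains_linked_to(build_demo_db(), "InterPro", "X0001")` returns `[('demo1', 'A'), ('demo2', 'A')]`. Keeping cross-references in a separate table means a new external database can be linked without altering the core tables, and an attempt to store an invalid resolution or a link to a non-existent chain is rejected with `sqlite3.IntegrityError`.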
We have developed a flexible search system that exposes the power of the relational database without requiring the user to understand the complexity of the underlying schema. This search system provides a single access point for the MSD and associated databases, allowing searches on a wide range of biomolecular properties such as sequence, structural similarity and active-site conformation.
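One common way to hide a schema behind a search interface, sketched here under stated assumptions, is to map user-facing field names to SQL conditions and build a parameterized query from whichever fields the user fills in. This is not a description of how the MSD search system is implemented; the schema, the `FIELDS` mapping, the `search` and `build_demo_db` helpers, and the data rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration only (NOT the real MSD schema).
SCHEMA = """
CREATE TABLE entry (entry_id TEXT PRIMARY KEY, resolution REAL);
CREATE TABLE chain (entry_id TEXT REFERENCES entry(entry_id),
                    chain_id TEXT, sequence TEXT,
                    PRIMARY KEY (entry_id, chain_id));
"""

# User-facing search fields mapped to SQL conditions over the joined tables.
# Users name a property; they never see table or column names.
FIELDS = {
    "max_resolution": "e.resolution <= ?",
    "min_resolution": "e.resolution >= ?",
    "sequence_contains": "instr(c.sequence, ?) > 0",
    "chain_id": "c.chain_id = ?",
}

def build_demo_db() -> sqlite3.Connection:
    """In-memory database with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entry VALUES (?, ?)",
                     [("demo1", 1.8), ("demo2", 2.5), ("demo3", None)])
    conn.executemany("INSERT INTO chain VALUES (?, ?, ?)", [
        ("demo1", "A", "MKVLA"), ("demo1", "B", "GSHMK"),
        ("demo2", "A", "MKVQQ"), ("demo3", "A", "AAAA"),
    ])
    conn.commit()
    return conn

def search(conn, **criteria):
    """Return sorted (entry_id, chain_id) pairs matching every criterion (AND)."""
    unknown = set(criteria) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown search field(s): {sorted(unknown)}")
    sql = ("SELECT c.entry_id, c.chain_id FROM chain c "
           "JOIN entry e ON e.entry_id = c.entry_id")
    names = sorted(criteria)  # fixed order keeps clauses and parameters aligned
    if names:
        sql += " WHERE " + " AND ".join(FIELDS[n] for n in names)
    sql += " ORDER BY c.entry_id, c.chain_id"
    # Values are passed as bound parameters, never spliced into the SQL text.
    return conn.execute(sql, [criteria[n] for n in names]).fetchall()
```

For example, `search(build_demo_db(), sequence_contains="MKV", max_resolution=2.0)` returns `[('demo1', 'A')]`. Because the caller only ever supplies field names from a fixed vocabulary, the schema behind `FIELDS` can change without affecting the interface, and an entry with no recorded resolution simply fails a resolution condition rather than raising an error.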
The database, and several network-based services built on top of it, will be available in January 2003. Our poster will describe the basic design of the database, outline some of the improvements in data quality that it provides, and describe the services and search systems that are currently available and planned for the near future.