Per Jambeck, University of California, San Diego
Track: Techniques
Date: Tuesday, February 04
Time: 1:30pm - 2:15pm
Location: California Ballroom C
The Protein Data Bank is growing. While its rate of growth is not as rapid as that of sequence databases, structural data presents some unique challenges for searching and querying. For example, in addition to the linear amino acid sequence, structural data contains information about a protein's backbone trajectory, topology, surface, and the geometry of spatially neighboring side chains.
Jambeck’s presentation outlines some methods for organizing and mining data based on vector space representations of that data. Such methods describe each data point (in this case, a protein) as an ordered set of real numbers, each giving that protein's value for a particular feature. This formulation allows the proteins to be represented as points in a high-dimensional space. Vector space methods have already found applications in machine learning, information retrieval, and pattern recognition; as we will see, they can also readily be used to address questions in structural bioinformatics. In particular, Jambeck looks at methods for extracting and comparing features of interest from protein structures, such as similarity to known folds. Tools implementing these methods will be made available.
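As a rough illustration of the vector space idea described above, the following Python sketch represents each protein as a short feature vector and ranks a small library of reference folds by cosine similarity to a query. It is not taken from the presentation: the three features (helix, strand, and coil fractions), the fold names, and all numeric values are hypothetical, and cosine similarity is only one of several measures that could be used to compare points in such a space.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors: [helix fraction, strand fraction, coil fraction].
# Real structural features would be far richer; these values are made up.
fold_library = {
    "all_alpha": [0.80, 0.05, 0.15],
    "all_beta": [0.05, 0.70, 0.25],
    "alpha_beta": [0.40, 0.35, 0.25],
}

# Hypothetical query protein, described with the same three features.
query = [0.75, 0.10, 0.15]

ranking = rank_by_similarity(query, fold_library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because every protein is reduced to a point in the same space, the comparison step is independent of how the features were extracted, so other feature sets or distance measures can be swapped in without changing the ranking code.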