Robert Grossman, Laboratory for Advanced Computing, University of Illinois at Chicago
Track: Techniques
Date: Wednesday, February 05
Time: 10:45am - 11:30am
Location: California Ballroom A & B
As the number of biological databases continues to grow, tools for mining distributed bioinformatics data are becoming increasingly important. In this talk, Grossman surveys tools and techniques that can be used for this purpose, including web services, data webs, data grids, and semantic webs.
Grossman also describes current and emerging standards for mining remote and distributed data. In particular, he examines the Predictive Model Markup Language (PMML), an XML language for data mining and statistical modeling, and illustrates its use with examples involving microarray data and protein folding.
His presentation also covers Molecular DataSpace, part of the open source DataSpace Project, which the Laboratory for Advanced Computing at the University of Illinois at Chicago and its partners are developing to work with remote and distributed data using data webs. Data webs are web-based infrastructures for working with data that provide support for metadata, distributed keys, data transformations, sampling, and other specialized data services.
Molecular DataSpace supports SOAP-based web services for working with relatively modest distributed data sets over commodity networks, as well as specialized protocols for working with large data sets over high-performance networks.
Grossman illustrates his talk with examples and case studies, including mining distributed protein data, visualizing remote data, and docking compounds from chemical libraries at one site with proteins at another site.