The Center for Computational Genomics and Bioinformatics (CCGB) at the University of Minnesota serves as an integration and data warehousing site for, on average, more than a dozen genetic sequencing projects at any given time. These range from fairly well-characterized EST- and BAC-based efforts to SAGE and other newer protocols. Additionally, we are beginning to incorporate microarray, plant breeding, earth science, and other wide-ranging information.
We have constructed a distributed computing environment using dedicated servers, special-purpose hardware, opportunistic computation (via Condor), and a metascheduling resource implemented in a relational database. This system is continually evolving to meet the needs of our analysts and to extract the most processing power from our systems. Here we describe:
- Submission mechanisms that bring data and metadata in a structured format (XML) onto our servers.
- Batchable, pipelined execution of common operations such as base calling, vector and artifact filtering, and BLAST searches.
- Automatic presentation of data and analysis in an interactive set of web pages.
- Experiences in scaling batch queuing systems to reliably handle up to 10,000,000 jobs at a time.
- Experiences in designing interfaces for users with little experience (or interest) in distributed and high throughput computing. Of particular interest is the problem of presenting all of the available options for a tool such as BLAST, without overwhelming the novice user.
We also present plans to take the further step of offering (through a grid portal) these computational resources to the world community.
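As an illustration of what a job queue held in a relational database can look like, the following is a minimal sketch only. It uses Python's built-in `sqlite3` module as a stand-in for whatever database the CCGB system actually runs on, and the `jobs` table, its columns, and the functions `open_queue`, `submit`, `claim`, `finish`, and `counts` are all hypothetical names chosen for this example, not part of the system described above.

```python
import sqlite3

# Hypothetical schema: one row per job, with a simple state machine.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY,
    command TEXT NOT NULL,
    state   TEXT NOT NULL DEFAULT 'queued',  -- queued | running | done
    worker  TEXT
);
CREATE INDEX IF NOT EXISTS jobs_state ON jobs (state, id);
"""


def open_queue(path=":memory:"):
    """Open (or create) the queue database."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def submit(conn, commands):
    """Insert a batch of jobs in a single transaction."""
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO jobs (command) VALUES (?)",
        [(c,) for c in commands],
    )
    conn.execute("COMMIT")


def claim(conn, worker):
    """Hand the oldest queued job to `worker`, or return None if none remain."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers
    # cannot select and claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id, command FROM jobs WHERE state = 'queued' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE jobs SET state = 'running', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
    conn.execute("COMMIT")
    return row


def finish(conn, job_id):
    """Mark a job as completed."""
    conn.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))


def counts(conn):
    """Return the number of jobs in each state, e.g. for a status page."""
    return dict(conn.execute("SELECT state, COUNT(*) FROM jobs GROUP BY state"))
```

In this sketch a worker would call `claim` in a loop, run the returned command, and then call `finish`. The index on `(state, id)` is there so that finding the next queued job does not require scanning the whole table as the queue grows.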