

Practical Tools For Innovation
O'Reilly Bioinformatics Technology Conference
January 28-31, 2002 -- Tucson, AZ


A Distributed Computing Environment for High Throughput Genomics

Christopher Dwan, Research Engineer, University of Minnesota

Track: Fundamentals
Date: Wednesday, January 30
Time: 2:15pm - 3:00pm
Location: Canyon IV

The Center for Computational Genomics and Bioinformatics (CCGB) at the University of Minnesota serves as an integration and data warehousing site for more than a dozen genetic sequencing projects, on average, at any given time. These range from fairly well-characterized EST- and BAC-based efforts to SAGE and other newer protocols. Additionally, we are beginning to incorporate microarray, plant breeding, earth science, and other wide-ranging information.

We have constructed a distributed computing environment using dedicated servers, special-purpose hardware, opportunistic computation (via Condor), and a metascheduling resource implemented in a relational database. This system is continually evolving to meet the needs of our analysts and to glean the most processing power from our systems.
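As a rough illustration of what a scheduling layer backed by a relational database can look like, the sketch below keeps a job queue in a single table and lets workers claim jobs inside a transaction. It is not the CCGB implementation: the table layout, the function names (create_queue, submit, claim_next, finish), and the use of SQLite are all assumptions made for this example.

```python
import sqlite3


def create_queue(conn):
    """Create the (hypothetical) jobs table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id      INTEGER PRIMARY KEY,
               command TEXT NOT NULL,
               status  TEXT NOT NULL DEFAULT 'queued',
               worker  TEXT
           )"""
    )
    conn.commit()


def submit(conn, command):
    """Queue one job and return its id."""
    cur = conn.execute("INSERT INTO jobs (command) VALUES (?)", (command,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn, worker):
    """Claim the oldest queued job for `worker`.

    Returns (id, command), or None if nothing is queued or another
    worker claimed the job first.
    """
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, command FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in the WHERE clause guards against a race
        # with another worker that selected the same row.
        updated = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        ).rowcount
        return row if updated == 1 else None


def finish(conn, job_id, ok=True):
    """Record the outcome of a job."""
    with conn:
        conn.execute(
            "UPDATE jobs SET status = ? WHERE id = ?",
            ("done" if ok else "failed", job_id),
        )
```

A worker would call claim_next in a loop, run the returned command, and report back with finish. Keeping the queue in a database means job state can be queried with ordinary SQL, which is one plausible reason to build a scheduler this way.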

We present:

  • Submission mechanisms that bring data and metadata in a structured format (XML) onto our servers.
  • Batchable, pipelined execution of common operations such as base calling, vector and artifact filtering, and BLAST searches.
  • Automatic presentation of data and analysis in an interactive set of web pages.
  • Experiences in scaling batch queuing systems to reliably handle up to 10,000,000 jobs at a time.
  • Experiences in designing interfaces for users with little experience (or interest) in distributed and high-throughput computing. Of particular interest is the problem of presenting all of the available options for a tool such as BLAST without overwhelming the novice user.
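
The first two items in the list above, structured XML submission and pipelined batch execution, can be sketched together as follows. This is only an illustration under stated assumptions: the XML element and attribute names are invented for the example and are not the CCGB schema, and the two filter stages are simple stand-ins for real steps such as vector and artifact filtering.

```python
import xml.etree.ElementTree as ET

# Hypothetical submission format; the element and attribute names are
# invented for this sketch.
SUBMISSION = """\
<submission project="demo_est" submitter="lab42">
  <read id="r1">GATTACAGATTACA</read>
  <read id="r2">ACGT</read>
  <read id="r3">NNNNNNNNNNNN</read>
</submission>
"""


def parse_submission(xml_text):
    """Split a submission into metadata and a list of (id, sequence)."""
    root = ET.fromstring(xml_text)
    metadata = dict(root.attrib)
    reads = [
        (read.get("id"), (read.text or "").strip().upper())
        for read in root.iter("read")
    ]
    return metadata, reads


def drop_short(reads, min_length=8):
    """Stand-in stage: discard reads shorter than min_length."""
    return [(rid, seq) for rid, seq in reads if len(seq) >= min_length]


def drop_low_information(reads, max_n_fraction=0.5):
    """Stand-in stage: discard empty reads and reads that are mostly N."""
    return [
        (rid, seq)
        for rid, seq in reads
        if seq and seq.count("N") / len(seq) <= max_n_fraction
    ]


def run_pipeline(reads, stages):
    """Apply each stage in order; every stage maps a batch to a batch."""
    for stage in stages:
        reads = stage(reads)
    return reads


metadata, reads = parse_submission(SUBMISSION)
kept = run_pipeline(reads, [drop_short, drop_low_information])
```

Because each stage takes a batch and returns a batch, stages can be reordered, swapped, or run as separate queued jobs without changing the rest of the pipeline.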

We also present plans to take the further step of offering (through a grid portal) these computational resources to the world community.

© 2001, O'Reilly Media, Inc.