In the interests of saving time, money, and their sanity when analyzing biological data, many life scientists have chosen to use powerful, open-source software like Linux and Perl. But what happens when your analysis requires more processing power than one workstation can handle? After all, data sets exhibiting exponential growth rates are both boon and bane to life science research these days. More data points only provide more accurate results if you can analyze the data effectively and in a timely manner. Many life scientists faced with this challenge are finding it helpful, and sometimes necessary, to engage multiple computer processors for efficient data analysis. That’s a relatively simple task for the lucky ones out there working for companies or universities with massive computing resources manned by plenty of experienced admins. But most of us aren’t that lucky, and face the daunting task of creating some sort of high(er)-performance computing solution. So what kinds of tools does one invest in, or invent, then? Once again, Linux and open-source software provide excellent options to the life scientist. Since many bioinformatics applications like sequence analysis and molecular modeling are well suited to running on clusters of Linux computers, many researchers have taken advantage of commodity computer prices, and free software that doesn’t suck, to build Linux Beowulf clusters: virtual supercomputers with unbeatable price/performance ratios. But how difficult is it to roll your own Beowulf?
Not that difficult, it turns out. I couldn’t have said that a year ago. But many recent advances in hardware and clustering software have made it no more difficult to install a Beowulf cluster than it is to install Linux. With computer vendors focusing on cluster building, the physical assembly and reliability of cluster hardware have greatly improved. More importantly, second-generation Beowulf cluster software and documentation are now available in a format simple enough for anyone to build a Beowulf. It’s now possible to get all the software you need to build and run a Beowulf cluster on a single CD.
The “Linux Clusters for Biologists 101” tutorial will be a practical introduction to building, administering, and using Beowulf clusters for life scientists. I’ll cover the physical and environmental factors associated with designing Beowulf clusters, as well as the not-so-physical software layers responsible for running applications, load balancing, job scheduling, resource management, message passing, security, and other essentials. The format will include lecture, discussion, Q&A, live Beowulf assembly and installation, and (time permitting) running applications. We’ll be using the Rocks cluster distribution from the San Diego Supercomputer Center when building our Beowulf, but we’ll also discuss other excellent Beowulf cluster distributions including Scyld, MSC.Linux (OSCAR), and SCE. I’m planning to have plenty of free Beowulf software CDs to give away, and a special guest star appearance, so don’t hesitate!