O'Reilly Bioinformatics Technology Conference.

Practical Innovation at BioCon 2003


Using Latent Semantic Analysis in Bioinformatics
John Cuadrado
Maciej Ceglowski, National Institute for Technology and Liberal Education

Track: Techniques
Date: Thursday, February 06
Time: 1:30pm - 3:00pm
Location: California Ballroom A & B

Latent semantic indexing (LSI) is an information retrieval technique known to substantially improve recall in full-text search engines. LSI applies a dimensionality reduction technique called singular value decomposition (SVD) to a vector space model of the data, which reduces noise and brings out latent relationships within the data. Most LSI research has been done on text search, where an LSI search engine can retrieve relevant documents that do not match any keyword in a query. Because the technique is implemented entirely in linear algebra, it is applicable to a wide range of problems in bioinformatics, including gene and protein sequencing, gene regulatory networks, and medical imaging. Many of these potential applications remain completely unexplored.
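
The retrieval behavior described above can be illustrated with a minimal sketch in Python with NumPy. This is not the presenters' Perl code. The toy vocabulary, the five two-word documents, the raw term counts, the choice of k = 2, and the helper names build_matrix and lsi_scores are all assumptions made for illustration. The query is folded into the reduced space with the standard projection (query times U_k, divided by the singular values) and compared to each document by cosine similarity.

```python
import numpy as np

# Hypothetical toy collection: three "genomics" documents, two "enzyme" documents.
terms = ["gene", "dna", "genome", "sequence", "protein", "enzyme", "catalysis"]
docs = [
    ["gene", "dna"],
    ["dna", "genome"],
    ["genome", "sequence"],   # shares no word with the query "gene"
    ["protein", "enzyme"],
    ["enzyme", "catalysis"],
]


def build_matrix(terms, docs):
    """Term-document matrix of raw counts (rows are terms, columns are documents)."""
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[index[t], j] += 1.0
    return A


def lsi_scores(A, query_vec, k):
    """Cosine similarity between a query and every document in a rank-k SVD space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T   # keep the k largest singular values
    q_k = (U_k.T @ query_vec) / s_k                # fold the query into the reduced space
    dots = V_k @ q_k                               # each row of V_k is one document
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    return dots / norms


A = build_matrix(terms, docs)

# Query consisting of the single term "gene".
query = np.zeros(len(terms))
query[terms.index("gene")] = 1.0

keyword_hits = A.T @ query          # plain keyword overlap per document
scores = lsi_scores(A, query, k=2)  # similarity in the reduced space

for j, doc in enumerate(docs):
    print(j, " ".join(doc), "keyword:", keyword_hits[j], "lsi:", round(float(scores[j]), 3))
```

In this toy collection the third document has no keyword overlap with the query, yet it scores close to the first document in the reduced space, because the two are linked through the shared terms "dna" and "genome". The enzyme documents stay near zero. Real collections would typically use weighted counts and a much larger k.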

Ceglowski and Cuadrado have been working with LSI on both text and scientific data collections, including news stories, journal articles, and mass and NMR spectra, and have created a suite of open source Perl modules for use in creating LSI search engines. Their tutorial presents the basic algorithms behind LSI, with an emphasis on their practical application to real-world data sets, followed by a detailed demonstration of how to index, visualize, and search actual biological data. The tutorial ends with a discussion of open problems in the field, a brief introduction to doing distributed indexing on large data collections, and techniques for effectively searching large heterogeneous data sets.

Participants will come away with the concepts and software they need to immediately begin using LSI in their research.



© 2002, O'Reilly Media, Inc.