O'Reilly Bioinformatics Technology Conference.

Practical Innovation at BioCon 2003


Introduction to Whole Genome Analysis
Darrell Ricke, Torrey Mesa Research Institute

Track: Bioinformatics 2003 Tutorial
Date: Monday, February 03
Time: 8:30am - 12:00pm
Location: California Ballroom C

In this tutorial, Ricke teaches how to find genes in genomic sequence data. The tutorial spans the gene discovery process, from a review of how genomic sequencing is done to the final gene and protein sequence analysis steps.

Topics covered include:
  • Data quality issues that arise during genomic sequencing: vector contamination, E. coli contamination, gaps, frameshift errors, etc.
  • Raw sequence trace processing
  • Sequence assembly
  • Sequence analysis, including similarity, gene prediction algorithms, frameshift detection, etc.
  • Commonly used software: BLAST, FASTA, Smith-Waterman, GeneMark.HMM, FgeneSH, HMMGene, GENSCAN, etc.
  • Commonly used hardware for high-throughput sequence analysis: Linux servers, Unix servers, server clusters (Beowulf clusters), and special-purpose sequence analysis hardware
  • Contamination of genes with retroviral, mitochondrial, and chloroplast segments
  • Advanced gene modeling
  • Analysis of protein sequences
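Similarity search underlies several of the topics above. As an illustration only (not material from the tutorial), here is a minimal Python sketch of the Smith-Waterman local alignment score. It assumes a linear gap penalty and illustrative scores of +2 for a match, -1 for a mismatch, and -2 per gap position; production tools use substitution matrices and affine gap penalties, and BLAST and FASTA add heuristics for speed.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of sequences a and b.

    Scores are illustrative assumptions: +match for identical residues,
    mismatch for differing ones, and a linear (non-affine) gap penalty.
    """
    # h[i][j] holds the best score of any local alignment ending at
    # a[i-1] and b[j-1]; row 0 and column 0 stay 0.
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # a[i-1] aligned against a gap
            left = h[i][j - 1] + gap    # b[j-1] aligned against a gap
            # Clamping at 0 lets a new local alignment start anywhere.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    # Toy sequences sharing the core "ACGT"; the flanks never match.
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
```

The clamp at zero is what makes the alignment local: unrelated flanking sequence cannot drag the score of a conserved core below zero, so the shared "ACGT" scores the same as it would in isolation.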

Adds Ricke: "This tutorial reviews how to analyze whole eukaryotic genomes. It introduces the analysis of DNA and protein sequences with standard bioinformatics tools. Novel viewpoints for interpreting the results from these tools will be presented. Some of the common mistakes in interpreting results will also be pointed out. I'll then show how to apply these tools in a high-throughput approach to whole genome analysis.

"For the rice genome [Science 296:92-100, 2002], I developed the novel approach of annotating rice genes with their supporting evidence: similarities, Prosite motifs, and Pfam domains. I also used this evidence to select between overlapping gene models from FgeneSH, GeneMark.HMM, GENSCAN, and models derived from sequence similarity results. The genes were then grouped into high-evidence, medium-evidence, and low-evidence categories.

"In this tutorial, biologists should gain new insights that they can use in the analysis of DNA and protein sequences. The tutorial is also aimed at computer scientists and bioinformaticians who want to learn more about bioinformatics and the application of bioinformatics tools to large scale analysis projects."
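The evidence-based annotation Ricke describes for the rice genome can be pictured with a small sketch. This is a hypothetical Python illustration, not his pipeline: the `GeneModel` record, the rule of keeping the model supported by the most evidence types within each cluster of overlapping predictions, and the thresholds (two or more evidence types for high, one for medium, none for low) are all assumptions made for the example. Strand and exon structure are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GeneModel:
    # Hypothetical record: one predicted gene plus the evidence found for it.
    model_id: str
    source: str              # e.g. "GENSCAN", "FgeneSH", "GeneMark.HMM"
    start: int               # 1-based inclusive genomic coordinates
    end: int
    similarities: int = 0    # number of significant similarity hits (assumed)
    prosite: List[str] = field(default_factory=list)
    pfam: List[str] = field(default_factory=list)

    def evidence_types(self) -> int:
        """How many kinds of evidence (similarity, Prosite, Pfam) support it."""
        return sum([self.similarities > 0, bool(self.prosite), bool(self.pfam)])

    def evidence_items(self) -> int:
        """Total individual evidence items, used only to break ties."""
        return self.similarities + len(self.prosite) + len(self.pfam)


def cluster_overlapping(models: List[GeneModel]) -> List[List[GeneModel]]:
    """Group models whose intervals overlap, directly or transitively."""
    clusters: List[List[GeneModel]] = []
    current: List[GeneModel] = []
    current_end = 0
    for m in sorted(models, key=lambda m: (m.start, m.end)):
        if current and m.start <= current_end:
            current.append(m)
            current_end = max(current_end, m.end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [m], m.end
    if current:
        clusters.append(current)
    return clusters


def select_and_classify(models: List[GeneModel]) -> List[Tuple[str, str]]:
    """Keep the best-supported model per cluster and assign an evidence tier."""
    results = []
    for cluster in cluster_overlapping(models):
        # max() keeps the first (leftmost) model when scores tie.
        best = max(cluster, key=lambda m: (m.evidence_types(), m.evidence_items()))
        kinds = best.evidence_types()
        tier = "high" if kinds >= 2 else "medium" if kinds == 1 else "low"
        results.append((best.model_id, tier))
    return results


if __name__ == "__main__":
    # Toy input: two overlapping predictions and one isolated prediction.
    models = [
        GeneModel("g1", "GENSCAN", 100, 900, similarities=1),
        GeneModel("g2", "FgeneSH", 150, 950, similarities=3, pfam=["PF00069"]),
        GeneModel("g3", "GeneMark.HMM", 2000, 2600),
    ]
    print(select_and_classify(models))  # [('g2', 'high'), ('g3', 'low')]
```

Ranking first by the number of distinct evidence types, rather than by raw hit counts, keeps a model with many similarity hits but no domain or motif support from automatically outranking one backed by independent kinds of evidence; a real pipeline would likely weight evidence by significance as well.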

© 2002, O'Reilly Media, Inc.