In Silico Identification of Candidate Single Nucleotide Polymorphisms in Model Organisms
By Victor Guryev
Dr., Hubrecht Laboratory
Sequence data from the public domain, such as expressed sequence tags (ESTs)
and whole genome sequencing (wgs) projects, were explored to reveal
candidate SNPs by searching for nearly exact homology. The candidate
SNPs were annotated with strain information and any restriction
endonuclease site affected, scored for degree of homology, base-calling
quality, and amino acid replacement, and deposited in a publicly available
database, which can be accessed at http://rat.niob.knaw.nl.
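
As a rough illustration of the idea described above, the following minimal
Python sketch flags mismatches between two pre-aligned, nearly identical
sequences as candidate SNPs and checks whether a mismatch creates or destroys
a restriction site. It is not the pipeline behind the database: the function
names, the 0.98 identity threshold, and the three-enzyme site table are
assumptions chosen for the example.

```python
# Illustrative sketch only; names and the 0.98 threshold are assumptions,
# not the method used to build the database described in the abstract.
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def identity(a, b):
    """Fraction of ungapped alignment columns where both sequences agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def candidate_snps(a, b, min_identity=0.98):
    """Return (position, base_a, base_b) mismatches for near-identical alignments."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if identity(a, b) < min_identity:
        return []  # too divergent to be treated as the same locus
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y and x != "-" and y != "-"
    ]


def affected_sites(a, b, pos):
    """Enzymes whose site overlaps pos in exactly one of the two alleles."""
    hits = []
    for enzyme, site in RESTRICTION_SITES.items():
        # Any occurrence of the site inside this window must span pos.
        start = max(0, pos - len(site) + 1)
        end = pos + len(site)
        if (site in a[start:end]) != (site in b[start:end]):
            hits.append(enzyme)
    return hits


ref = "A" * 50 + "GAATTC" + "A" * 50
alt = "A" * 50 + "GAGTTC" + "A" * 50
snps = candidate_snps(ref, alt)
sites = affected_sites(ref, alt, snps[0][0])
```

Here the single mismatch disrupts an EcoRI site in one allele, which is the
kind of annotation that makes a candidate SNP easy to test by restriction
digestion.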
The project is tightly linked with other public databases such as
GenBank, UniGene, LocusLink and EST/wgs projects and enables searching for
candidate SNPs by ID/accession, gene name, map position, etc.
Candidate SNP data based on EST sequences can be automatically piped to
GENOTRACE (http://rat.niob.knaw.nl/genotrace) for building a local genome
assembly, determining exon/intron structure, and designing oligonucleotide
primers for genomic PCR amplification.
Currently, the database covers the model organisms zebrafish
(Danio rerio, >1,700,000 entries), to facilitate mapping projects,
and rat (Rattus norvegicus, >84,000 entries), to help discover
amino acid-replacing SNPs or polymorphisms between different
strains that may account for phenotypic differences. Experimental
verification of a subset of candidate SNPs, incorporation of a score based
on the predicted effect of a SNP on protein structure/function, and
expansion of the database to cover other model organisms are in progress.
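
To make the notion of an amino acid-replacing SNP concrete, the short Python
sketch below classifies a substitution in a coding sequence as synonymous or
replacing, using the standard genetic code. It assumes the coding sequence is
in frame and starts at position 0; the function name and return format are
hypothetical and are not taken from the database.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, pos, alt_base):
    """Classify a substitution at 0-based pos in an in-frame coding sequence.

    Returns (kind, ref_aa, alt_aa), where kind is "synonymous" or "replacement".
    """
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "replacement"
    return kind, ref_aa, alt_aa


cds = "ATGGAAGTT"  # Met-Glu-Val
silent = classify_snp(cds, 5, "G")     # GAA -> GAG, both glutamate
replacing = classify_snp(cds, 4, "T")  # GAA -> GTA, glutamate to valine
```

A score for the predicted effect on protein structure or function, as
mentioned above, would need considerably more than this codon lookup; the
sketch only separates silent from replacing changes.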