In the post-genomic era, almost all genes have been sequenced and enormous amounts of data have been generated, so mining useful information from these data is an important problem. In this paper we propose a new approach for finding potential motifs in the region extending from 2000 bp upstream (-2000) to 1000 bp downstream (+1000) of the transcription start site (TSS). The approach is based on a genetic algorithm (GA). Mutation in the GA uses position weight matrices to preserve the completely conserved positions. Crossover is implemented with specially designed gap penalties to produce the optimal child pattern. We also present a rearrangement method, based on position weight matrices, to avoid a very stable local minimum, which could otherwise make it difficult for the other operators to generate the optimal pattern. In our comparisons, the approach gives superior results to Multiple EM for Motif Elicitation (MEME) and the Gibbs sampler, two popular algorithms for finding motifs.
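As a rough illustration of the mutation idea described above, the following minimal sketch builds a column-wise frequency matrix from aligned candidate sites and mutates only the positions that are not completely conserved. It is not the paper's implementation: the function names, the `rate` parameter, the example sites, and the use of plain base frequencies in place of a full log-odds position weight matrix are all assumptions made for illustration.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def position_frequency_matrix(sites):
    """Column-wise base frequencies for equal-length aligned sites.

    Assumption: plain frequencies stand in for a position weight matrix.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({base: counts[base] / len(sites) for base in ALPHABET})
    return matrix


def conserved_positions(matrix):
    """Indices of columns in which a single base has frequency 1.0."""
    return [i for i, col in enumerate(matrix) if max(col.values()) == 1.0]


def mutate(pattern, matrix, rate=0.3, rng=random):
    """Randomly substitute non-conserved positions; keep conserved ones fixed.

    `rate` is a hypothetical per-position mutation probability.
    """
    locked = set(conserved_positions(matrix))
    out = []
    for i, base in enumerate(pattern):
        if i not in locked and rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


# Hypothetical aligned sites, chosen only to exercise the functions.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pfm = position_frequency_matrix(sites)
print(conserved_positions(pfm))  # [0, 1, 5]
print(mutate("TATAAT", pfm, rate=1.0, rng=random.Random(0)))
```

In this sketch, positions 0, 1 and 5 are identical across all four sites, so `mutate` never changes them, whatever the mutation rate.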
Relation:
IEEE Fourth Symposium on Bioinformatics and Bioengineering (BIBE 2004)