Very large-scale gene expression data, such as UniGene and dbEST, are available for
finding genes with significantly differential expression in specific tissues. The differentially
expressed genes in a specific tissue are potentially regulated concurrently by a
combination of transcription factors. This study attempts to mine putative binding sites
by examining how combinations of known regulatory site homologs and over-represented repetitive
elements are distributed in the promoter regions of the considered groups of differentially
expressed genes. We propose a data mining approach to statistically discover the
significantly tissue-specific combinations of known site homologs and over-represented
repetitive sequences, which are distributed in the promoter regions of differentially expressed gene
groups. The mined association rules would facilitate the prediction of putative regulatory elements
and the identification of genes potentially co-regulated by those elements.
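
The abstract does not spell out the mining procedure, so the following is only a minimal sketch of how tissue-specific element combinations could be scored. It assumes a plain support/confidence rule measure and a hypergeometric enrichment test, which may differ from the statistics actually used in the paper. The gene names, tissue labels, element names, and the functions hypergeom_tail and mine_rules are all hypothetical.

```python
from itertools import combinations
from math import comb

# Hypothetical toy data: gene -> (tissue group, elements found in its promoter).
promoters = {
    "g1": ("liver", {"siteA", "siteB", "rep1"}),
    "g2": ("liver", {"siteA", "siteB"}),
    "g3": ("liver", {"siteA", "rep1"}),
    "g4": ("brain", {"siteB", "rep1"}),
    "g5": ("brain", {"siteC"}),
    "g6": ("brain", {"siteA", "siteC"}),
}


def hypergeom_tail(k, n_total, n_tissue, n_with):
    """P(X >= k): chance that at least k of the n_with carrier genes fall in
    the tissue group if carriers were drawn at random from all n_total genes."""
    upper = min(n_tissue, n_with)
    favourable = sum(
        comb(n_tissue, i) * comb(n_total - n_tissue, n_with - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_with)


def mine_rules(promoters, tissue, size=2, min_support=2):
    """Score rules of the form {element combination} -> tissue.

    Assumption: a rule is kept when at least min_support genes of the tissue
    group carry every element of the combination in their promoter."""
    elements = sorted(set().union(*(s for _, s in promoters.values())))
    n_total = len(promoters)
    n_tissue = sum(1 for t, _ in promoters.values() if t == tissue)
    rules = []
    for combo in combinations(elements, size):
        # Tissue labels of all genes whose promoter contains the whole combination.
        carriers = [t for t, s in promoters.values() if set(combo) <= s]
        hits = carriers.count(tissue)
        if hits < min_support:
            continue
        confidence = hits / len(carriers)
        p_value = hypergeom_tail(hits, n_total, n_tissue, len(carriers))
        rules.append((combo, hits, confidence, p_value))
    # Most significant combinations first.
    return sorted(rules, key=lambda r: r[3])


rules = mine_rules(promoters, "liver")
for combo, hits, confidence, p_value in rules:
    print(combo, hits, round(confidence, 2), round(p_value, 3))
```

On real data the candidate combinations would come from scanning promoter sequences for known site homologs and repetitive elements, and the p-values would need correction for multiple testing; neither step is shown here.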
Relation:
Journal of Information Science and Engineering 19 (6): 923-942