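A simple sequence repeat (SSR) is a DNA segment made up of a repeat unit of 2 to 6 base pairs. SSRs play an important role in gene regulatory networks and are widely used in genomic research. The CG-SSR (comparative genomics database for SSR discovery) database lets users query the SSR sequences of a range of model organisms, and it gives the precise relative and absolute positions of each SSR with respect to gene sequences. However, computational prediction of SSRs still produces a high rate of false positives. We therefore use a comparative genomics approach to screen the identified SSRs. Users can freely choose a species and a target species to compare, and only SSRs located in blocks conserved across the species are retained. This approach clearly improves the precision of the predictions. CG-SSR offers a user-friendly interface and services, and every SSR record includes detailed and useful links to related data, meeting the needs of genome researchers.

Simple sequence repeats (SSRs), also referred to as variable number of tandem repeats or microsatellites, are valuable genetic markers that play a crucial role in genome mapping and in many kinds of genetic studies.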
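To make the 2 to 6 bp definition concrete, here is a minimal sketch of how perfect SSRs could be detected in a sequence. It is not CG-SSR's own detection method, which the abstract does not describe. The minimum of three copies (`min_copies`), the restriction to perfect repeats, and the names `find_ssrs`, `is_primitive` and `SSR` are all illustrative assumptions.

```python
from typing import NamedTuple


class SSR(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    copies: int


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ACAC' and 'AA' are not)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_copies: int = 3) -> list[SSR]:
    """Scan for perfect tandem repeats with 2-6 bp units.

    min_copies is an assumed threshold; real SSR finders often use motif-specific ones.
    """
    seq = seq.upper()
    hits: list[SSR] = []
    for k in range(2, 7):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Extend while the next k-mer repeats the motif exactly.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            # Skip motifs with N or other symbols, and non-primitive motifs such as
            # 'CACA', which are already reported under their shorter unit 'CA'.
            if copies >= min_copies and set(motif) <= set("ACGT") and is_primitive(motif):
                hits.append(SSR(i, i + copies * k, motif, copies))
                i += copies * k  # jump past the reported repeat
            else:
                i += 1
    return sorted(hits)
```

For example, `find_ssrs("GGCACACACAGG")` reports a single `CA` repeat with four copies spanning positions 2 to 10. Filtering out non-primitive motifs keeps one repeat from being reported once per multiple of its unit length, and it excludes mononucleotide runs, which fall outside the 2 to 6 bp definition.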
In this study, we have set up a database that facilitates the search for SSRs and provides the absolute and relative locations of each SSR with respect to the corresponding genes.
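The abstract does not define "relative location". One plausible reading, assumed here, is a strand-aware signed distance from the gene's transcription start site (TSS), with the absolute location being the SSR's chromosome coordinate. The helper `relative_offset` is hypothetical and uses the SSR's start coordinate for simplicity.

```python
def relative_offset(ssr_start: int, gene_start: int, gene_end: int, strand: str) -> int:
    """Signed distance in bp from the gene's TSS to the SSR start (an assumed definition).

    Coordinates are 0-based and half-open: the gene spans [gene_start, gene_end).
    Negative values mean the SSR lies upstream of the TSS; positive values mean
    it lies downstream, toward the gene body.
    """
    if strand == "+":
        return ssr_start - gene_start      # TSS is the leftmost base of the gene
    if strand == "-":
        return (gene_end - 1) - ssr_start  # TSS is the rightmost base of the gene
    raise ValueError(f"unknown strand: {strand!r}")
```

Making the offset depend on strand means "upstream" always maps to a negative number, whichever strand the gene is on.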
However, in silico analysis of biological data tends to produce high false-positive rates. To improve the specificity of the SSRs identified by our system, we take advantage of segments that are evolutionarily conserved among sequences from various species. Users can choose specific species as targets in order to filter out SSRs that are not located in conserved regions.
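The conservation filter can be sketched as an interval-containment test. This sketch assumes that conserved blocks between the query and target species arrive as 0-based, half-open `(start, end)` intervals on one query chromosome, for example from a whole-genome alignment, and that "located in" means the SSR lies fully inside a block. The abstract confirms neither assumption, and `merge_blocks` and `filter_conserved` are hypothetical names.

```python
from bisect import bisect_right


def merge_blocks(blocks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort conserved blocks and merge overlapping or touching ones."""
    merged: list[list[int]] = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def filter_conserved(ssrs: list[tuple], blocks: list[tuple[int, int]]) -> list[tuple]:
    """Keep only SSRs whose (start, end), their first two fields, lie inside a block."""
    merged = merge_blocks(blocks)
    starts = [s for s, _ in merged]
    kept = []
    for ssr in ssrs:
        start, end = ssr[0], ssr[1]
        # The only block that could contain the SSR is the last one starting at or before it.
        i = bisect_right(starts, start) - 1
        if i >= 0 and merged[i][1] >= end:
            kept.append(ssr)
    return kept
```

Merging the blocks first makes them disjoint and sorted, so each SSR needs only one binary search rather than a scan over every block.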
This screening narrows down the candidate SSRs and improves specificity. The database contains eleven representative species for comparative genomics analysis. Taking the comparison between zebrafish and fugu as an example, 38,773 zebrafish SSRs were found to lie in conserved regions. Of these, 9.35% are in protein-coding regions, 0.30% in 5’UTRs, 1.27% in 3’UTRs, 50.63% in introns, and 38.45% in intergenic regions. Each SSR is precisely located and annotated in this database for further applications.
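A breakdown like the zebrafish and fugu one requires assigning each SSR to exactly one region class. This sketch assumes a precedence of CDS, then 5’UTR, then 3’UTR, then intron, with everything else counted as intergenic. It also assumes that overlapping a feature is enough to be assigned to it. Neither rule is stated in the abstract, and `classify` and `region_percentages` are hypothetical helpers.

```python
from collections import Counter

# Assumed precedence when an SSR overlaps several feature types.
PRIORITY = ("CDS", "5'UTR", "3'UTR", "intron")
CATEGORIES = (*PRIORITY, "intergenic")


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end


def classify(ssr: tuple[int, int], features: dict[str, list[tuple[int, int]]]) -> str:
    """Return the highest-priority feature class the SSR overlaps, else 'intergenic'."""
    start, end = ssr
    for category in PRIORITY:
        if any(overlaps(start, end, s, e) for s, e in features.get(category, [])):
            return category
    return "intergenic"


def region_percentages(ssrs, features) -> dict[str, float]:
    """Percentage of SSRs in each class, rounded to two decimals."""
    total = len(ssrs)
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter(classify(ssr, features) for ssr in ssrs)
    return {cat: round(100 * counts[cat] / total, 2) for cat in CATEGORIES}
```

Because every SSR falls into exactly one class, the percentages sum to 100 up to rounding. The reported zebrafish figures do: 9.35 + 0.30 + 1.27 + 50.63 + 38.45 = 100.00.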