ASIA University: Item 310904400/25329


    Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/25329


    Title: SeqEntropy: Genome-wide assessment of repeats for short read sequencing
    Authors: Chu, Hsueh-Ting (朱學亭); Hsiao, William W.L.; Tsao, Theresa T.H. (曹純惠); Hsu, D. Frank (許德標); Chen, Chaur-Chin (陳朝欽); Lee, Sheng-An (李盛安); Kao, Cheng-Yan (高成炎)
    Contributors: Department of Computer Science and Information Engineering
    Date: 2013-03
    Issue Date: 2013-07-11 06:17:55 (UTC+0)
    Abstract: Background: Recent studies on genome assembly from short-read sequencing data have reported that this technology is limited in its ability to reconstruct an entire genome, even at very high depth of coverage. We investigated this limitation from the perspective of information theory, evaluating the effect of repeats on short-read genome assembly using idealized (error-free) reads of different lengths. Methodology/Principal Findings: We define a metric H(k) to be the entropy of sequencing reads at read length k, and use the relative loss of entropy ΔH(k) to measure the impact of repeats on the reconstruction of a whole genome from sequences of length k. In our experiments, we found that entropy loss correlates well with the de novo assembly coverage of a genome, and that a score of ΔH(k)>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths needed to achieve ΔH(k)<1% differ among organisms and are independent of genome size. For example, to meet the threshold of ΔH(k)<1%, a read length of 60 bp is needed for sequencing the human genome (3.2×10^9 bp) and 320 bp for sequencing the fruit fly genome (1.8×10^8 bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse, and that the entropy of sequencing reads provides a measurement of their repeat structures. Conclusions/Significance: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation of the limitations of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful for tuning sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
    Relation: PLoS One, V.8 N.3
    Appears in Collections: [Department of Computer Science and Information Engineering] Journal Article
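
    The abstract above names H(k) and ΔH(k) but does not spell out their formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes H(k) is the Shannon entropy of the empirical distribution of all length-k substrings of a single-stranded genome, and that ΔH(k) is the loss relative to the maximum entropy log2(N) reached when all N = L - k + 1 reads are distinct. The function names `read_entropy` and `relative_entropy_loss` are hypothetical.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy (in bits) of all error-free reads of length k.

    Assumption: one read starts at every position of a single strand,
    so there are N = len(genome) - k + 1 reads in total.
    """
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        raise ValueError("k must be between 1 and the genome length")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Assumed form of the relative loss: (H_max - H(k)) / H_max.

    H_max = log2(N) is the entropy when every read is unique, which is
    the repeat-free case; repeats make some reads identical and lower H(k).
    """
    n = len(genome) - k + 1
    if n <= 1:
        return 0.0  # a single read carries no entropy to lose
    h_max = math.log2(n)
    return (h_max - read_entropy(genome, k)) / h_max


if __name__ == "__main__":
    toy = "ACGTACGTTTGACCA"  # made-up sequence, for illustration only
    for k in (2, 4, 8):
        print(k, relative_entropy_loss(toy, k))
```

    Under these assumptions, a sequence with no repeated k-mers gives a loss of 0 and a sequence made of one repeated k-mer gives a loss of 1. For real genomes the counting step dominates memory use, so a practical version would likely need a disk-backed or streaming k-mer counter.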



