Volume 16, Issue 1, 2013, Pages 1-15

High-throughput DNA sequence data compression

Author keywords

Compression; Next generation sequencing; Reference based compression; Reference free compression

Indexed keywords

HIGH THROUGHPUT SEQUENCING; INFORMATION PROCESSING; MOLECULAR GENETICS; PROCEDURES

EID: 84908236785
PISSN: 1467-5463
EISSN: 1477-4054
Source Type: Journal
DOI: 10.1093/bib/bbt087
Document Type: Article
Times cited: 69

References (77)
  • 1. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci 1977;74:5463-7.
  • 2. Margulies M, Egholm M, Altman WE, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005;437:376-80.
  • 3. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26:1135-45.
  • 4. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet 2010;11:31-46.
  • 5. Branton D, Deamer DW, Marziali A, et al. The potential and challenges of nanopore sequencing. Nat Biotechnol 2008;26:1146-53.
  • 6. Pennisi E. Will computers crash genomics? Science 2011;331:666-8.
  • 7. Giancarlo R, Scaturro D, Utro F. Textual data compression in computational biology: a synopsis. Bioinformatics 2009;25:1575-86.
  • 10. Chen X, Kwong S, Li M. A compression algorithm for DNA sequences and its applications in genome comparison. Genome Informat Ser 1999;10:51-61.
  • 11. Chen X, Li M, Ma B, et al. DNACompress: fast and effective DNA sequence compression. Bioinformatics 2002;18:1696-8.
  • 12. Korodi G, Tabus I, Rissanen J, et al. DNA sequence compression - based on the normalized maximum likelihood model. IEEE Sign Process Mag 2007;24:47-53.
  • 15. Kahn SD. On the future of genomic data. Science 2011;331:728-9.
  • 17. Bose T, Mohammed MH, Dutta A, et al. BIND - An algorithm for loss-less compression of nucleotide sequence data. J Biosci 2012;37:785-9.
  • 18. Brandon MC, Wallace DC, Baldi P. Data structures and compression algorithms for genomic sequence data. Bioinformatics 2009;25:1731-8.
  • 19. Christley S, Lu Y, Li C, et al. Human genomes as email attachments. Bioinformatics 2009;25:274-5.
  • 20. Wang C, Zhang D. A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res 2011;39:E45-U74.
  • 21. Pinho AJ, Pratas D, Garcia SP. GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Res 2012;40:e27.
  • 25. Deorowicz S, Grabowski S. Robust relative compression of genomes with random access. Bioinformatics 2011;27:2979-86.
  • 26. Mohammed MH, Dutta A, Bose T, et al. DELIMINATE - a fast and efficient method for loss-less compression of genomic sequences. Bioinformatics 2012;28:2527-9.
  • 28. Jones DC, Ruzzo WL, Peng X, et al. Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic Acids Res 2012;40:e171.
  • 29. Popitsch N, von Haeseler A. NGC: lossless and lossy compression of aligned high-throughput sequencing data. Nucleic Acids Res 2013;41:e27.
  • 30. Tembe W, Lowey J, Suh E. G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics 2010;26:2192-4.
  • 31. Deorowicz S, Grabowski S. Compression of DNA sequence reads in FASTQ format. Bioinformatics 2011;27:860-2.
  • 33. Bonfield JK, Mahoney MV. Compression of FASTQ and SAM format sequencing data. PLoS One 2013;8:e59190.
  • 34. Yanovsky V. ReCoil - an algorithm for compression of extremely large datasets of DNA data. Algorithms Mol Biol 2011;6:23.
  • 35. Cox AJ, Bauer MJ, Jakobi T, et al. Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform. Bioinformatics 2012;28:1415-9.
  • 36. Hach F, Numanagic I, Alkan C, et al. SCALCE: boosting sequence compression algorithms using locally consistent encoding. Bioinformatics 2012;28:3051-7.
  • 37. Ziv J, Lempel A. A universal algorithm for sequential data compression. IEEE Trans Inform Theory 1977;23:337-43.
  • 40. Peterson J, Garges S, Giovanni M, et al. The NIH human microbiome project. Genome Res 2009;19:2317-23.
  • 41. Altshuler DM, Durbin RM, Abecasis GR, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56-65.
  • 42. Didelot X, Bowden R, Wilson DJ, et al. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet 2012;13:601-12.
  • 43. Golomb S. Run-length encodings. IEEE Trans Inform Theory 1965;12:399-401.
  • 44. Elias P. Universal codeword sets and representations of the integers. IEEE Trans Inform Theory 1975;21:194-203.
  • 45. Huffman DA. A method for the construction of minimum-redundancy codes. Proc IRE 1952;40:1098-101.
  • 48. Pinho AJ, Ferreira PJSG, Neves AJR, et al. On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS One 2011;6:e21588.
  • 49. Ahn S-M, Kim T-H, Lee S, et al. The first Korean genome sequence and analysis: full genome sequencing for a socioethnic group. Genome Res 2009;19:1622-9.
  • 50. Huala E, Dickerman AW, Garcia-Hernandez M, et al. The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res 2001;29:102-5.
  • 51. Cock PJA, Fields CJ, Goto N, et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 2010;38:1767-71.
  • 52. Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078-9.
  • 53. Fritz MH-Y, Leinonen R, Cochrane G, et al. Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res 2011;21:734-40.
  • 54. Langmead B, Trapnell C, Pop M, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009;10:R25.
  • 55. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754-60.
  • 56. Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinformatics 2010;11:473-83.
  • 57. Daily K, Rigor P, Christley S, et al. Data structures and compression algorithms for high-throughput sequencing technologies. BMC Bioinformatics 2010;11:514.
  • 58. Baldi P, Benz RW, Hirschberg DS, et al. Lossless compression of chemical fingerprints using integer entropy codes improves storage and retrieval. J Chem Inform Model 2007;47:2098-109.
  • 61. Nelson M. Data compression book. Foster City, CA, USA: IDG Books Worldwide, Inc., 1991.
  • 65. Wan R, Anh VN, Asai K. Transformations for the compression of FASTQ quality scores of next-generation sequencing data. Bioinformatics 2012;28:628-35.
  • 66. Ochoa I, Asnani H, Bharadia D, et al. QualComp: a new lossy compressor for quality scores based on rate distortion theory. BMC Bioinformatics 2013;14:187.
  • 68. Janin L, Rosone G, Cox AJ. Adaptive reference-free compression of sequence quality scores. Bioinformatics; doi:10.1093/bioinformatics/btt257 (Advance Access publication 9 May 2013).
  • 71. Ferragina P, Manzini G. Indexing compressed text. J ACM 2005;52:552-81.
  • 72. Makinen V, Navarro G, Siren J, et al. Storage and retrieval of highly repetitive sequence collections. J Comput Biol 2010;17:281-308.
  • 74. Arroyuelo D, Navarro G, Sadakane K. Stronger Lempel-Ziv based compressed text indexing. Algorithmica 2012;62:54-101.
  • 75. Afgan E, Baker D, Coraor N, et al. Harnessing cloud computing with Galaxy Cloud. Nat Biotechnol 2011;29:972-4.
  • 76. Stein LD. The case for cloud computing in genome informatics. Genome Biol 2010;11:207.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.