Volume 8, Issue 3, 2013, Pages

Compression of FASTQ and SAM Format Sequencing Data

Author keywords

[No Author keywords available]

Indexed keywords

ARTICLE; COMPUTER MEMORY; CONTROLLED STUDY; DATA ANALYSIS; DATA ANALYSIS SOFTWARE; FASTQ FILE FORMAT; GENETIC ALGORITHM; GENETIC DATABASE; REFERENCE DATABASE; SAM FILE FORMAT; SEQUENCE ANALYSIS; SEQUENCE DATABASE;

EID: 84875363204     PISSN: None     EISSN: 19326203     Source Type: Journal    
DOI: 10.1371/journal.pone.0059190     Document Type: Article
Times cited : (159)

References (33)
  • 2
    • 77951226627
    • The Sanger FASTQ format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
    • Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM, (2010) The Sanger FASTQ format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38: 1767-1771.
    • (2010) Nucleic Acids Research , vol.38 , pp. 1767-1771
    • Cock, P.J.A.1    Fields, C.J.2    Goto, N.3    Heuer, M.L.4    Rice, P.M.5
  • 3
    • 24044455869
    • Genome sequencing in microfabricated high-density picolitre reactors
    • Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376-380.
    • (2005) Nature , vol.437 , pp. 376-380
    • Margulies, M.1    Egholm, M.2    Altman, W.E.3    Attiya, S.4    Bader, J.S.5
  • 4
    • 84889495087
    • Berlin, Germany: Wiley-VCH, chapter Applied Biosystems SOLiD System: Ligation-Based Sequencing
    • Pandey V, Nutter RC, Prediger E (2008) Next-Generation Genome Sequencing, Berlin, Germany: Wiley-VCH, chapter Applied Biosystems SOLiD System: Ligation-Based Sequencing. 29-41.
    • (2008) Next-Generation Genome Sequencing , pp. 29-41
    • Pandey, V.1    Nutter, R.C.2    Prediger, E.3
  • 5
    • 55549089660
    • Accurate whole human genome sequencing using reversible terminator chemistry
    • Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53-59.
    • (2008) Nature , vol.456 , pp. 53-59
    • Bentley, D.R.1    Balasubramanian, S.2    Swerdlow, H.P.3    Smith, G.P.4    Milton, J.5
  • 6
    • 84861760100
    • Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
    • Cox AJ, Bauer MJ, Jakobi T, Rosone G, (2012) Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform. Bioinformatics 28: 1415-1419.
    • (2012) Bioinformatics , vol.28 , pp. 1415-1419
    • Cox, A.J.1    Bauer, M.J.2    Jakobi, T.3    Rosone, G.4
  • 7
    • 84857860662
    • GReEn: a tool for efficient compression of genome resequencing data
    • Pinho AJ, Pratas D, Garcia SP, (2012) GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Research 40: 27.
    • (2012) Nucleic Acids Research , vol.40 , pp. 27
    • Pinho, A.J.1    Pratas, D.2    Garcia, S.P.3
  • 9
    • 84857848401
    • Transformations for the compression of FASTQ quality scores of next-generation sequencing data
    • Wan R, Anh VN, Asai K, (2012) Transformations for the compression of FASTQ quality scores of next-generation sequencing data. Bioinformatics 28: 628-635.
    • (2012) Bioinformatics , vol.28 , pp. 628-635
    • Wan, R.1    Anh, V.N.2    Asai, K.3
  • 10
    • 77955886068
    • G-SQZ: compact encoding of genomic sequence and quality data
    • Tembe W, Lowey J, Suh E, (2010) G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics 26: 2192-2194.
    • (2010) Bioinformatics , vol.26 , pp. 2192-2194
    • Tembe, W.1    Lowey, J.2    Suh, E.3
  • 12
    • 79958792741
    • SOLiDzipper: A high speed encoding method for the next-generation sequencing data
    • Jeon YJ, Park SH, Ahn SM, Hwang HJ, (2011) SOLiDzipper: A high speed encoding method for the next-generation sequencing data. Evol Bioinform Online 7: 1-6.
    • (2011) Evol Bioinform Online , vol.7 , pp. 1-6
    • Jeon, Y.J.1    Park, S.H.2    Ahn, S.M.3    Hwang, H.J.4
  • 13
    • 79952580139
    • Compression of DNA sequence reads in FASTQ format
    • Deorowicz S, Grabowski S, (2011) Compression of DNA sequence reads in FASTQ format. Bioinformatics 27: 860-862.
    • (2011) Bioinformatics , vol.27 , pp. 860-862
    • Deorowicz, S.1    Grabowski, S.2
  • 14
    • 84871199924
    • Compression of next-generation sequencing reads aided by highly efficient de novo assembly
    • in press
    • Jones DC, Ruzzo WL, Peng X, Katze MG (2012) Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic Acids Research in press.
    • (2012) Nucleic Acids Research
    • Jones, D.C.1    Ruzzo, W.L.2    Peng, X.3    Katze, M.G.4
  • 15
    • 84870429157
    • SCALCE: boosting sequence compression algorithms using locally consistent encoding
    • Hach F, Numanagic I, Alkan C, Sahinalp SC (2012) SCALCE: boosting sequence compression algorithms using locally consistent encoding. Bioinformatics.
    • (2012) Bioinformatics
    • Hach, F.1    Numanagic, I.2    Alkan, C.3    Sahinalp, S.C.4
  • 18
    • 79955554401
    • Efficient storage of high throughput DNA sequencing data using reference-based compression
    • Fritz MHY, Leinonen R, Cochrane G, Birney E, (2011) Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Research 21: 734-740.
    • (2011) Genome Research , vol.21 , pp. 734-740
    • Fritz, M.H.Y.1    Leinonen, R.2    Cochrane, G.3    Birney, E.4
  • 19
    • 82555175823
    • Improving transmission efficiency of large Sequence Alignment/Map (SAM) files
    • Sakib MN, Tang J, Zheng WJ, Huang CT, (2011) Improving transmission efficiency of large Sequence Alignment/Map (SAM) files. PLoS ONE 6: e28251.
    • (2011) PLoS ONE , vol.6
    • Sakib, M.N.1    Tang, J.2    Zheng, W.J.3    Huang, C.T.4
  • 20
    • 84875301800
    • NGC: lossless and lossy compression of aligned high-throughput sequencing data
    • Popitsch N, von Haeseler A (2012) NGC: lossless and lossy compression of aligned high-throughput sequencing data. Nucl Acids Res.
    • (2012) Nucl Acids Res.
    • Popitsch, N.1    von Haeseler, A.2
  • 21
    • 67649170975
    • Textual data compression in computational biology: a synopsis
    • Giancarlo R, Scaturro D, Utro F, (2009) Textual data compression in computational biology: a synopsis. Bioinformatics 25: 1575-1586.
    • (2009) Bioinformatics , vol.25 , pp. 1575-1586
    • Giancarlo, R.1    Scaturro, D.2    Utro, F.3
  • 22
    • 77957765256
    • Data structures and compression algorithms for high-throughput sequencing technologies
    • Daily K, Rigor P, Christley S, Xie X, Baldi P (2010) Data structures and compression algorithms for high-throughput sequencing technologies. BMC Bioinformatics 11.
    • (2010) BMC Bioinformatics , vol.11
    • Daily, K.1    Rigor, P.2    Christley, S.3    Xie, X.4    Baldi, P.5
  • 24
    • 84937652953
    • Logical basis for information theory and probability theory
    • Kolmogorov A, (1968) Logical basis for information theory and probability theory. IEEE Transactions on Information Theory 14: 662-664.
    • (1968) IEEE Transactions on Information Theory , vol.14 , pp. 662-664
    • Kolmogorov, A.1
  • 25
    • 84975742565
    • A map of human genome variation from population-scale sequencing
    • The 1000 Genomes Consortium
    • The 1000 Genomes Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061-1073.
    • (2010) Nature , vol.467 , pp. 1061-1073
  • 26
    • 0031978181
    • Base-calling of automated sequencer traces using phred. II. Error probabilities
    • Ewing B, Green P, (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186-194.
    • (1998) Genome Res , vol.8 , pp. 186-194
    • Ewing, B.1    Green, P.2
  • 27
    • 0023364261
    • Arithmetic coding for data compression
    • Witten IH, Neal RM, Cleary JG, (1987) Arithmetic coding for data compression. Commun ACM 30: 520-540.
    • (1987) Commun ACM , vol.30 , pp. 520-540
    • Witten, I.H.1    Neal, R.M.2    Cleary, J.G.3
  • 29
    • 80455126001
    • Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems
    • Minoche AE, Dohm JC, Himmelbauer H, (2011) Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems. Genome Biology 12: R112.
    • (2011) Genome Biology , vol.12
    • Minoche, A.E.1    Dohm, J.C.2    Himmelbauer, H.3
  • 32
    • 67649884743
    • Fast and accurate short read alignment with Burrows-Wheeler transform
    • Li H, Durbin R, (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754-1760.
    • (2009) Bioinformatics , vol.25 , pp. 1754-1760
    • Li, H.1    Durbin, R.2
  • 33
    • 79953177468
    • Aligning short sequencing reads with Bowtie
    • Langmead B, (2010) Aligning short sequencing reads with Bowtie. Current Protocols in Bioinformatics 32: 11.7.1-11.7.14.
    • (2010) Current Protocols in Bioinformatics , vol.32 , pp. 11.7.1-11.7.14
    • Langmead, B.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.