Volume 14, Issue 1, 2013, Pages

QualComp: A new lossy compressor for quality scores based on rate distortion theory

Author keywords

Compression; FASTQ format; Mean squared error; Next generation sequencing; Quality scores; Rate distortion

Indexed keywords

DOWNSTREAM APPLICATIONS; FASTQ FORMAT; MEAN SQUARED ERROR; NEXT-GENERATION SEQUENCING; NUCLEOTIDE SEQUENCES; NUMERICAL EXPERIMENTS; RATE DISTORTION-THEORY; RATE DISTORTIONS

EID: 84878634014; PISSN: None; EISSN: 14712105; Source Type: Journal
DOI: 10.1186/1471-2105-14-187; Document Type: Article
Times cited: 45

References (49)
  • 2
    • 84879601223
    • Genome sequencing cost
    • Genome sequencing cost. http://www.genome.gov/sequencingcosts/.
  • 5
    • 78651301328
    • The Sequence Read Archive
    • 10.1093/nar/gkq768, 3017612, 20805242
    • Leinonen R, Sugawara H, Shumway M. The Sequence Read Archive. Nucleic Acids Res 2011, 39:19-21. 10.1093/nar/gkq768, 3017612, 20805242.
    • (2011) Nucleic Acids Res , vol.39 , pp. 19-21
    • Leinonen, R.1    Sugawara, H.2    Shumway, M.3
  • 6
    • 77951226627
    • The Sanger FASTQ format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
    • 2847217, 20015970
    • Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 2009, 38:1767-1771. 2847217, 20015970.
    • (2009) Nucleic Acids Res , vol.38 , pp. 1767-1771
    • Cock, P.J.A.1    Fields, C.J.2    Goto, N.3    Heuer, M.L.4    Rice, P.M.5
  • 7
    • 84864483625
    • RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics
    • 10.1093/nar/gks540, 3394330, 22684630
    • Lohse M, Bolger A, Nagel A, Fernie A, Lunn J, Stitt M, Usadel B. RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res 2012, 40(W1):W622-627. 10.1093/nar/gks540, 3394330, 22684630.
    • (2012) Nucleic Acids Res , vol.40 , Issue.W1
    • Lohse, M.1    Bolger, A.2    Nagel, A.3    Fernie, A.4    Lunn, J.5    Stitt, M.6    Usadel, B.7
  • 8
    • 77957151956
    • SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data
    • 10.1186/1471-2105-11-485, 2956736, 20875133
    • Cox M, Peterson D, Biggs P. SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics 2010, 11:485. 10.1186/1471-2105-11-485, 2956736, 20875133.
    • (2010) BMC Bioinformatics , vol.11 , pp. 485
    • Cox, M.1    Peterson, D.2    Biggs, P.3
  • 9
    • 55549097836
    • Mapping short DNA sequencing reads and calling variants using mapping quality scores
    • 10.1101/gr.078212.108, 2577856, 18714091
    • Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18(11):1851-1858. 10.1101/gr.078212.108, 2577856, 18714091.
    • (2008) Genome Res , vol.18 , Issue.11 , pp. 1851-1858
    • Li, H.1    Ruan, J.2    Durbin, R.3
  • 10
    • 62349130698
    • Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
    • 10.1186/gb-2009-10-3-r25, 2690996, 19261174
    • Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10(3):R25. 10.1186/gb-2009-10-3-r25, 2690996, 19261174.
    • (2009) Genome Biol , vol.10 , Issue.3
    • Langmead, B.1    Trapnell, C.2    Pop, M.3    Salzberg, S.4
  • 11
    • 67649884743
    • Fast and accurate short read alignment with Burrows-Wheeler transform
    • 10.1093/bioinformatics/btp324, 2705234, 19451168
    • Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25(14):1754-1760. 10.1093/bioinformatics/btp324, 2705234, 19451168.
    • (2009) Bioinformatics , vol.25 , Issue.14 , pp. 1754-1760
    • Li, H.1    Durbin, R.2
  • 12
    • 79956307251
    • Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads
    • 10.1101/gr.111120.110, 3106326, 20980556
    • Lunter G, Goodson M. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 2011, 21(6):936-939. 10.1101/gr.111120.110, 3106326, 20980556.
    • (2011) Genome Res , vol.21 , Issue.6 , pp. 936-939
    • Lunter, G.1    Goodson, M.2
  • 14
    • 34548084253
    • SNPdetector: a software tool for sensitive and accurate SNP detection
    • 10.1371/journal.pcbi.0010053, 1274293, 16261194
    • Zhang J, Wheeler D, Yakub I, Wei S, Sood R, Rowe W, Liu P, Gibbs R, Buetow K. SNPdetector: a software tool for sensitive and accurate SNP detection. PLoS Comput Biol 2005, 1(5):e53. 10.1371/journal.pcbi.0010053, 1274293, 16261194.
    • (2005) PLoS Comput Biol , vol.1 , Issue.5
    • Zhang, J.1    Wheeler, D.2    Yakub, I.3    Wei, S.4    Sood, R.5    Rowe, W.6    Liu, P.7    Gibbs, R.8    Buetow, K.9
  • 15
    • 34547630480
    • A simple statistical algorithm for biological sequence compression
    • Snowbird, UT, USA: IEEE
    • Cao M, Dix T, Allison L, Mears C. A simple statistical algorithm for biological sequence compression. Data Compression Conference, 2007. DCC'07 2007, 43-52. Snowbird, UT, USA: IEEE.
    • (2007) Data Compression Conference, 2007. DCC'07 , pp. 43-52
    • Cao, M.1    Dix, T.2    Allison, L.3    Mears, C.4
  • 17
    • 0036947893
    • DNACompress: Fast and effective DNA sequence compression
    • 10.1093/bioinformatics/18.12.1696, 12490460
    • Chen X, Li M, Ma B, Tromp J. DNACompress: Fast and effective DNA sequence compression. Bioinformatics 2002, 18:1696-1698. 10.1093/bioinformatics/18.12.1696, 12490460.
    • (2002) Bioinformatics , vol.18 , pp. 1696-1698
    • Chen, X.1    Li, M.2    Ma, B.3    Tromp, J.4
  • 18
    • 79959722141
    • On the representability of complete genomes by multiple competing finite-context (Markov) models
    • 10.1371/journal.pone.0021588, 3128062, 21738720
    • Pinho AJ, Ferreira P, Neves A, Bastos C. On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS ONE 2011, 6(6):e21588. 10.1371/journal.pone.0021588, 3128062, 21738720.
    • (2011) PLoS ONE , vol.6 , Issue.6
    • Pinho, A.J.1    Ferreira, P.2    Neves, A.3    Bastos, C.4
  • 20
    • 58349097721
    • Human Genomes as email attachments
    • Christley S, Lu Y, Li C, Xie X. Human Genomes as email attachments. Genome Inf 2008, 25:274-275.
    • (2008) Genome Inf , vol.25 , pp. 274-275
    • Christley, S.1    Lu, Y.2    Li, C.3    Xie, X.4
  • 21
    • 84857860662
    • GReEn: a tool for efficient compression of genome resequencing data
    • 3287168, 22139935
    • Pinho AJ, Pratas D, Garcia SP. GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Res 2011, 40(4):e27-27. 3287168, 22139935.
    • (2011) Nucleic Acids Res , vol.40 , Issue.4
    • Pinho, A.J.1    Pratas, D.2    Garcia, S.P.3
  • 22
    • 78449295543
    • Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval
    • Los Cabos, Mexico: Springer
    • Kuruppu S, Puglisi SJ, Zobel J. Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. String Processing and Information Retrieval 2010, 201-206. Los Cabos, Mexico: Springer.
    • (2010) String Processing and Information Retrieval , pp. 201-206
    • Kuruppu, S.1    Puglisi, S.J.2    Zobel, J.3
  • 23
    • 84894592948
    • Optimized relative Lempel-Ziv compression of genomes
    • Perth, Australia: Australasian Computer Science Conference (ACSC)
    • Kuruppu S, Puglisi SJ, Zobel J. Optimized relative Lempel-Ziv compression of genomes. Proceeding of ACSC 2011, Perth, Australia: Australasian Computer Science Conference (ACSC).
    • (2011) Proceeding of ACSC
    • Kuruppu, S.1    Puglisi, S.J.2    Zobel, J.3
  • 26
    • 79954595666
    • A novel compression tool for efficient storage of genome resequencing data
    • 10.1093/nar/gkr009, 3074166, 21266471
    • Wang C, Zhang D. A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res 2011, 39(7):e45-45. 10.1093/nar/gkr009, 3074166, 21266471.
    • (2011) Nucleic Acids Res , vol.39 , Issue.7
    • Wang, C.1    Zhang, D.2
  • 28
    • 45249110222
    • Compressing DNA sequence databases with coil
    • 2426707, 18489794
    • White WTJ, Hendy MD. Compressing DNA sequence databases with coil. BMC Bioinformatics 2008, 9(1):242. 2426707, 18489794.
    • (2008) BMC Bioinformatics , vol.9 , Issue.1 , pp. 242
    • White, W.T.J.1    Hendy, M.D.2
  • 29
    • 79952580139
    • Compression of genomic sequences in FASTQ format
    • 10.1093/bioinformatics/btr014, 21252073
    • Deorowicz S, Grabowski S. Compression of genomic sequences in FASTQ format. Bioinformatics 2011, 27(6):860-862. 10.1093/bioinformatics/btr014, 21252073.
    • (2011) Bioinformatics , vol.27 , Issue.6 , pp. 860-862
    • Deorowicz, S.1    Grabowski, S.2
  • 30
    • 77955886068
    • G-SQZ: compact encoding of genomic sequence and quality data
    • 10.1093/bioinformatics/btq346, 20605925
    • Tembe W, Lowey J, Suh E. G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics 2010, 26:2192-2194. 10.1093/bioinformatics/btq346, 20605925.
    • (2010) Bioinformatics , vol.26 , pp. 2192-2194
    • Tembe, W.1    Lowey, J.2    Suh, E.3
  • 31
    • 84871199924
    • Compression of next-generation sequencing reads aided by highly efficient de novo assembly
    • 10.1093/nar/gks754, 3526293, 22904078
    • Jones DC, Ruzzo WL, Peng X, Katze MG. Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic Acids Res 2012, 40(22):e171-171. 10.1093/nar/gks754, 3526293, 22904078.
    • (2012) Nucleic Acids Res , vol.40 , Issue.22
    • Jones, D.C.1    Ruzzo, W.L.2    Peng, X.3    Katze, M.G.4
  • 33
    • 79955554401
    • Efficient storage of high throughput sequencing data using reference-based compression
    • 10.1101/gr.114819.110, 3083090, 21245279
    • Fritz MH, Leinonen R, Cochrane G, Birney E. Efficient storage of high throughput sequencing data using reference-based compression. Genome Res 2011, 21:734-740. 10.1101/gr.114819.110, 3083090, 21245279.
    • (2011) Genome Res , vol.21 , pp. 734-740
    • Fritz, M.H.1    Leinonen, R.2    Cochrane, G.3    Birney, E.4
  • 34
    • 84879603795
    • Fastqz
    • fastqz. http://mattmahoney.net/dc/fastqz/.
  • 35
    • 84870429157
    • SCALCE: boosting sequence compression algorithms using locally consistent encoding
    • 10.1093/bioinformatics/bts593, 23047557
    • Hach F, Numanagić I, Alkan C, Sahinalp SC. SCALCE: boosting sequence compression algorithms using locally consistent encoding. Bioinformatics 2012, 28(23):3051-3057. 10.1093/bioinformatics/bts593, 23047557.
    • (2012) Bioinformatics , vol.28 , Issue.23 , pp. 3051-3057
    • Hach, F.1    Numanagić, I.2    Alkan, C.3    Sahinalp, S.C.4
  • 36
    • 84879601949
    • Cramtools
    • Cramtools. https://github.com/vadimzalunin/crammer.
  • 37
    • 84879602378
    • The Pistoia Alliance
    • The Pistoia Alliance. http://www.sequencesqueeze.org/.
  • 38
    • 84871826374
    • The future of DNA sequence archiving
    • 10.1186/2047-217X-1-2, 3617450, 23587147
    • Cochrane G, Cook C, Birney E. The future of DNA sequence archiving. GigaScience 2012, 1:2. http://www.gigasciencejournal.com/content/1/1/2, 10.1186/2047-217X-1-2, 3617450, 23587147.
    • (2012) GigaScience , vol.1 , pp. 2
    • Cochrane, G.1    Cook, C.2    Birney, E.3
  • 39
    • 84857848401
    • Transformations for the compression of FASTQ quality scores of next generation sequencing data
    • Wan R, Anh VN, Asai K. Transformations for the compression of FASTQ quality scores of next generation sequencing data. Bioinformatics 2011, 28(5):628-635.
    • (2011) Bioinformatics , vol.28 , Issue.5 , pp. 628-635
    • Wan, R.1    Anh, V.N.2    Asai, K.3
  • 40
    • 0030737449
    • On the role of mismatch in rate distortion theory
    • Lapidoth A. On the role of mismatch in rate distortion theory. Inf Theory, IEEE Trans 1997, 43(1):38-47.
    • (1997) Inf Theory, IEEE Trans , vol.43 , Issue.1 , pp. 38-47
    • Lapidoth, A.1
  • 42
    • 0020102027
    • Least squares quantization in PCM
    • Lloyd S. Least squares quantization in PCM. Inf Theory, IEEE Trans on 1982, 28(2):129-137.
    • (1982) Inf Theory, IEEE Trans on , vol.28 , Issue.2 , pp. 129-137
    • Lloyd, S.1
  • 44
    • 84879603419
    • SRR032209 data
    • SRR032209 data. http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR032209.
  • 45
    • 84879607295
    • SRR089526 data
    • SRR089526 data. http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR089526.
  • 46
    • 84879606965
    • PhiX data
    • PhiX data. http://bix.ucsd.edu/projects/singlecell/nbt_data.html.
  • 47
    • 84879602408
    • QualComp website
    • QualComp website. https://sourceforge.net/projects/qualcomp/.
  • 48
    • 84879607561
    • PhiX174 Genome
    • PhiX174 Genome. http://www.ncbi.nlm.nih.gov/nuccore/NC_001422.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.