Volume 6, Issue 1, 2012, Pages 1-25

Textual data compression in computational biology: Algorithmic techniques

Author keywords

Algorithms; Alignment free sequence comparison; Bioinformatics; Data Compression Theory and Practice; Entropy; Hidden Markov Models; Huffman coding; Kolmogorov complexity; Lempel Ziv compressors; Minimum Description Length principle; Pattern discovery in bioinformatics; Reverse engineering of biological networks; Sequence alignment

EID: 84856608719     PISSN: 15740137     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.cosrev.2011.11.001     Document Type: Review
Times cited : (20)

References (225)
  • 1
    • 67649170975
    • Textual data compression in computational biology: a synopsis
    • Giancarlo R., Scaturro D., Utro F. Textual data compression in computational biology: a synopsis. Bioinformatics 2009, 25:1575-1586.
    • (2009) Bioinformatics , vol.25 , pp. 1575-1586
    • Giancarlo, R.1    Scaturro, D.2    Utro, F.3
  • 2
    • 77953484795
    • Data compression concepts and algorithms and their applications to bioinformatics
    • Nalbantoglu O.U., Russell D.J., Sayood K. Data compression concepts and algorithms and their applications to bioinformatics. Entropy 2010, 12:34-52.
    • (2010) Entropy , vol.12 , pp. 34-52
    • Nalbantoglu, O.U.1    Russell, D.J.2    Sayood, K.3
  • 3
    • 14544296008
    • Bioinformatics an introduction for computer scientists
    • Cohen J. Bioinformatics an introduction for computer scientists. ACM Computing Surveys 2004, 36(2):122-158.
    • (2004) ACM Computing Surveys , vol.36 , Issue.2 , pp. 122-158
    • Cohen, J.1
  • 4
  • 5
    • 39649084637
    • Bioinformatics challenges of new sequencing technology
    • Pop M., Salzberg S.L. Bioinformatics challenges of new sequencing technology. Trends in Genetics 2008, 24:142-149.
    • (2008) Trends in Genetics , vol.24 , pp. 142-149
    • Pop, M.1    Salzberg, S.L.2
  • 6
    • 67651225610
    • The need for speed
    • Flicek P. The need for speed. Genome Biology 2009, 10:212.
    • (2009) Genome Biology , vol.10 , pp. 212
    • Flicek, P.1
  • 9
    • 0000100455
    • A new challenge for compression algorithms: Genetic sequences
    • Grumbach S., Tahi F. A new challenge for compression algorithms: Genetic sequences. Information Processing & Management 1994, 30:875-886.
    • (1994) Information Processing & Management , vol.30 , pp. 875-886
    • Grumbach, S.1    Tahi, F.2
  • 14
    • 0036947893
    • DNACompress: fast and effective DNA sequence compression
    • Chen X., Li M., Ma B., Tromp J. DNACompress: fast and effective DNA sequence compression. Bioinformatics 2002, 18:1696-1698.
    • (2002) Bioinformatics , vol.18 , pp. 1696-1698
    • Chen, X.1    Li, M.2    Ma, B.3    Tromp, J.4
  • 15
    • 0031675019
    • Some theory and practice of greedy off-line textual substitution
    • IEEE Computer Society
    • Apostolico A., Lonardi S. Some theory and practice of greedy off-line textual substitution. Proc. of the IEEE Data Compression Conference 1998, 119-128. IEEE Computer Society.
    • (1998) Proc. of the IEEE Data Compression Conference , pp. 119-128
    • Apostolico, A.1    Lonardi, S.2
  • 19
    • 84876743533
    • A DNA sequence compression algorithm based on LUT and LZ77, CoRR.
    • S. Bao, S. Chen, Z. Jing, R. Ren, A DNA sequence compression algorithm based on LUT and LZ77, CoRR http://abs/cs/0504100.
    • Bao, S.1    Chen, S.2    Jing, Z.3    Ren, R.4
  • 20
    • 26444479436
    • DNA compression challenge revisited: a dynamic programming approach
    • Springer
    • Behzadi B., Fessant F.L. DNA compression challenge revisited: a dynamic programming approach. CPM 2005, 190-200. Springer.
    • (2005) CPM , pp. 190-200
    • Behzadi, B.1    Fessant, F.L.2
  • 21
    • 34547630306
    • DNA sequence compression using the normalized maximum likelihood model for discrete regression
    • IEEE Computer Society
    • Tabus I., Korodi G., Rissanen J. DNA sequence compression using the normalized maximum likelihood model for discrete regression. Proc. of the IEEE Data Compression Conference 2003, 253-262. IEEE Computer Society.
    • (2003) Proc. of the IEEE Data Compression Conference , pp. 253-262
    • Tabus, I.1    Korodi, G.2    Rissanen, J.3
  • 23
    • 13844281512
    • An efficient normalized maximum likelihood algorithm for DNA sequence compression
    • Korodi G., Tabus I. An efficient normalized maximum likelihood algorithm for DNA sequence compression. ACM Transactions on Information Systems 2005, 23:3-34.
    • (2005) ACM Transactions on Information Systems , vol.23 , pp. 3-34
    • Korodi, G.1    Tabus, I.2
  • 24
    • 77955886068
    • G-SQZ: compact encoding of genomic sequence and quality data
    • Tembe W., Lowey J., Suh E. G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics 2010.
    • (2010) Bioinformatics
    • Tembe, W.1    Lowey, J.2    Suh, E.3
  • 28
    • 0003573193
    • A block-sorting lossless data compression algorithm
    • Tech. Rep. 124
    • Burrows M., Wheeler D. A block-sorting lossless data compression algorithm. Digital Equipment Corporation 1994, Tech. Rep. 124.
    • (1994) Digital Equipment Corporation
    • Burrows, M.1    Wheeler, D.2
  • 31
    • 0018019231
    • Compression of individual sequences via variable-rate coding
    • Ziv J., Lempel A. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 1978, 24:530-536.
    • (1978) IEEE Transactions on Information Theory , vol.24 , pp. 530-536
    • Ziv, J.1    Lempel, A.2
  • 32
    • 0034188132
    • Grammar-based codes: a new class of universal lossless source codes
    • Kieffer J.C., Yang E.-H. Grammar-based codes: a new class of universal lossless source codes. IEEE Transactions on Information Theory 2000, 46:737-754.
    • (2000) IEEE Transactions on Information Theory , vol.46 , pp. 737-754
    • Kieffer, J.C.1    Yang, E.-H.2
  • 34
    • 0000523223
    • Compression and explanation using hierarchical grammars
    • Nevill-Manning C.G., Witten I.H. Compression and explanation using hierarchical grammars. The Computer Journal 1997, 40:103-116.
    • (1997) The Computer Journal , vol.40 , pp. 103-116
    • Nevill-Manning, C.G.1    Witten, I.H.2
  • 35
    • 0024035590
    • Source encoding using syntactic information source models
    • Cameron R.D. Source encoding using syntactic information source models. IEEE Transactions on Information Theory 1988, 34:843-850.
    • (1988) IEEE Transactions on Information Theory , vol.34 , pp. 843-850
    • Cameron, R.D.1
  • 40
    • 42549148061
    • RNACompress: grammar-based compression and informational complexity measurement of RNA secondary structure
    • Liu Q., Yang Y., Chen C., Bu J., Zhang Y., Ye X. RNACompress: grammar-based compression and informational complexity measurement of RNA secondary structure. BMC Bioinformatics 2008, 9:176+.
    • (2008) BMC Bioinformatics , vol.9
    • Liu, Q.1    Yang, Y.2    Chen, C.3    Bu, J.4    Zhang, Y.5    Ye, X.6
  • 41
    • 0034522921
    • RNA secondary structure: physical and computational aspects
    • Higgs P.G. RNA secondary structure: physical and computational aspects. Quarterly Reviews of Biophysics 2000, 33:199-253.
    • (2000) Quarterly Reviews of Biophysics , vol.33 , pp. 199-253
    • Higgs, P.G.1
  • 43
    • 0011063126
    • Data Compression: Methods and Theory
    • Computer Science Press, Rockville, MD
    • Storer J. Data Compression: Methods and Theory. Principles of Computer Science Series 1988, Computer Science Press, Rockville, MD.
    • (1988) Principles of Computer Science Series
    • Storer, J.1
  • 44
    • 0016486577
    • Universal codeword sets and representations of the integers
    • Elias P. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 1975, 21:194-203.
    • (1975) IEEE Transactions on Information Theory , vol.21 , pp. 194-203
    • Elias, P.1
  • 46
    • 84938015047
    • A Method for the Construction of Minimum-Redundancy Codes
    • Huffman D.A. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE 1952, 40:1098-1101.
    • (1952) Proceedings of the IRE , vol.40 , pp. 1098-1101
    • Huffman, D.A.1
  • 48
    • 67649855126
    • Data structures and compression algorithms for genomic sequence data
    • Brandon M.C., Wallace D.C., Baldi P. Data structures and compression algorithms for genomic sequence data. Bioinformatics 2009.
    • (2009) Bioinformatics
    • Brandon, M.C.1    Wallace, D.C.2    Baldi, P.3
  • 49
    • 77957765256
    • Data structures and compression algorithms for high-throughput sequencing technologies
    • Daily K., Rigor P., Christley S., Xie X., Baldi P. Data structures and compression algorithms for high-throughput sequencing technologies. BMC Bioinformatics 2010, 11:514.
    • (2010) BMC Bioinformatics , vol.11 , pp. 514
    • Daily, K.1    Rigor, P.2    Christley, S.3    Xie, X.4    Baldi, P.5
  • 50
  • 52
    • 4243175869
    • Improving table compression with combinatorial optimization
    • Buchsbaum A.L., Fowler G.S., Giancarlo R. Improving table compression with combinatorial optimization. Journal of the ACM 2003, 50:825-851.
    • (2003) Journal of the ACM , vol.50 , pp. 825-851
    • Buchsbaum, A.L.1    Fowler, G.S.2    Giancarlo, R.3
  • 54
    • 35448929046
    • Compressing table data with column dependency
    • Vo B.D., Vo K.-P. Compressing table data with column dependency. Theoretical Computer Science 2007, 387:273-283.
    • (2007) Theoretical Computer Science , vol.387 , pp. 273-283
    • Vo, B.D.1    Vo, K.-P.2
  • 56
    • 84876710852
    • Genome, 1000 genome project.
    • Genome, 1000 genome project, 2008. Available at: http://www.1000genomes.org/.
    • (2008)
  • 57
    • 45249110222
    • Compressing DNA sequence databases with coil
    • White W.T., Hendy M. Compressing DNA sequence databases with coil. BMC Bioinformatics 2008, 9:242.
    • (2008) BMC Bioinformatics , vol.9 , pp. 242
    • White, W.T.1    Hendy, M.2
  • 59
    • 0026466830
    • Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods
    • Gutell R.R., Power A., Hertz G.Z., Putz E.J., Stormo G.D. Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Research 1992, 20:5785-5795.
    • (1992) Nucleic Acids Research , vol.20 , pp. 5785-5795
    • Gutell, R.R.1    Power, A.2    Hertz, G.Z.3    Putz, E.J.4    Stormo, G.D.5
  • 61
    • 2642530436
    • DNA sequence analysis linguistic tools: contrast vocabularies, compositional spectra and linguistic complexity
    • Bolshoy A. DNA sequence analysis linguistic tools: contrast vocabularies, compositional spectra and linguistic complexity. Applied Bioinformatics 2003, 2:103-112.
    • (2003) Applied Bioinformatics , vol.2 , pp. 103-112
    • Bolshoy, A.1
  • 62
    • 37549034468
    • Information theories in molecular biology and genomics
    • Konopka A.K. Information theories in molecular biology and genomics. Nature Encyclopedia of the Human Genome 2005, 3:464-469.
    • (2005) Nature Encyclopedia of the Human Genome , vol.3 , pp. 464-469
    • Konopka, A.K.1
  • 65
    • 0032919622
    • Significantly lower entropy estimates for natural DNA sequences
    • Loewenstern D., Yianilos P.N. Significantly lower entropy estimates for natural DNA sequences. Journal of Computational Biology 1999, 6:125-142.
    • (1999) Journal of Computational Biology , vol.6 , pp. 125-142
    • Loewenstern, D.1    Yianilos, P.N.2
  • 67
    • 0032554318
    • Correlations in protein sequences and property codes
    • Weiss O., Herzel H. Correlations in protein sequences and property codes. Journal of Theoretical Biology 1998, 190:341-353.
    • (1998) Journal of Theoretical Biology , vol.190 , pp. 341-353
    • Weiss, O.1    Herzel, H.2
  • 71
    • 0347494019
    • Linear algorithm for data compression via string matching
    • Rodeh M., Pratt V.R., Even S. Linear algorithm for data compression via string matching. Journal of the ACM 1981, 28:16-24.
    • (1981) Journal of the ACM , vol.28 , pp. 16-24
    • Rodeh, M.1    Pratt, V.R.2    Even, S.3
  • 74
    • 0002139526
    • The myriad virtues of subword trees
    • Springer-Verlag, Combinatorial Algorithms on Words
    • Apostolico A. The myriad virtues of subword trees. NATO ASI Series 1985, 85-96. Springer-Verlag.
    • (1985) NATO ASI Series , pp. 85-96
    • Apostolico, A.1
  • 76
    • 0041427690
    • New text indexing functionalities of compressed suffix arrays
    • Sadakane K. New text indexing functionalities of compressed suffix arrays. Journal of Algorithms 2003, 48:294-313.
    • (2003) Journal of Algorithms , vol.48 , pp. 294-313
    • Sadakane, K.1
  • 78
    • 0035747893
    • Indexing huge genome sequences for solving various problems
    • Sadakane K., Shibuya T. Indexing huge genome sequences for solving various problems. Genome Informatics 2001, 12:175-183.
    • (2001) Genome Informatics , vol.12 , pp. 175-183
    • Sadakane, K.1    Shibuya, T.2
  • 81
    • 25644453578
    • A space-efficient construction of the Burrows-Wheeler transform for genomic data
    • Lippert R.A., Mobarry C.M., Walenz B.P. A space-efficient construction of the Burrows-Wheeler transform for genomic data. Journal of Computational Biology 2005, 12:943-951.
    • (2005) Journal of Computational Biology , vol.12 , pp. 943-951
    • Lippert, R.A.1    Mobarry, C.M.2    Walenz, B.P.3
  • 82
    • 18844405663
    • Space-efficient whole genome comparisons with Burrows-Wheeler transforms
    • Lippert R.A. Space-efficient whole genome comparisons with Burrows-Wheeler transforms. Journal of Computational Biology 2005, 12:407-415.
    • (2005) Journal of Computational Biology , vol.12 , pp. 407-415
    • Lippert, R.A.1
  • 83
    • 34047188666
    • Compressed suffix tree-a basis for genome-scale sequence analysis
    • Välimäki N., Gerlach W., Dixit K., Mäkinen V. Compressed suffix tree-a basis for genome-scale sequence analysis. Bioinformatics 2007, 23:629-630.
    • (2007) Bioinformatics , vol.23 , pp. 629-630
    • Välimäki, N.1    Gerlach, W.2    Dixit, K.3    Mäkinen, V.4
  • 84
    • 77957272611
    • A survey of sequence alignment algorithms for next-generation sequencing
    • Li H., Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics 2010, 11:473-483.
    • (2010) Briefings in Bioinformatics , vol.11 , pp. 473-483
    • Li, H.1    Homer, N.2
  • 85
    • 62349130698
    • Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
    • Langmead B., Trapnell C., Pop M., Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10:R25.
    • (2009) Genome Biology , vol.10
    • Langmead, B.1    Trapnell, C.2    Pop, M.3    Salzberg, S.4
  • 86
    • 67649884743
    • Fast and accurate short read alignment with Burrows-Wheeler transform
    • Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25:1754-1760.
    • (2009) Bioinformatics , vol.25 , pp. 1754-1760
    • Li, H.1    Durbin, R.2
  • 91
    • 0037342499
    • Alignment-free sequence comparison: A review
    • Vinga S., Almeida J.S. Alignment-free sequence comparison: A review. Bioinformatics 2003, 19:513-523.
    • (2003) Bioinformatics , vol.19 , pp. 513-523
    • Vinga, S.1    Almeida, J.S.2
  • 94
    • 0032891717
    • Transformation distances: a family of dissimilarity measures based on movements of segments
    • Varré J.-S., Delahaye J.P., Rivals É. Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics 1999, 15:194-202.
    • (1999) Bioinformatics , vol.15 , pp. 194-202
    • Varré, J.-S.1    Delahaye, J.P.2    Rivals, É.3
  • 95
    • 45849112904
    • A small trip in the untranquil world of genomes-a survey on the detection and analysis of rearrangement breakpoints
    • Lemaitre C., Sagot M.F. A small trip in the untranquil world of genomes-a survey on the detection and analysis of rearrangement breakpoints. Theoretical Computer Science 2008, 395:171-192.
    • (2008) Theoretical Computer Science , vol.395 , pp. 171-192
    • Lemaitre, C.1    Sagot, M.F.2
  • 96
    • 78650441692
    • Maximal words in sequence comparisons based on subword composition
    • Springer, Berlin, Heidelberg, T. Elomaa, H. Mannila, P. Orponen (Eds.) Algorithms and Applications
    • Apostolico A. Maximal words in sequence comparisons based on subword composition. Lecture Notes in Computer Science 2010, vol. 6060:34-44. Springer, Berlin, Heidelberg. T. Elomaa, H. Mannila, P. Orponen (Eds.).
    • (2010) Lecture Notes in Computer Science , vol.6060 , pp. 34-44
    • Apostolico, A.1
  • 98
    • 70349866695
    • Upcoming challenges for multiple sequence alignment methods in the high-throughput era
    • Kemena C., Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 2009, 25:2455-2465.
    • (2009) Bioinformatics , vol.25 , pp. 2455-2465
    • Kemena, C.1    Notredame, C.2
  • 99
    • 84856608242
    • The application of data compression-based distances to biological sequences
    • Springer, F. Emmert-Streib, M. Dehmer (Eds.)
    • Kertesz-Farkas A., Kocsor A., Pongor S. The application of data compression-based distances to biological sequences. Information Theory and Statistical Learning 2009, 83-100. Springer. F. Emmert-Streib, M. Dehmer (Eds.).
    • (2009) Information Theory and Statistical Learning , pp. 83-100
    • Kertesz-Farkas, A.1    Kocsor, A.2    Pongor, S.3
  • 102
    • 0242643741
    • A new sequence distance measure for phylogenetic tree construction
    • Otu H.H., Sayood K. A new sequence distance measure for phylogenetic tree construction. Bioinformatics 2003, 19:2122-2130.
    • (2003) Bioinformatics , vol.19 , pp. 2122-2130
    • Otu, H.H.1    Sayood, K.2
  • 103
    • 70350165224
    • Normalized Lempel-Ziv complexity and its application in bio-sequence analysis
    • Zhang Y., Hao J., Zhou C., Chang K. Normalized Lempel-Ziv complexity and its application in bio-sequence analysis. Journal of Mathematical Chemistry 2009, 46:1203-1212.
    • (2009) Journal of Mathematical Chemistry , vol.46 , pp. 1203-1212
    • Zhang, Y.1    Hao, J.2    Zhou, C.3    Chang, K.4
  • 106
    • 34547753523
    • Compression-based classification of biological sequences and structures via the universal similarity metric: experimental assessment
    • Ferragina P., Giancarlo R., Greco V., Manzini G., Valiente G. Compression-based classification of biological sequences and structures via the universal similarity metric: experimental assessment. BMC Bioinformatics 2007, 8:252.
    • (2007) BMC Bioinformatics , vol.8 , pp. 252
    • Ferragina, P.1    Giancarlo, R.2    Greco, V.3    Manzini, G.4    Valiente, G.5
  • 110
    • 0035102453
    • An information-based sequence distance and its application to whole mitochondrial genome phylogeny
    • Li M., Badger J.H., Chen X., Kwong S., Kearney P.E., Zhang H. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 2001, 17:149-154.
    • (2001) Bioinformatics , vol.17 , pp. 149-154
    • Li, M.1    Badger, J.H.2    Chen, X.3    Kwong, S.4    Kearney, P.E.5    Zhang, H.6
  • 113
    • 34247141438
    • Evolutionary hierarchies of conserved blocks in 5'-noncoding sequences of dicot rbcS genes
    • Weeks K., Chuzhanova N., Donnison I., Scott I. Evolutionary hierarchies of conserved blocks in 5'-noncoding sequences of dicot rbcS genes. BMC Evolutionary Biology 2007, 7:51.
    • (2007) BMC Evolutionary Biology , vol.7 , pp. 51
    • Weeks, K.1    Chuzhanova, N.2    Donnison, I.3    Scott, I.4
  • 115
    • 77955613051
    • Clustering of protein families into functional subtypes using relative complexity measure with reduced amino acid alphabets
    • Albayrak A., Otu H., Sezerman U. Clustering of protein families into functional subtypes using relative complexity measure with reduced amino acid alphabets. BMC Bioinformatics 2010, 11:428.
    • (2010) BMC Bioinformatics , vol.11 , pp. 428
    • Albayrak, A.1    Otu, H.2    Sezerman, U.3
  • 117
    • 32544454688
    • Application of compression-based distance measures to protein sequence classification: a methodological study
    • Kocsor A., Kertesz-Farkas A., Kajan L., Pongor S. Application of compression-based distance measures to protein sequence classification: a methodological study. Bioinformatics 2005, 22:407-412.
    • (2005) Bioinformatics , vol.22 , pp. 407-412
    • Kocsor, A.1    Kertesz-Farkas, A.2    Kajan, L.3    Pongor, S.4
  • 118
    • 2442662802
    • Measuring the similarity of protein structures by means of the universal similarity metric
    • Krasnogor N., Pelta D.A. Measuring the similarity of protein structures by means of the universal similarity metric. Bioinformatics 2004, 20:1015-1021.
    • (2004) Bioinformatics , vol.20 , pp. 1015-1021
    • Krasnogor, N.1    Pelta, D.A.2
  • 119
    • 33748808998
    • Protein-based phylogenetic analysis by using hydropathy profile of amino acids
    • Liu N., Wang T. Protein-based phylogenetic analysis by using hydropathy profile of amino acids. FEBS Letters 2006, 580:5321-5327.
    • (2006) FEBS Letters , vol.580 , pp. 5321-5327
    • Liu, N.1    Wang, T.2
  • 120
    • 39149105621
    • Comparison of TOPS strings based on LZ complexity
    • Liu L., Wang T. Comparison of TOPS strings based on LZ complexity. Journal of Theoretical Biology 2008, 251:159-166.
    • (2008) Journal of Theoretical Biology , vol.251 , pp. 159-166
    • Liu, L.1    Wang, T.2
  • 121
    • 34547811565
    • Protein structure comparison through fuzzy contact maps and the universal similarity metric
    • Universitat Politécnica de Catalunya
    • Pelta D.A., Gonzales J.R., Krasnogor N. Protein structure comparison through fuzzy contact maps and the universal similarity metric. Proc. of the Joint 4th EUSFLAT & 11th LFA Conference 2005, 1124-1129. Universitat Politécnica de Catalunya.
    • (2005) Proc. of the Joint 4th EUSFLAT & 11th LFA Conference , pp. 1124-1129
    • Pelta, D.A.1    Gonzales, J.R.2    Krasnogor, N.3
  • 122
    • 84876713205
    • Compression ratios based on the universal similarity metric still yield protein distances far from CATH distances. CoRR.
    • F. Rosselló, J. Rocha, J. Segura, Compression ratios based on the universal similarity metric still yield protein distances far from CATH distances. CoRR http://abs/q-bio/0603007.
    • Rosselló, F.1    Rocha, J.2    Segura, J.3
  • 124
    • 38949177447
    • ProCKSI: a decision support system for protein (structure) comparison, knowledge, similarity and information
    • Barthel D., Hirst J.D., Blażewicz J., Burke E.K., Krasnogor N. ProCKSI: a decision support system for protein (structure) comparison, knowledge, similarity and information. BMC Bioinformatics 2008, 8:416.
    • (2008) BMC Bioinformatics , vol.8 , pp. 416
    • Barthel, D.1    Hirst, J.D.2    Blazewicz, J.3    Burke, E.K.4    Krasnogor, N.5
  • 125
    • 0037248694
    • A divide-and-conquer approach to fragment assembly
    • Otu H.H., Sayood K. A divide-and-conquer approach to fragment assembly. Bioinformatics 2003, 19:22-29.
    • (2003) Bioinformatics , vol.19 , pp. 22-29
    • Otu, H.H.1    Sayood, K.2
  • 129
    • 0019887799
    • Identification of common molecular subsequences
    • Smith T., Waterman M. Identification of common molecular subsequences. Journal of Molecular Biology 1981, 147:195-197.
    • (1981) Journal of Molecular Biology , vol.147 , pp. 195-197
    • Smith, T.1    Waterman, M.2
  • 131
    • 84935113569
    • Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
    • Viterbi A.J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 1967, 13:260-269.
    • (1967) IEEE Transactions on Information Theory , vol.13 , pp. 260-269
    • Viterbi, A.J.1
  • 133
    • 0942266549
    • A sub-quadratic sequence alignment algorithm for unrestricted cost matrices
    • Crochemore M., Landau G., Ziv-Ukelson M. A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. SIAM Journal on Computing 2003, 32:1654-1673.
    • (2003) SIAM Journal on Computing , vol.32 , pp. 1654-1673
    • Crochemore, M.1    Landau, G.2    Ziv-Ukelson, M.3
  • 138
    • 0041664867
    • Finding haplotype block boundaries by using the Minimum-Description-Length principle
    • Anderson E.C., Novembre J. Finding haplotype block boundaries by using the Minimum-Description-Length principle. The American Journal of Human Genetics 2003, 73:336-354.
    • (2003) The American Journal of Human Genetics , vol.73 , pp. 336-354
    • Anderson, E.C.1    Novembre, J.2
  • 141
    • 0041989761
    • An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries
    • World Scientific
    • Koivisto M. An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. Proc. of the Pacific Symposium on Biocomputing 2003, 502-513. World Scientific.
    • (2003) Proc. of the Pacific Symposium on Biocomputing , pp. 502-513
    • Koivisto, M.1
  • 144
    • 33645034499
    • Minimum Description Length tutorial
    • MIT Press, Cambridge, P.D. Grünwald, I.J. Myung, M.A. Pitt (Eds.)
    • Grünwald P.D. Minimum Description Length tutorial. Advances in Minimum Description Length: Theory and Applications 2005, 23-80. MIT Press, Cambridge. P.D. Grünwald, I.J. Myung, M.A. Pitt (Eds.).
    • (2005) Advances in Minimum Description Length: Theory and Applications , pp. 23-80
    • Grünwald, P.D.1
  • 145
    • 0033877395
    • Minimum Description Length induction, Bayesianism, and Kolmogorov complexity
    • Vitányi P.M.B., Li M. Minimum Description Length induction, Bayesianism, and Kolmogorov complexity. IEEE Transactions on Information Theory 2000, 46:446-464.
    • (2000) IEEE Transactions on Information Theory , vol.46 , pp. 446-464
    • Vitányi, P.M.B.1    Li, M.2
  • 147
    • 33847756846
    • Statistics on words with applications to biological sequences
    • Cambridge University Press, M. Lothaire (Ed.) Applied Combinatorics on Words
    • Reinert G., Schbath S., Waterman M. Statistics on words with applications to biological sequences. Encyclopedia of Mathematics and its Applications 2005, vol. 105:252-323. Cambridge University Press. M. Lothaire (Ed.).
    • (2005) Encyclopedia of Mathematics and its Applications , vol.105 , pp. 252-323
    • Reinert, G.1    Schbath, S.2    Waterman, M.3
  • 148
    • 34548784572
    • Evaluating protein motif significance measures: a case study on prosite patterns
    • IEEE Computer Society
    • Ferreira P.G., Azevedo P.J. Evaluating protein motif significance measures: a case study on prosite patterns. Proc. of the Computational Intelligence and Data Mining 2007, 34-43. IEEE Computer Society.
    • (2007) Proc. of the Computational Intelligence and Data Mining , pp. 34-43
    • Ferreira, P.G.1    Azevedo, P.J.2
  • 149
    • 0027194328
    • Discovering simple DNA sequences by the algorithmic significance method
    • Milosavljevic A., Jurka J. Discovering simple DNA sequences by the algorithmic significance method. Computer Applications in the Biosciences 1993, 9:407-411.
    • (1993) Computer Applications in the Biosciences , vol.9 , pp. 407-411
    • Milosavljevic, A.1    Jurka, J.2
  • 150
    • 0011908356
    • Discovering dependencies via algorithmic mutual information: A case study in DNA sequence comparisons
    • Milosavljevic A. Discovering dependencies via algorithmic mutual information: A case study in DNA sequence comparisons. Machine Learning 1995, 21:35-50.
    • (1995) Machine Learning , vol.21 , pp. 35-50
    • Milosavljevic, A.1
  • 155
    • 84856618500
    • Evaluating the significance of sequence motifs by the Minimum Description Length principle.
    • Q. Ma, J.T.L. Wang, Evaluating the significance of sequence motifs by the Minimum Description Length principle, 2000.
    • (2000)
    • Ma, Q.1    Wang, J.T.L.2
  • 157
    • 0000301097
    • A greedy heuristic for the set-covering problem
    • Chvátal V. A greedy heuristic for the set-covering problem. Mathematics of Operations Research 1979, 4:233-235.
    • (1979) Mathematics of Operations Research , vol.4 , pp. 233-235
    • Chvátal, V.1
  • 158
    • 0030670589
    • Efficient discovery of conserved patterns using a pattern graph
    • Jonassen I. Efficient discovery of conserved patterns using a pattern graph. Computer Applications in the Biosciences 1997, 13:509-522.
    • (1997) Computer Applications in the Biosciences , vol.13 , pp. 509-522
    • Jonassen, I.1
  • 161
    • 33645732240
    • Modeling cellular machinery through biological network comparison
    • Sharan R., Ideker T. Modeling cellular machinery through biological network comparison. Nature Biotechnology 2006, 24:427-433.
    • (2006) Nature Biotechnology , vol.24 , pp. 427-433
    • Sharan, R.1    Ideker, T.2
  • 162
    • 39549096223
    • Biomolecular network querying: a promising approach in systems biology
    • Zhang S., Zhang X.-S., Chen L. Biomolecular network querying: a promising approach in systems biology. BMC Systems Biology 2008, 2:5.
    • (2008) BMC Systems Biology , vol.2 , pp. 5
    • Zhang, S.1    Zhang, X.-S.2    Chen, L.3
  • 164
    • 0033258311
    • Unsupervised knowledge discovery in medical databases using relevance networks
    • Hanley and Belfus
    • Butte A.J., Kohane I.S. Unsupervised knowledge discovery in medical databases using relevance networks. Proc. of the AMIA Symposium 1999, 711-715. Hanley and Belfus.
    • (1999) Proc. of the AMIA Symposium , pp. 711-715
    • Butte, A.J.1    Kohane, I.S.2
  • 165
    • 0033655775
    • Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements
    • World Scientific
    • Butte A.J., Kohane I.S. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Proc. of the Pacific Symposium on Biocomputing 2000, 415-426. World Scientific.
    • (2000) Proc. of the Pacific Symposium on Biocomputing , pp. 415-426
    • Butte, A.J.1    Kohane, I.S.2
  • 167
    • 84876725798
    • Reverse engineering of the yeast transcriptional network using the ARACNE algorithm, Manuscript.
    • A. Margolin, N. Banerjee, I. Nemenman, A. Califano, Reverse engineering of the yeast transcriptional network using the ARACNE algorithm, Manuscript.
    • Margolin, A.1    Banerjee, N.2    Nemenman, I.3    Califano, A.4
  • 169
    • 34547666722
    • Biological networks: Comparison, conservation, and evolution via relative description length
    • Chor B., Tuller T. Biological networks: Comparison, conservation, and evolution via relative description length. Journal of Computational Biology 2007, 14:817-834.
    • (2007) Journal of Computational Biology , vol.14 , pp. 817-834
    • Chor, B.1    Tuller, T.2
  • 170
    • 0033982936
    • KEGG: Kyoto Encyclopedia of Genes and Genomes
    • Kanehisa M., Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 2000, 28:27-30.
    • (2000) Nucleic Acids Research , vol.28 , pp. 27-30
    • Kanehisa, M.1    Goto, S.2
  • 171
    • 84876735544
    • NCBI, NCBI taxonomy database.
    • NCBI, NCBI taxonomy database, 2007. Available at: http://www.ncbi.nlm.nih.gov/entrez/linkout/tutorial/taxtour.html.
    • (2007)
  • 173
    • 85015077644
    • The digital code of DNA
    • Hood L., Galas D. The digital code of DNA. Nature 2003, 421:444-448.
    • (2003) Nature , vol.421 , pp. 444-448
    • Hood, L.1    Galas, D.2
  • 174
    • 0030282113
    • The power of amnesia: learning probabilistic automata with variable memory length
    • Springer, Netherlands
    • Ron D., Singer Y. The power of amnesia: learning probabilistic automata with variable memory length. Machine Learning 1996, 117-149. Springer, Netherlands.
    • (1996) Machine Learning , pp. 117-149
    • Ron, D.1    Singer, Y.2
  • 175
    • 0035109647
    • Variations on probabilistic suffix trees: statistical modeling and prediction of protein families
    • Bejerano G., Yona G. Variations on probabilistic suffix trees: statistical modeling and prediction of protein families. Bioinformatics 2001, 17:23-43.
    • (2001) Bioinformatics , vol.17 , pp. 23-43
    • Bejerano, G.1    Yona, G.2
  • 178
    • 41949122106
    • On finite memory universal data compression and classification of individual sequences
    • Ziv J. On finite memory universal data compression and classification of individual sequences. IEEE Transactions on Information Theory 2008, 54:1626-1636.
    • (2008) IEEE Transactions on Information Theory , vol.54 , pp. 1626-1636
    • Ziv, J.1
  • 179
    • 25144456056
    • Computational cluster validation in post-genomic data analysis
    • Handl J., Knowles J., Kell D.B. Computational cluster validation in post-genomic data analysis. Bioinformatics 2005, 21(15):3201-3212.
    • (2005) Bioinformatics , vol.21 , Issue.15 , pp. 3201-3212
    • Handl, J.1    Knowles, J.2    Kell, D.B.3
  • 184
    • 24344458137
    • Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy
    • Peng H., Long F., Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27:1226-1238.
    • (2005) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.27 , pp. 1226-1238
    • Peng, H.1    Long, F.2    Ding, C.3
  • 185
    • 33749131719
    • Feature selection for microarray data analysis using mutual information and rough set theory
    • Springer, Boston
    • Zhou W., Zhou C., Liu G., Zhu H. Feature selection for microarray data analysis using mutual information and rough set theory. IFIP International Federation for Information Processing 2007, vol. 204:916-927. Springer, Boston.
    • (2007) IFIP International Federation for Information Processing , vol.204 , pp. 916-927
    • Zhou, W.1    Zhou, C.2    Liu, G.3    Zhu, H.4
  • 189
    • 0023979656
    • On classification with empirically observed statistics and universal data compression
    • Ziv J. On classification with empirically observed statistics and universal data compression. IEEE Transactions on Information Theory 1988, 34:278-286.
    • (1988) IEEE Transactions on Information Theory , vol.34 , pp. 278-286
    • Ziv, J.1
  • 190
    • 0034318252
    • Asymptotically optimal low-complexity sequential lossless coding for piecewise-stationary memoryless sources
    • Shamir G.I., Costello D.J., Merhav N. Asymptotically optimal low-complexity sequential lossless coding for piecewise-stationary memoryless sources. IEEE Transactions on Information Theory 2000, 46:2444-2467.
    • (2000) IEEE Transactions on Information Theory , vol.46 , pp. 2444-2467
    • Shamir, G.I.1    Costello, D.J.2    Merhav, N.3
  • 191
    • 39049189269
    • A compression-based approach for coding sequences identification in prokaryotic genomes
    • Menconi G., Marangoni R. A compression-based approach for coding sequences identification in prokaryotic genomes. Journal of Computational Biology 2006, 13:1477-1488.
    • (2006) Journal of Computational Biology , vol.13 , pp. 1477-1488
    • Menconi, G.1    Marangoni, R.2
  • 192
    • 19044363196
    • Sublinear growth of information in DNA sequences
    • Menconi G. Sublinear growth of information in DNA sequences. Bulletin of Mathematical Biology 2004, 67:737-759.
    • (2004) Bulletin of Mathematical Biology , vol.67 , pp. 737-759
    • Menconi, G.1
  • 193
    • 52449094233
    • Short tandem repeats in human exons: a target for disease mutations
    • Madsen B.E., Villesen P., Wiuf C. Short tandem repeats in human exons: a target for disease mutations. BMC Genomics 2008, 9:410+.
    • (2008) BMC Genomics , vol.9
    • Madsen, B.E.1    Villesen, P.2    Wiuf, C.3
  • 196
    • 45149113022
    • Comparative analysis of long DNA sequences by per element information content using different contexts
    • Dix T.I., Powell D.R., Allison L., Bernal J., Jaeger S., Stern L. Comparative analysis of long DNA sequences by per element information content using different contexts. BMC Bioinformatics 2007, 8(Suppl. 2):S10.
    • (2007) BMC Bioinformatics , vol.8 , Issue.SUPPL. 2
    • Dix, T.I.1    Powell, D.R.2    Allison, L.3    Bernal, J.4    Jaeger, S.5    Stern, L.6
  • 197
    • 84856610566
    • Development of fast tandem repeat analysis and lossless compression method for DNA sequence
    • Universal Academy Press, Tokyo
    • Modegi T. Development of fast tandem repeat analysis and lossless compression method for DNA sequence. Proc. of Genome Informatics Workshop 2004, P088. Universal Academy Press, Tokyo.
    • (2004) Proc. of Genome Informatics Workshop
    • Modegi, T.1
  • 200
    • 67649138818
    • Review, Nature reviews collection on microRNAs, Nature Review. doi:10.1038/nrg2202
    • Review, Nature reviews collection on microRNAs, Nature Review. doi:10.1038/nrg2202.
  • 203
    • 84876737874
    • Data compression web site.
    • J. Abel, Data compression web site, 2002. http://www.data-compression.info/.
    • (2002)
    • Abel, J.1
  • 204
    • 84876744440
    • Data compression programs.
    • M. Mahoney, Data compression programs, 2008. http://www.cs.fit.edu/~mmahoney/compression/.
    • (2008)
    • Mahoney, M.1
  • 205
    • 84876734580
    • DNA corpus.
    • G. Manzini, M. Rastero, DNA corpus, 2005. http://www.mfn.unipmn.it/~manzini/dnacorpus/index.html.
    • (2005)
    • Manzini, G.1    Rastero, M.2
  • 207
    • 84876697202
    • XM software.
    • M.D. Cao, T.I. Dix, L. Allison, C. Mears, XM software, 2007. http://www.csse.monash.edu.au/~lloyd/tildeStrings/Compress/2007DCC/.
    • (2007)
    • Cao, M.D.1    Dix, T.I.2    Allison, L.3    Mears, C.4
  • 208
    • 84876729143
    • Bioinformatics Solutions, Bioinformatics solutions web site.
    • Bioinformatics Solutions, Bioinformatics solutions web site, 2003. http://www.bioinformaticssolutions.com/products/ph/.
    • (2003)
  • 209
    • 84876697058
    • Pzip home page.
    • G. Fowler, Pzip home page, 2003. http://www.research.att.com/~gsf/man/man1/pzip.html.
    • Fowler, G.1
  • 210
    • 84876713620
    • Vcodex home page.
    • K.-P. Vo, Vcodex home page, 2002. http://www.research.att.com/~gsf/download/ref/vcodex/vcodex.html.
    • Vo, K.-P.1
  • 211
    • 84876722702
    • ProjectDNACompression home page.
    • M.C. Brandon, D.C. Wallace, P. Baldi, ProjectDNACompression home page, 2009. http://www.mitomap.org/MITOWIKI/ProjectDNACompression.
    • (2009)
    • Brandon, M.C.1    Wallace, D.C.2    Baldi, P.3
  • 212
    • 84876717460
    • DNAzip home page.
    • S. Christley, Y. Lu, C. Li, X. Xie, DNAzip home page, 2010. http://www.ics.uci.edu/~xhx/project/DNAzip.
    • (2010)
    • Christley, S.1    Lu, Y.2    Li, C.3    Xie, X.4
  • 214
    • 84876696692
    • G-SQZ home page.
    • W. Tembe, J. Lowey, E. Suh, G-SQZ home page, 2010. http://public.tgen.org/sqz.
    • (2010)
    • Tembe, W.1    Lowey, J.2    Suh, E.3
  • 215
    • 84876721029
    • CDNA home page.
    • D. Loewenstern, P.N. Yianilos, CDNA home page, 1999. http://pnylab.com/pny/software/cdna/main.html.
    • (1999)
    • Loewenstern, D.1    Yianilos, P.N.2
  • 222
    • 84876737895
    • The transformation distance home page.
    • J.-S. Varré, J.P. Delahaye, É. Rivals, The transformation distance home page, 1999. http://www.lifl.fr/~varre/TD/td.html.
    • (1999)
    • Varré, J.-S.1    Delahaye, J.P.2    Rivals, É.3
  • 223
  • 224
    • 84876728846
    • Speeding up HMM decoding and training by exploiting sequence repetitions.
    • Y. Lifshits, S. Mozes, O. Weimann, M. Ziv-Ukelson, Speeding up HMM decoding and training by exploiting sequence repetitions, 2008. http://www.cs.brown.edu/~shay/hmmspeedup/hmmspeedup.html.
    • (2008)
    • Lifshits, Y.1    Mozes, S.2    Weimann, O.3    Ziv-Ukelson, M.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.