Volume 276, Issue 1, 2011, Pages 174-180

Numerical characteristics of word frequencies and their application to dissimilarity measure for sequence comparison

Author keywords

Phylogenetic analysis; Regulatory sequence; Word expectation; Word frequency; Word variance

Indexed keywords

MITOCHONDRIAL DNA

EID: 79951945251     PISSN: 00225193     EISSN: 10958541     Source Type: Journal    
DOI: 10.1016/j.jtbi.2011.02.005     Document Type: Article
Times cited : (25)

References (49)
  • 2
    • 58149496557
    • Fast algorithms for computing sequence distances by exhaustive substring composition
    • Apostolico A., Denas O. Fast algorithms for computing sequence distances by exhaustive substring composition. Algorithms Mol. Biol. 2008, 3:13.
    • (2008) Algorithms Mol. Biol. , vol.3 , pp. 13
    • Apostolico, A.1    Denas, O.2
  • 3
    • 0022743812
    • A measure of the similarity of sets of sequences not requiring sequence alignment
    • Blaisdell B.E. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc. Natl. Acad. Sci. USA 1986, 83:5155-5159.
    • (1986) Proc. Natl. Acad. Sci. USA , vol.83 , pp. 5155-5159
    • Blaisdell, B.E.1
  • 4
    • 0031191630
    • The use of the area under the ROC curve in the evaluation of machine learning algorithms
    • Bradley A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 1997, 30:1145-1159.
    • (1997) Pattern Recognition , vol.30 , pp. 1145-1159
    • Bradley, A.P.1
  • 5
    • 0031694349
    • Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders
    • Cao Y., et al. Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J. Mol. Evol. 1998, 47:307-322.
    • (1998) J. Mol. Evol. , vol.47 , pp. 307-322
    • Cao, Y.1
  • 6
    • 33750061914
    • A novel 2D graphical representation of DNA sequences and its application
    • Dai Q., Liu X.Q., Wang T.M. A novel 2D graphical representation of DNA sequences and its application. J. Mol. Graphics Modell. 2006, 25:340-344.
    • (2006) J. Mol. Graphics Modell. , vol.25 , pp. 340-344
    • Dai, Q.1    Liu, X.Q.2    Wang, T.M.3
  • 7
    • 34248351071
    • Linear regression model of DNA sequences and its application
    • Dai Q., Liu X.Q., Wang T.M., Vukicevic D. Linear regression model of DNA sequences and its application. J. Comput. Chem. 2007, 28:1434-1445.
    • (2007) J. Comput. Chem. , vol.28 , pp. 1434-1445
    • Dai, Q.1    Liu, X.Q.2    Wang, T.M.3    Vukicevic, D.4
  • 9
    • 33846698069
    • Complementary intron sequence motifs associated with human exon repetition: a role for intragenic, inter-transcript interactions in gene expression
    • Dixon R.J., Eperon I.C., Samani N.J. Complementary intron sequence motifs associated with human exon repetition: a role for intragenic, inter-transcript interactions in gene expression. Bioinformatics 2007, 23:150-155.
    • (2007) Bioinformatics , vol.23 , pp. 150-155
    • Dixon, R.J.1    Eperon, I.C.2    Samani, N.J.3
  • 10
    • 75849146491
    • Efficient estimation of pairwise distances between genomes
    • Domazet-Loso M., Haubold B. Efficient estimation of pairwise distances between genomes. Bioinformatics 2009, 25:3221-3227.
    • (2009) Bioinformatics , vol.25 , pp. 3221-3227
    • Domazet-Loso, M.1    Haubold, B.2
  • 13
    • 0000122573
    • PHYLIP-Phylogeny inference package (version 3.2)
    • Felsenstein J. PHYLIP-Phylogeny inference package (version 3.2). Cladistics 1989, 5:164-166.
    • (1989) Cladistics , vol.5 , pp. 164-166
    • Felsenstein, J.1
  • 14
    • 0029901637
    • Inferring phylogenies from protein sequences by parsimony, distance and likelihood methods
    • Felsenstein J. Inferring phylogenies from protein sequences by parsimony, distance and likelihood methods. Methods Enzymol. 1996, 266:418-427.
    • (1996) Methods Enzymol. , vol.266 , pp. 418-427
    • Felsenstein, J.1
  • 15
    • 0023450024
    • Statistical method for predicting protein coding regions in nucleic acid sequences
    • Fichant G., Gautier C. Statistical method for predicting protein coding regions in nucleic acid sequences. Comput. Appl. Biosci. 1987, 3:287-295.
    • (1987) Comput. Appl. Biosci. , vol.3 , pp. 287-295
    • Fichant, G.1    Gautier, C.2
  • 16
    • 0345983651
    • Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison
    • Green R.E., Brenner S.E. Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proc. IEEE. 2002, 90:1834-1847.
    • (2002) Proc. IEEE. , vol.90 , pp. 1834-1847
    • Green, R.E.1    Brenner, S.E.2
  • 17
    • 3242889314
    • Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance
    • Hao B., Qi J. Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance. J. Bioinform. Comput. Biol. 2004, 2:1-19.
    • (2004) J. Bioinform. Comput. Biol. , vol.2 , pp. 1-19
    • Hao, B.1    Qi, J.2
  • 19
    • 0034849408
    • MRBAYES: Bayesian inference of phylogenetic trees
    • Huelsenbeck J.P., Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 2001, 17:754-755.
    • (2001) Bioinformatics , vol.17 , pp. 754-755
    • Huelsenbeck, J.P.1    Ronquist, F.2
  • 20
    • 34547844142
    • A statistical method for alignment-free comparison of regulatory sequences
    • Kantorovitz M.R., Robinson G.E., Sinha S. A statistical method for alignment-free comparison of regulatory sequences. Bioinformatics 2007, 23:i249-i255.
    • (2007) Bioinformatics , vol.23 , pp. i249-i255
    • Kantorovitz, M.R.1    Robinson, G.E.2    Sinha, S.3
  • 21
    • 3242810318
    • MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment
    • Kumar S., Tamura K., Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Briefings Bioinform. 2004, 5:150-163.
    • (2004) Briefings Bioinform. , vol.5 , pp. 150-163
    • Kumar, S.1    Tamura, K.2    Nei, M.3
  • 22
    • 0035102453
    • An information-based sequence distance and its application to whole mitochondrial genome phylogeny
    • Li M., Badger J.H., Chen X., Kwong S., Kearney P., Zhang H. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 2001, 17:149-154.
    • (2001) Bioinformatics , vol.17 , pp. 149-154
    • Li, M.1    Badger, J.H.2    Chen, X.3    Kwong, S.4    Kearney, P.5    Zhang, H.6
  • 23
    • 1842525340
    • Analysis of similarity/dissimilarity of DNA sequences based on 3-D graphical representation
    • Liao B., Wang T.M. Analysis of similarity/dissimilarity of DNA sequences based on 3-D graphical representation. Chem. Phys. Lett. 2004, 388:195-200.
    • (2004) Chem. Phys. Lett. , vol.388 , pp. 195-200
    • Liao, B.1    Wang, T.M.2
  • 24
    • 12344296751
    • 4D representation of DNA sequences and its application
    • Liao B., Tan M.S., Ding K.Q. 4D representation of DNA sequences and its application. Chem. Phys. Lett. 2005, 402:380-383.
    • (2005) Chem. Phys. Lett. , vol.402 , pp. 380-383
    • Liao, B.1    Tan, M.S.2    Ding, K.Q.3
  • 25
    • 0037195172
    • Distributional regimes for the number of k-word matches between two random sequences
    • Lippert R.A., Huang H.Y., Waterman M.S. Distributional regimes for the number of k-word matches between two random sequences. Proc. Natl. Acad. Sci. USA 2002, 99:13980-13989.
    • (2002) Proc. Natl. Acad. Sci. USA , vol.99 , pp. 13980-13989
    • Lippert, R.A.1    Huang, H.Y.2    Waterman, M.S.3
  • 26
    • 33750835926
    • PNN-curve: a new 2D graphical representation of DNA sequences and its application
    • Liu X.Q., Dai Q., Xiu Z.L., Wang T.M. PNN-curve: a new 2D graphical representation of DNA sequences and its application. J. Theor. Biol. 2006, 243:555-561.
    • (2006) J. Theor. Biol. , vol.243 , pp. 555-561
    • Liu, X.Q.1    Dai, Q.2    Xiu, Z.L.3    Wang, T.M.4
  • 27
    • 39549123618
    • A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping
    • Liu Z., Meng J., Sun X. A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping. Biochem. Biophys. Res. Commun. 2008, 368:223-230.
    • (2008) Biochem. Biophys. Res. Commun. , vol.368 , pp. 223-230
    • Liu, Z.1    Meng, J.2    Sun, X.3
  • 28
    • 47249127227
    • An improved string composition method for sequence comparison
    • Lu G.Q., Zhang S.P., Fang X. An improved string composition method for sequence comparison. BMC Bioinform. 2008, 9(Suppl. 6):S15.
    • (2008) BMC Bioinform. , vol.9 , Issue.SUPPL. 6 , pp. S15
    • Lu, G.Q.1    Zhang, S.P.2    Fang, X.3
  • 29
    • 31144477556
    • Phylogenetic analysis of global hepatitis E virus sequences: genetic diversity, subtypes and zoonosis
    • Lu L., Li C., Hagedorn C.H. Phylogenetic analysis of global hepatitis E virus sequences: genetic diversity, subtypes and zoonosis. Rev. Med. Virol. 2006, 16:5-36.
    • (2006) Rev. Med. Virol. , vol.16 , pp. 5-36
    • Lu, L.1    Li, C.2    Hagedorn, C.H.3
  • 30
    • 4444270864
    • Cluster-C: an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques
    • Mohseni-Zadeh S., Brezellec P., Risler J.L. Cluster-C: an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques. Comput. Biol. Chem. 2004, 28:211-218.
    • (2004) Comput. Biol. Chem. , vol.28 , pp. 211-218
    • Mohseni-Zadeh, S.1    Brezellec, P.2    Risler, J.L.3
  • 31
    • 0011479281
    • Graphical analysis of DNA sequence structure: II. Relative abundances of nucleotides in DNAs, gene evolution and duplication
    • Nandy A., Nandy P. Graphical analysis of DNA sequence structure: II. Relative abundances of nucleotides in DNAs, gene evolution and duplication. Curr. Sci. 1995, 68:75-85.
    • (1995) Curr. Sci. , vol.68 , pp. 75-85
    • Nandy, A.1    Nandy, P.2
  • 32
    • 0037435480
    • On the uniqueness of quantitative DNA difference descriptors in 2D graphical representation models
    • Nandy A., Nandy P. On the uniqueness of quantitative DNA difference descriptors in 2D graphical representation models. Chem. Phys. Lett. 2003, 368:102-107.
    • (2003) Chem. Phys. Lett. , vol.368 , pp. 102-107
    • Nandy, A.1    Nandy, P.2
  • 33
    • 0242643741
    • A new sequence distance measure for phylogenetic tree construction
    • Otu H.H., Sayood K. A new sequence distance measure for phylogenetic tree construction. Bioinformatics 2003, 19:2122-2130.
    • (2003) Bioinformatics , vol.19 , pp. 2122-2130
    • Otu, H.H.1    Sayood, K.2
  • 34
    • 12344295510
    • A probabilistic measure for alignment-free sequence comparison
    • Pham T.D., Zuegg J. A probabilistic measure for alignment-free sequence comparison. Bioinformatics 2004, 20:3455-3461.
    • (2004) Bioinformatics , vol.20 , pp. 3455-3461
    • Pham, T.D.1    Zuegg, J.2
  • 35
    • 33750358065
    • Spectral distortion measures for biological sequence comparisons and database searching
    • Pham T.D. Spectral distortion measures for biological sequence comparisons and database searching. Pattern Recognition 2007, 40:516-529.
    • (2007) Pattern Recognition , vol.40 , pp. 516-529
    • Pham, T.D.1
  • 37
    • 0034186712
    • On the similarity of DNA primary sequences
    • Randic M., Vracko M. On the similarity of DNA primary sequences. J. Chem. Inf. Comput. Sci. 2000, 40:599-606.
    • (2000) J. Chem. Inf. Comput. Sci. , vol.40 , pp. 599-606
    • Randic, M.1    Vracko, M.2
  • 38
    • 0037363676
    • A four-dimensional representation of DNA primary sequences
    • Randic M., Balaban A.T. A four-dimensional representation of DNA primary sequences. J. Chem. Inf. Comput. Sci. 2003, 43:532-539.
    • (2003) J. Chem. Inf. Comput. Sci. , vol.43 , pp. 532-539
    • Randic, M.1    Balaban, A.T.2
  • 39
    • 1442336331
    • Graphical representations of DNA as 2-D map
    • Randic M. Graphical representations of DNA as 2-D map. Chem. Phys. Lett. 2004, 386:468-471.
    • (2004) Chem. Phys. Lett. , vol.386 , pp. 468-471
    • Randic, M.1
  • 40
    • 0034125366
    • Probabilistic and statistical properties of words: an overview
    • Reinert G., Schbath S., Waterman M.S. Probabilistic and statistical properties of words: an overview. J. Comput. Biol. 2000, 7:1-46.
    • (2000) J. Comput. Biol. , vol.7 , pp. 1-46
    • Reinert, G.1    Schbath, S.2    Waterman, M.S.3
  • 41
    • 0033238297
    • Exact distribution of word occurrences in a random sequence of letters
    • Robin S., Daudin J.J. Exact distribution of word occurrences in a random sequence of letters. J. Appl. Probab. 1999, 36:179-193.
    • (1999) J. Appl. Probab. , vol.36 , pp. 179-193
    • Robin, S.1    Daudin, J.J.2
  • 42
    • 0041386108
    • MrBayes 3: Bayesian phylogenetic inference under mixed models
    • Ronquist F., Huelsenbeck J.P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003, 19:1572-1574.
    • (2003) Bioinformatics , vol.19 , pp. 1572-1574
    • Ronquist, F.1    Huelsenbeck, J.P.2
  • 43
    • 0034048881
    • An overview on the distribution of word counts in Markov chains
    • Schbath S. An overview on the distribution of word counts in Markov chains. J. Comput. Biol. 2000, 7:193-201.
    • (2000) J. Comput. Biol. , vol.7 , pp. 193-201
    • Schbath, S.1
  • 44
    • 0036166508
    • Integrated gene and species phylogenies from unaligned whole genome protein sequences
    • Stuart G.W., Moffett K., Baker S. Integrated gene and species phylogenies from unaligned whole genome protein sequences. Bioinformatics 2002, 18:100-108.
    • (2002) Bioinformatics , vol.18 , pp. 100-108
    • Stuart, G.W.1    Moffett, K.2    Baker, S.3
  • 45
    • 0037342499
    • Alignment-free sequence comparison-a review
    • Vinga S., Almeida J. Alignment-free sequence comparison-a review. Bioinformatics 2003, 19:513-523.
    • (2003) Bioinformatics , vol.19 , pp. 513-523
    • Vinga, S.1    Almeida, J.2
  • 46
    • 0035752767
    • A phylogenetic foundation for comparative mammalian genomics
    • Waddell P.J., Kishino H., Ota R. A phylogenetic foundation for comparative mammalian genomics. Genome Inform. Ser. 2001, 12:141-154.
    • (2001) Genome Inform. Ser. , vol.12 , pp. 141-154
    • Waddell, P.J.1    Kishino, H.2    Ota, R.3
  • 48
    • 33748796945
    • Phylogenetic analysis using complete signature information of whole genomes and clustered neighbour-joining method
    • Wu X., Wan X., Wu G., Xu D., Lin G. Phylogenetic analysis using complete signature information of whole genomes and clustered neighbour-joining method. Int. J. Bioinform. Res. Appl. 2006, 2:219-248.
    • (2006) Int. J. Bioinform. Res. Appl. , vol.2 , pp. 219-248
    • Wu, X.1    Wan, X.2    Wu, G.3    Xu, D.4    Lin, G.5
  • 49
    • 0035013276
    • Statistical measures of DNA dissimilarity under Markov chain models of base composition
    • Wu T.J., Hsieh Y.C., Li L.A. Statistical measures of DNA dissimilarity under Markov chain models of base composition. Biometrics 2001, 57:441-448.
    • (2001) Biometrics , vol.57 , pp. 441-448
    • Wu, T.J.1    Hsieh, Y.C.2    Li, L.A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.