Volume 6, Issue 11, 2011, Pages

Integrating overlapping structures and background information of words significantly improves biological sequence comparison

Author keywords

[No Author keywords available]

Indexed keywords

ARTICLE; EXON; GENE CLUSTER; GENE SEQUENCE; GENOTYPE; HEPATITIS E VIRUS; INTRON; RECEIVER OPERATING CHARACTERISTIC; SEQUENCE ANALYSIS; STATISTICAL ANALYSIS; STATISTICAL MODEL; ALGORITHM; BIOLOGY; GENETICS; HUMAN; MATHEMATICAL COMPUTING; REGULATORY SEQUENCE; SEQUENCE ALIGNMENT

EID: 80755176782     PISSN: None     EISSN: 19326203     Source Type: Journal    
DOI: 10.1371/journal.pone.0026779     Document Type: Article
Times cited : (7)

References (61)
  • 1
    • 0030801002
    • Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
    • (1997) Nucleic Acids Res , vol.25 , pp. 3389-3402
    • Altschul, S.F.1    Madden, T.L.2    Schaffer, A.A.3    Zhang, J.4    Zhang, Z.5
  • 2
    • 12344295510
    • A probabilistic measure for alignment-free sequence comparison
    • Pham TD, Zuegg J, (2004) A probabilistic measure for alignment-free sequence comparison. Bioinformatics 20: 3455-3461.
    • (2004) Bioinformatics , vol.20 , pp. 3455-3461
    • Pham, T.D.1    Zuegg, J.2
  • 3
    • 33750358065
    • Spectral distortion measures for biological sequence comparisons and database searching
    • Pham TD, (2007) Spectral distortion measures for biological sequence comparisons and database searching. Pattern Recog 40: 516-529.
    • (2007) Pattern Recog , vol.40 , pp. 516-529
    • Pham, T.D.1
  • 4
    • 48249137399
    • Similarity Queries for Temporal Toxicogenomic Expression Profiles
    • Smith AA, Vollrath A, Bradfield CA, Craven M, (2008) Similarity Queries for Temporal Toxicogenomic Expression Profiles. PLoS Comput Biol 4 (7): e1000116.
    • (2008) PLoS Comput Biol , vol.4 , Issue.7
    • Smith, A.A.1    Vollrath, A.2    Bradfield, C.A.3    Craven, M.4
  • 5
    • 53749097779
    • Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison
    • Dai Q, Yang YC, Wang TM, (2008) Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison. Bioinformatics 24: 2296-2302.
    • (2008) Bioinformatics , vol.24 , pp. 2296-2302
    • Dai, Q.1    Yang, Y.C.2    Wang, T.M.3
  • 6
    • 34547844142
    • A statistical method for alignment-free comparison of regulatory sequences
    • Kantorovitz MR, Robinson GE, Sinha S, (2007) A statistical method for alignment-free comparison of regulatory sequences. Bioinformatics 23: i249-i255.
    • (2007) Bioinformatics , vol.23 , pp. i249-i255
    • Kantorovitz, M.R.1    Robinson, G.E.2    Sinha, S.3
  • 7
    • 1342309208
    • Metrics for comparing regulatory sequences on the basis of pattern counts
    • Van Helden J, (2004) Metrics for comparing regulatory sequences on the basis of pattern counts. Bioinformatics 20: 399-406.
    • (2004) Bioinformatics , vol.20 , pp. 399-406
    • Van Helden, J.1
  • 8
    • 36949006682
    • MORPH: Probabilistic alignment combined with hidden Markov models of cis-regulatory modules
    • Sinha S, He X, (2007) MORPH: Probabilistic alignment combined with hidden Markov models of cis-regulatory modules. PLoS Comput Biol 3 (11): e216.
    • (2007) PLoS Comput Biol , vol.3 , Issue.11
    • Sinha, S.1    He, X.2
  • 9
    • 0029901637
    • Inferring phylogenies from protein sequences by parsimony, distance and likelihood methods
    • Felsenstein J, (1996) Inferring phylogenies from protein sequences by parsimony, distance and likelihood methods. Meth Enzymol 266: 418-427.
    • (1996) Meth Enzymol , vol.266 , pp. 418-427
    • Felsenstein, J.1
  • 10
    • 0034849408
    • MRBAYES: Bayesian inference of phylogenetic trees
    • Huelsenbeck JP, Ronquist F, (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
    • (2001) Bioinformatics , vol.17 , pp. 754-755
    • Huelsenbeck, J.P.1    Ronquist, F.2
  • 11
    • 3242810318
    • MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment
    • Kumar S, Tamura K, Nei M, (2004) MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Briefing Bioinform 5: 150-163.
    • (2004) Briefing Bioinform , vol.5 , pp. 150-163
    • Kumar, S.1    Tamura, K.2    Nei, M.3
  • 12
    • 0035102453
    • An information-based sequence distance and its application to whole mitochondrial genome phylogeny
    • Li M, Badger JH, Chen X, Kwong S, Kearney P, et al. (2001) An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17: 149-154.
    • (2001) Bioinformatics , vol.17 , pp. 149-154
    • Li, M.1    Badger, J.H.2    Chen, X.3    Kwong, S.4    Kearney, P.5
  • 13
    • 0242643741
    • A new sequence distance measure for phylogenetic tree construction
    • Otu HH, Sayood K, (2003) A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19: 2122-2130.
    • (2003) Bioinformatics , vol.19 , pp. 2122-2130
    • Otu, H.H.1    Sayood, K.2
  • 14
    • 0041386108
    • MrBayes 3: Bayesian phylogenetic inference under mixed models
    • Ronquist F, Huelsenbeck JP, (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572-1574.
    • (2003) Bioinformatics , vol.19 , pp. 1572-1574
    • Ronquist, F.1    Huelsenbeck, J.P.2
  • 16
    • 33846537828
    • Using phylogeny to improve genome-wide distant homology recognition
    • Abeln S, Teubner C, Deane CM, (2007) Using phylogeny to improve genome-wide distant homology recognition. PLoS Comput Biol 3 (1): e3.
    • (2007) PLoS Comput Biol , vol.3 , Issue.1
    • Abeln, S.1    Teubner, C.2    Deane, C.M.3
  • 17
    • 52949150097
    • Probabilistic Phylogenetic Inference with Insertions and Deletions
    • Rivas E, Eddy SR, (2008) Probabilistic Phylogenetic Inference with Insertions and Deletions. PLoS Comput Biol 4 (9): e1000172.
    • (2008) PLoS Comput Biol , vol.4 , Issue.9
    • Rivas, E.1    Eddy, S.R.2
  • 18
    • 59149089414
    • Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling
    • Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A, (2009) Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling. PLoS Comput Biol 5 (1): e1000267.
    • (2009) PLoS Comput Biol , vol.5 , Issue.1
    • Engelen, S.1    Trojan, L.A.2    Sacquin-Mora, S.3    Lavery, R.4    Carbone, A.5
  • 19
    • 0034945722
    • Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species
    • Komatsu K, Zhu S, Fushimi H, Qui TK, Cai S, et al. (2001) Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species. Planta Med 67: 461-465.
    • (2001) Planta Med , vol.67 , pp. 461-465
    • Komatsu, K.1    Zhu, S.2    Fushimi, H.3    Qui, T.K.4    Cai, S.5
  • 20
    • 4444270864
    • Cluster-C: an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques
    • Mohseni-Zadeh S, Brezellec P, Risler JL, (2004) Cluster-C: an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques. Comput Biol Chem 28: 211-218.
    • (2004) Comput Biol Chem , vol.28 , pp. 211-218
    • Mohseni-Zadeh, S.1    Brezellec, P.2    Risler, J.L.3
  • 26
    • 0037342499
    • Alignment-free sequence comparison-a review
    • Vinga S, Almeida J, (2003) Alignment-free sequence comparison-a review. Bioinformatics 19: 513-523.
    • (2003) Bioinformatics , vol.19 , pp. 513-523
    • Vinga, S.1    Almeida, J.2
  • 27
    • 0022743812
    • A measure of the similarity of sets of sequences not requiring sequence alignment
    • Blaisdell BE, (1986) A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Natl Acad Sci USA 83: 5155-5159.
    • (1986) Proc Natl Acad Sci USA , vol.83 , pp. 5155-5159
    • Blaisdell, B.E.1
  • 28
    • 0031437248
    • A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words
    • Wu TJ, Burke JP, Davison DB, (1997) A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words. Biometrics 53: 1431-1439.
    • (1997) Biometrics , vol.53 , pp. 1431-1439
    • Wu, T.J.1    Burke, J.P.2    Davison, D.B.3
  • 29
    • 0035013276
    • Statistical measures of DNA dissimilarity under Markov chain models of base composition
    • Wu TJ, Hsieh YC, Li LA, (2001) Statistical measures of DNA dissimilarity under Markov chain models of base composition. Biometrics 57: 441-448.
    • (2001) Biometrics , vol.57 , pp. 441-448
    • Wu, T.J.1    Hsieh, Y.C.2    Li, L.A.3
  • 30
    • 33646005790
    • The average common substring approach to phylogenomic reconstruction
    • Ulitsky I, Burstein D, Tuller T, Chor B, (2006) The average common substring approach to phylogenomic reconstruction. J Comput Biol 13: 336-350.
    • (2006) J Comput Biol , vol.13 , pp. 336-350
    • Ulitsky, I.1    Burstein, D.2    Tuller, T.3    Chor, B.4
  • 31
    • 0036166508
    • Integrated gene and species phylogenies from unaligned whole genome protein sequences
    • Stuart GW, Moffett K, Baker S, (2002) Integrated gene and species phylogenies from unaligned whole genome protein sequences. Bioinformatics 18: 100-108.
    • (2002) Bioinformatics , vol.18 , pp. 100-108
    • Stuart, G.W.1    Moffett, K.2    Baker, S.3
  • 32
    • 0023450024
    • Statistical method for predicting protein coding regions in nucleic acid sequences
    • Fichant G, Gautier C, (1987) Statistical method for predicting protein coding regions in nucleic acid sequences. Comput Appl Biosci 3: 287-295.
    • (1987) Comput Appl Biosci , vol.3 , pp. 287-295
    • Fichant, G.1    Gautier, C.2
  • 33
    • 3242889314
    • Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance
    • Hao B, Qi J, (2004) Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance. J Bioinform Comput Biol 2: 1-19.
    • (2004) J Bioinform Comput Biol , vol.2 , pp. 1-19
    • Hao, B.1    Qi, J.2
  • 34
    • 33748796945
    • Phylogenetic analysis using complete signature information of whole genomes and clustered Neighbour-Joining method
    • Wu X, Wan X, Wu G, Xu D, Lin G, (2006) Phylogenetic analysis using complete signature information of whole genomes and clustered Neighbour-Joining method. Int J Bioinform Res Appl 2: 219-248.
    • (2006) Int J Bioinform Res Appl , vol.2 , pp. 219-248
    • Wu, X.1    Wan, X.2    Wu, G.3    Xu, D.4    Lin, G.5
  • 35
    • 58149496557
    • Fast algorithms for computing sequence distances by exhaustive substring composition
    • Apostolico A, Denas O, (2008) Fast algorithms for computing sequence distances by exhaustive substring composition. Algorithms Mol Biol 3: 13.
    • (2008) Algorithms Mol Biol , vol.3 , pp. 13
    • Apostolico, A.1    Denas, O.2
  • 36
    • 47249127227
    • An improved string composition method for sequence comparison
    • Lu GQ, Zhang SP, Fang X, (2008) An improved string composition method for sequence comparison. BMC Bioinformatics 9 (Suppl 6): S15.
    • (2008) BMC Bioinformatics , vol.9 , Issue.SUPPL. 6
    • Lu, G.Q.1    Zhang, S.P.2    Fang, X.3
  • 37
    • 0042831182
    • Evolutionarily conserved pattern of gene segment usage within the mammalian TCRbeta locus
    • Livak F, (2003) Evolutionarily conserved pattern of gene segment usage within the mammalian TCRbeta locus. Immunogenetics 55: 307-314.
    • (2003) Immunogenetics , vol.55 , pp. 307-314
    • Livak, F.1
  • 38
    • 33846698069
    • Complementary intron sequence motifs associated with human exon repetition: a role for intragenic, inter-transcript interactions in gene expression
    • Dixon RJ, Eperon IC, Samani NJ, (2007) Complementary intron sequence motifs associated with human exon repetition: a role for intragenic, inter-transcript interactions in gene expression. Bioinformatics 23: 150-155.
    • (2007) Bioinformatics , vol.23 , pp. 150-155
    • Dixon, R.J.1    Eperon, I.C.2    Samani, N.J.3
  • 39
    • 0030839806
    • An Efficient Statistic to Detect Over- and Under-Represented Words in DNA Sequences
    • Schbath S, (1997) An Efficient Statistic to Detect Over- and Under-Represented Words in DNA Sequences. J Comp Biol 4 (2): 189-192.
    • (1997) J Comp Biol , vol.4 , Issue.2 , pp. 189-192
    • Schbath, S.1
  • 40
    • 0034125366
    • Probabilistic and statistical properties of words: an overview
    • Reinert G, Schbath S, Waterman MS, (2000) Probabilistic and statistical properties of words: an overview. J Comput Biol 7: 1-46.
    • (2000) J Comput Biol , vol.7 , pp. 1-46
    • Reinert, G.1    Schbath, S.2    Waterman, M.S.3
  • 41
    • 0033238297
    • Exact distribution of word occurrences in a random sequence of letters
    • Robin S, Daudin JJ, (1999) Exact distribution of word occurrences in a random sequence of letters. J Appl Prob 36: 179-193.
    • (1999) J Appl Prob , vol.36 , pp. 179-193
    • Robin, S.1    Daudin, J.J.2
  • 42
    • 27944497396
    • Sensitivity and convergence of uniformly ergodic Markov chains
    • Mitrophanov AY, (2005) Sensitivity and convergence of uniformly ergodic Markov chains. J Appl Prob 42: 1003-1014.
    • (2005) J Appl Prob , vol.42 , pp. 1003-1014
    • Mitrophanov, A.Y.1
  • 45
    • 0031191630
    • The use of the area under the ROC curve in the evaluation of machine learning algorithms
    • Bradley AP, (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recog 30: 1145-1159.
    • (1997) Pattern Recog , vol.30 , pp. 1145-1159
    • Bradley, A.P.1
  • 46
    • 0345983651
    • Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison
    • Green RE, Brenner SE, (2002) Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proc IEEE 90: 1834-1847.
    • (2002) Proc IEEE , vol.90 , pp. 1834-1847
    • Green, R.E.1    Brenner, S.E.2
  • 48
    • 32144456548
    • REDfly: a Regulatory Element Database for Drosophila
    • Gallo SM, Li L, Hu Z, Halfon MS, (2006) REDfly: a Regulatory Element Database for Drosophila. Bioinformatics 22: 381-383.
    • (2006) Bioinformatics , vol.22 , pp. 381-383
    • Gallo, S.M.1    Li, L.2    Hu, Z.3    Halfon, M.S.4
  • 49
    • 0037195172
    • Distributional regimes for the number of k-word matches between two random sequences
    • Lippert RA, Huang HY, Waterman MS, (2002) Distributional regimes for the number of k-word matches between two random sequences. Proc Natl Acad Sci USA 99: 13980-13989.
    • (2002) Proc Natl Acad Sci USA , vol.99 , pp. 13980-13989
    • Lippert, R.A.1    Huang, H.Y.2    Waterman, M.S.3
  • 51
    • 84929026405
    • Classification of short human exons and introns based on statistical features
    • Wu YH, Liew AWC, Yan H, Yang MS, (2003) Classification of short human exons and introns based on statistical features. Physical Review E 67 (6): 061916.
    • (2003) Physical Review E , vol.67 , Issue.6 , pp. 061916
    • Wu, Y.H.1    Liew, A.W.C.2    Yan, H.3    Yang, M.S.4
  • 52
    • 39149145162
    • Segmentation of short human exons based on spectral features of double curves
    • Jiang R, Yan H, (2008) Segmentation of short human exons based on spectral features of double curves. IJDMB 2 (1): 15-35.
    • (2008) IJDMB , vol.2 , Issue.1 , pp. 15-35
    • Jiang, R.1    Yan, H.2
  • 53
    • 43049175149
    • Studies of spectral properties of short genes using the wavelet subspace Hilbert Huang transform (WSHHT)
    • Jiang R, Yan H, (2008) Studies of spectral properties of short genes using the wavelet subspace Hilbert Huang transform (WSHHT). Physica A 387: 4223-4247.
    • (2008) Physica A , vol.387 , pp. 4223-4247
    • Jiang, R.1    Yan, H.2
  • 54
    • 31144477556
    • Phylogenetic analysis of global hepatitis E virus sequences: genetic diversity, subtypes and zoonosis
    • Lu L, Li C, Hagedorn CH, (2006) Phylogenetic analysis of global hepatitis E virus sequences: genetic diversity, subtypes and zoonosis. Rev Med Virol 16: 5-36.
    • (2006) Rev Med Virol , vol.16 , pp. 5-36
    • Lu, L.1    Li, C.2    Hagedorn, C.H.3
  • 55
    • 45549086279
    • Molecular characterization and phylogenetic analysis of the complete genome of a hepatitis E virus from European swine
    • Xia HY, Liu LH, Linde AM, Belak S, Norder H, et al. (2008) Molecular characterization and phylogenetic analysis of the complete genome of a hepatitis E virus from European swine. Virus Genes 37: 39-48.
    • (2008) Virus Genes , vol.37 , pp. 39-48
    • Xia, H.Y.1    Liu, L.H.2    Linde, A.M.3    Belak, S.4    Norder, H.5
  • 56
    • 60649116779
    • Phylogeny, classification and evolutionary insights into pestiviruses
    • Liu L, Xia H, Wahlberg N, Belak S, Baule C, (2009) Phylogeny, classification and evolutionary insights into pestiviruses. Virology 385: 351-357.
    • (2009) Virology , vol.385 , pp. 351-357
    • Liu, L.1    Xia, H.2    Wahlberg, N.3    Belak, S.4    Baule, C.5
  • 57
    • 77549084603
    • Applying phylogenetic analysis to viral livestock diseases: moving beyond molecular typing
    • Olvera A, Busquets N, Cortey M, de Deus N, Ganges L, et al. (2010) Applying phylogenetic analysis to viral livestock diseases: moving beyond molecular typing. Vet J 184 (2): 130-137.
    • (2010) Vet J , vol.184 , Issue.2 , pp. 130-137
    • Olvera, A.1    Busquets, N.2    Cortey, M.3    de Deus, N.4    Ganges, L.5
  • 58
    • 39549123618
    • A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping
    • Liu Z, Meng J, Sun X, (2008) A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping. Biochem Biophys Res Commun 368: 223-230.
    • (2008) Biochem Biophys Res Commun , vol.368 , pp. 223-230
    • Liu, Z.1    Meng, J.2    Sun, X.3
  • 59
    • 25144456056
    • Computational Cluster Validation in Post-Genomic Data Analysis
    • Handl J, Knowles J, Kell DB, (2005) Computational Cluster Validation in Post-Genomic Data Analysis. Bioinformatics 21: 3201-3212.
    • (2005) Bioinformatics , vol.21 , pp. 3201-3212
    • Handl, J.1    Knowles, J.2    Kell, D.B.3
  • 60
    • 0000122573
    • PHYLIP-Phylogeny inference package (version 3.2)
    • Felsenstein J, (1989) PHYLIP-Phylogeny inference package (version 3.2). Cladistics 5: 164-166.
    • (1989) Cladistics , vol.5 , pp. 164-166
    • Felsenstein, J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.