Volume 13, Issue 1, 2012

A novel hierarchical clustering algorithm for gene sequences

Author keywords

[No Author keywords available]

Indexed keywords

ALIGNMENT-FREE; BIOLOGICAL CHARACTERISTIC; CLUSTERING METHODS; DISTANCE MEASURE; FEATURE VECTORS; GENE SEQUENCES; HIERARCHICAL CLUSTERING ALGORITHMS; ORDER RELATION; PHYLOGENETIC ANALYSIS; SEQUENCE SIMILARITY;

EID: 84866326775     PISSN: None     EISSN: 14712105     Source Type: Journal    
DOI: 10.1186/1471-2105-13-174     Document Type: Article
Times cited: 84

References (58)
  • 2
    • Zhao B, Duan V, Yau SS. A novel clustering method via nucleotide-based Fourier power spectrum analysis. J Theor Biol 2011, 279:83-89.
  • 4
    • Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 1988, 85(8):2444-2448.
  • 5
    • Vinga S, Almeida J. Alignment-free sequence comparison - a review. Bioinformatics 2003, 19(4):513-523. doi:10.1093/bioinformatics/btg005; PMID: 12611807.
  • 6
    • Haubold B, Reed FA, Pfaffelhuber P. Alignment-free estimation of nucleotide diversity. Bioinformatics 2011, 27(4):449-455. doi:10.1093/bioinformatics/btq689; PMID: 21156730.
  • 7
    • Liu Z, Meng J, Sun X. A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping. Biochem Biophys Res Commun 2008, 368(2):223-230. doi:10.1016/j.bbrc.2008.01.070; PMID: 18230342.
  • 8
    • Domazet-Loso M, Haubold B. Efficient estimation of pairwise distances between genomes. Bioinformatics 2009, 25(24):3221-3227. doi:10.1093/bioinformatics/btp590; PMID: 19825795.
  • 9
    • Domazet-Loso M, Haubold B. Alignment-free detection of local similarity among viral and bacterial genomes. Bioinformatics 2011, 27(11):1466-1472. doi:10.1093/bioinformatics/btr176; PMID: 21471011.
  • 10
    • Kelil A, Wang S, Brzezinski R, Fleury A. CLUSS: clustering of protein sequences based on a new similarity measure. BMC Bioinformatics 2007, 8:286. doi:10.1186/1471-2105-8-286; PMCID: PMC1976428; PMID: 17683581.
  • 11
    • Reinert G, Chew D, Sun FZ, Waterman MS. Alignment-free sequence comparison (I): statistics and power. J Comput Biol 2009, 16(12):1615-1634.
  • 12
    • Dai Q, Liu X, Yao Y, Zhao F. Numerical characteristics of word frequencies and their application to dissimilarity measure for sequence comparison. J Theor Biol 2011, 276(1):174-180.
  • 13
    • Lu G, Zhang S, Fang X. An improved string composition method for sequence comparison. BMC Bioinformatics 2008, 9(Suppl 6):S15. doi:10.1186/1471-2105-9-S6-S15; PMCID: PMC2638155; PMID: 19091014.
  • 14
    • Aita T, Husimi Y, Nishigaki K. A mathematical consideration of the word-composition vector method in comparison of biological sequences. BioSystems 2011, 106:67-75. doi:10.1016/j.biosystems.2011.06.009; PMID: 21745534.
  • 15
    • Blaisdell BE. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Natl Acad Sci USA 1986, 83:5155-5159.
  • 16
    • Wu TJ, Burke JP, Davison DB. A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words. Biometrics 1997, 53(4):1431-1439. doi:10.2307/2533509; PMID: 9423258.
  • 17
    • Wu TJ, Hsieh YC, Li LA. Statistical measures of DNA dissimilarity under Markov chain models of base composition. Biometrics 2001, 57(2):441-448. doi:10.1111/j.0006-341X.2001.00441.x; PMID: 11414568.
  • 18
    • Stuart GW, Moffett K, Baker S. Integrated gene and species phylogenies from unaligned whole genome protein sequences. Bioinformatics 2002, 18(1):100-108. doi:10.1093/bioinformatics/18.1.100; PMID: 11836217.
  • 19
    • Fichant G, Gautier C. Statistical method for predicting protein coding regions in nucleic acid sequences. Comput Appl Biosci 1987, 3(4):287-295.
  • 20
    • Wang J, Zheng X. WSE, a new sequence distance measure based on word frequencies. Math Biosci 2008, 215(1):78-83. doi:10.1016/j.mbs.2008.06.001; PMID: 18590747.
  • 21
    • Zheng X, Qin Y, Wang J. A Poisson model of sequence comparison and its application to coronavirus phylogeny. Math Biosci 2009, 217(2):159-166. doi:10.1016/j.mbs.2008.11.006; PMID: 19073197.
  • 22
    • Yang K, Zhang L. Performance comparison of gene family clustering methods with expert curated gene family data set in Arabidopsis thaliana. Planta 2008, 228:439-447. doi:10.1007/s00425-008-0748-7; PMID: 18493791.
  • 23
    • Dong G, Pei J. Classification, clustering, features and distances of sequence data. Sequence Data Mining 2007, 33:47-65.
  • 26
    • Loewenstein Y, Portugaly E, Fromer M, Linial M. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics 2008, 24(13):i41-i49. doi:10.1093/bioinformatics/btn174; PMCID: PMC2718652; PMID: 18586742.
  • 27
    • National Center for Biotechnology Information (NCBI). Documentation of the BLASTCLUST algorithm. ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html
  • 28
    • Enright AJ, Ouzounis CA. GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 2000, 16(5):451-457. doi:10.1093/bioinformatics/16.5.451; PMID: 10871267.
  • 29
    • Chaudhuri P, Das S. SWORDS: a statistical tool for analyzing large DNA sequences. J Biosci 2002, 27(1):1-6. doi:10.1007/BF02703678; PMID: 11927772.
  • 30
    • Uchiyama I. Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes. Nucleic Acids Res 2006, 34(2):647-658. doi:10.1093/nar/gkj448; PMCID: PMC1351371; PMID: 16436801.
  • 31
    • Hao X, Jiang R, Chen T. Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering. Bioinformatics 2011, 27(5):611-618. doi:10.1093/bioinformatics/btq725; PMCID: PMC3042185; PMID: 21233169.
  • 32
    • Sperisen P, Pagni M. JACOP: a simple and robust method for the automated classification of protein sequences with modular architecture. BMC Bioinformatics 2005, 6:216. doi:10.1186/1471-2105-6-216; PMCID: PMC1208858; PMID: 16135248.
  • 34
    • Kelarev A, Kang B, Steane D. Clustering algorithms for ITS sequence data with alignment metrics. Lect Notes Comput Sci 2006, 4304:1027-1031.
  • 35
    • Tseng GC. Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 2007, 23(17):2247-2255. doi:10.1093/bioinformatics/btm320; PMID: 17597097.
  • 38
    • Kashef R, Kamel MS. Enhanced bisecting k-means clustering using intermediate cooperation. Pattern Recognit 2009, 42(11):2557-2569.
  • 39
    • Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22(13):1658-1659. doi:10.1093/bioinformatics/btl158; PMID: 16731699.
  • 41
    • Picardi E, Mignone F, Pesole G. EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data. BMC Bioinformatics 2009, 10(Suppl 6):S10. doi:10.1186/1471-2105-10-S6-S10; PMCID: PMC2788350; PMID: 19958509.
  • 44
    • White JR, Navlakha S, Nagarajan N, Ghodsi M, Kingsford C, Pop M. Alignment and clustering of phylogenetic markers - implications for microbial diversity studies. BMC Bioinformatics 2010, 11:152. doi:10.1186/1471-2105-11-152; PMCID: PMC2859756; PMID: 20334679.
  • 48
    • Schmitt AO, Herzel H. Estimating the entropy of DNA sequences. J Theor Biol 1997, 188(3):369-377.
  • 49
    • Li C, Wang J. Relative entropy of DNA and its application. Physica A: Stat Mech Appl 2005, 347:465-471.
  • 51
    • Zhao Y, Karypis G. Hierarchical clustering algorithms for document datasets. Data Mining Knowl Discov 2005, 10:141-168.
  • 53
    • Page RD. TreeView: an application to display phylogenetic trees on personal computers. Bioinformatics 1996, 12:357-358.
  • 54
    • Feng J, Hu Y, Wan P, Zhang A, Zhao W. New method for comparing DNA primary sequences based on a discrimination measure. J Theor Biol 2010, 266(4):703-707.
  • 57
    • Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792-1797. doi:10.1093/nar/gkh340; PMCID: PMC390337; PMID: 15034147.
  • 58
    • Zhao B, He RL, Yau SS. A new distribution vector and its application in genome clustering. Mol Phylogenet Evol 2011, 59(2):438-443.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.