Volume 14, Issue 1, 2013

KClust: Fast and sensitive clustering of large protein sequence databases

Author keywords

[No Author keywords available]

Indexed keywords

DYNAMIC PROGRAMMING ALGORITHM; FALSE DISCOVERY RATE; HIGH-THROUGHPUT SEQUENCING; HOMOLOGY SEARCH; PROTEIN SEQUENCE DATABASE; SEQUENCE DATABASE; SEQUENCE IDENTITY; THREE ORDERS OF MAGNITUDE

EID: 84883410459 | PISSN: None | EISSN: 14712105 | Source Type: Journal
DOI: 10.1186/1471-2105-14-248 | Document Type: Article
Times cited: 69

References (37)
  • 1
    • Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics 2010, 26(21):2664-2671. doi:10.1093/bioinformatics/btq527. PMID: 20843957.
  • 2
    • Li W, Jaroszewski L, Godzik A. Sequence clustering strategies improve remote homology recognitions while reducing search times. Protein Eng 2002, 15(8):643-649. doi:10.1093/protein/15.8.643. PMID: 12364578.
  • 3
    • Park J, Holm L, Heger A, Chothia C. RSDB: representative protein sequence databases have high information content. Bioinformatics 2000, 16(5):458-464. doi:10.1093/bioinformatics/16.5.458. PMID: 10871268.
  • 4
    • Suzek B, Huang H, McGarvey P, Mazumder R, Wu C. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007, 23(10):1282-1288. doi:10.1093/bioinformatics/btm098. PMID: 17379688.
  • 6
    • Human Microbiome Jumpstart Reference Strains Consortium. A catalog of reference genomes from the human microbiome. Science 2010, 328(5981):994-999. doi:10.1126/science.1183605. PMCID: PMC2940224. PMID: 20489017.
  • 8
    • Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 2011, 9(2):173-175. doi:10.1038/nmeth.1818. PMID: 22198341.
  • 9
    • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402. doi:10.1093/nar/25.17.3389. PMCID: PMC146917. PMID: 9254694.
  • 10
    • Pearson W, Lipman D. Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 1988, 85(8):2444-2448. doi:10.1073/pnas.85.8.2444. PMCID: PMC280013. PMID: 3162770.
  • 12
    • Yona G, Linial N, Linial M. ProtoMap: automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space. Proteins 1999, 37(3):360-378. doi:10.1002/(SICI)1097-0134(19991115)37:3<360::AID-PROT5>3.0.CO;2-Z. PMID: 10591097.
  • 13
    • Krause A, Stoye J, Vingron M. The SYSTERS protein sequence cluster set. Nucleic Acids Res 2000, 28(1):270-272. doi:10.1093/nar/28.1.270. PMCID: PMC102384. PMID: 10592244.
  • 14
    • Miele V, Penel S, Duret L. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinformatics 2011, 12:116. doi:10.1186/1471-2105-12-116. PMCID: PMC3095554. PMID: 21513511.
  • 15
    • Rappoport N, Karsenty S, Stern A, Linial N, Linial M. ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree. Nucleic Acids Res 2012, 40(D1):D313-D320. doi:10.1093/nar/gkr1027. PMCID: PMC3245180. PMID: 22121228.
  • 16
    • Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001, 314(5):1041-1052. doi:10.1006/jmbi.2000.5197. PMID: 11743721.
  • 17
    • Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 2002, 30(7):1575-1584. doi:10.1093/nar/30.7.1575. PMCID: PMC101833. PMID: 11917018.
  • 19
    • Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 2003, 13(9):2178-2189. doi:10.1101/gr.1224503. PMCID: PMC403725. PMID: 12952885.
  • 20
    • Alexeyenko A, Tamas I, Liu G, Sonnhammer EL. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 2006, 22(14):e9-e15. doi:10.1093/bioinformatics/btl213. PMID: 16873526.
  • 21
    • Chen TW, Wu TH, Ng WV, Lin WC. DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection. BMC Bioinformatics 2010, 11(Suppl 7):S6. doi:10.1186/1471-2105-11-S7-S6. PMCID: PMC3040532. PMID: 21210985.
  • 23
    • Pearson WR. Effective protein sequence comparison. Methods Enzymol 1996, 266:227-258. PMID: 8743688.
  • 24
    • Rattei T, Tischler P, Götz S, Jehl MA, Hoser J, Arnold R, Conesa A, Mewes HW. SIMAP-a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters. Nucleic Acids Res 2010, 38(Database issue):D223-D226. doi:10.1093/nar/gkp949. PMCID: PMC2808863. PMID: 19906725.
  • 25
    • Li W, Jaroszewski L, Godzik A. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 2002, 18(1):77-82. doi:10.1093/bioinformatics/18.1.77. PMID: 11836214.
  • 26
    • Li W, Godzik A. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22(13):1658-1659. doi:10.1093/bioinformatics/btl158. PMID: 16731699.
  • 27
    • Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012, 28(23):3150-3152. doi:10.1093/bioinformatics/bts565. PMCID: PMC3516142. PMID: 23060610.
  • 28
    • Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26(19):2460-2461. doi:10.1093/bioinformatics/btq461. PMID: 20709691.
  • 29
    • Hobohm U, Scharf M, Schneider R, Sander C. Selection of representative protein data sets. Protein Sci 1992, 1(3):409-417. PMCID: PMC2142204. PMID: 1304348.
  • 30
    • Ma B, Tromp J, Li M. PatternHunter: faster and more sensitive homology search. Bioinformatics 2002, 18(3):440-445. doi:10.1093/bioinformatics/18.3.440. PMID: 11934743.
  • 32
    • Przybylski D, Rost B. Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments. Nucleic Acids Res 2007, 35(7):2238-2246. doi:10.1093/nar/gkm107. PMCID: PMC1874647. PMID: 17369271.
  • 34
    • Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C. SCOP: a structural classification of proteins database. Nucleic Acids Res 2000, 28(1):257-259. doi:10.1093/nar/28.1.257. PMCID: PMC102479. PMID: 10592240.
  • 36
    • Hegyi H, Gerstein M. Annotation transfer for genomics: measuring functional divergence in multi-domain proteins. Genome Res 2001, 11(10):1632-1640. doi:10.1101/gr.183801. PMCID: PMC311165. PMID: 11591640.
  • 37
    • Bao E, Jiang T, Kaloshian I, Girke T. SEED: efficient clustering of next-generation sequences. Bioinformatics 2011, 27(18):2502-2509. PMCID: PMC3167058. PMID: 21810899.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.