Volume 9, Issue 1, 2018, Pages

Clustering huge protein sequence sets in linear time

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; CLUSTER ANALYSIS; DATA SET; DATABASE; GENOMICS; MEMORY; PROTEIN;

EID: 85049336634     PISSN: None     EISSN: 20411723     Source Type: Journal    
DOI: 10.1038/s41467-018-04964-5     Document Type: Article
Times cited: 549

References (33)
  • 1
    • 0242523909
    • The uncultured microbial majority
    • Rappe, M. S. & Giovannoni, S. J. The uncultured microbial majority. Ann. Rev. Microbiol. 57, 369-394 (2003).
    • (2003) Ann. Rev. Microbiol. , vol.57 , Issue.1 , pp. 369-394
    • Rappe, M.S.1    Giovannoni, S.J.2
  • 2
    • 84976877550
    • The MG-RAST metagenomics database and portal in 2015
    • Wilke, A. et al. The MG-RAST metagenomics database and portal in 2015. Nucleic Acids Res. 44, D590-D594 (2016).
    • (2016) Nucleic Acids Res. , vol.44 , Issue.1 , pp. D590-D594
    • Wilke, A.1
  • 3
    • 84891800783
    • IMG/M 4 version of the integrated metagenome comparative analysis system
    • Markowitz, V. M. et al. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res. 42, D568-D573 (2014).
    • (2014) Nucleic Acids Res. , vol.42 , Issue.1 , pp. D568-D573
    • Markowitz, V.M.1
  • 4
    • 84856496549
    • Next generation sequencing and bioinformatic bottlenecks: The current state of metagenomic data analysis
    • Scholz, M. B., Lo, C.-C. & Chain, P. S. Next generation sequencing and bioinformatic bottlenecks: The current state of metagenomic data analysis. Curr. Opin. Biotechnol. 23, 9-15 (2012).
    • (2012) Curr. Opin. Biotechnol. , vol.23 , Issue.1 , pp. 9-15
    • Scholz, M.B.1    Lo, C.C.2    Chain, P.S.3
  • 6
    • 84870006986
    • Functional assignment of metagenomic data: Challenges and applications
    • Prakash, T. & Taylor, T. D. Functional assignment of metagenomic data: challenges and applications. Brief. Bioinform. 13, 711-727 (2012).
    • (2012) Brief. Bioinform. , vol.13 , Issue.6 , pp. 711-727
    • Prakash, T.1    Taylor, T.D.2
  • 7
    • 84870431038
    • CD-HIT: Accelerated for clustering the next-generation sequencing data
    • Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152 (2012).
    • (2012) Bioinformatics , vol.28 , pp. 3150-3152
    • Fu, L.1    Niu, B.2    Zhu, Z.3    Wu, S.4    Li, W.5
  • 8
    • 77957244650
    • Search and clustering orders of magnitude faster than BLAST
    • Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460-2461 (2010).
    • (2010) Bioinformatics , vol.26 , pp. 2460-2461
    • Edgar, R.C.1
  • 9
    • 84870027011
    • Ultrafast clustering algorithms for metagenomic sequence analysis
    • Li, W., Fu, L., Niu, B., Wu, S. & Wooley, J. Ultrafast clustering algorithms for metagenomic sequence analysis. Brief. Bioinform. 13, 656-668 (2012).
    • (2012) Brief. Bioinform. , vol.13 , pp. 656-668
    • Li, W.1    Fu, L.2    Niu, B.3    Wu, S.4    Wooley, J.5
  • 10
    • 85033571642
    • MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
    • Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026-1028 (2017).
    • (2017) Nat. Biotechnol. , vol.35 , pp. 1026-1028
    • Steinegger, M.1    Söding, J.2
  • 11
    • 84966378307
    • MMseqs software suite for fast and deep clustering and searching of large protein sequence sets
    • Hauser, M., Steinegger, M. & Söding, J. MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics 32, 1323-1330 (2016).
    • (2016) Bioinformatics , vol.32 , Issue.9 , pp. 1323-1330
    • Hauser, M.1    Steinegger, M.2    Söding, J.3
  • 12
    • 84925021592
    • Fast and sensitive protein alignment using diamond
    • Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using diamond. Nat. Methods 12, 59-60 (2015).
    • (2015) Nat. Methods , vol.12 , Issue.1 , pp. 59-60
    • Buchfink, B.1    Xie, C.2    Huson, D.H.3
  • 13
    • 84978998700
    • Mash: Fast genome and metagenome distance estimation using minhash
    • Ondov, B. D. et al. Mash: fast genome and metagenome distance estimation using minhash. Genome Biol. 17, 132 (2016).
    • (2016) Genome Biol. , vol.17
    • Ondov, B.D.1
  • 14
    • 84855167751
    • RAPSearch2: A fast and memory-efficient protein similarity search tool for next-generation sequencing data
    • Zhao, Y., Tang, H. & Ye, Y. RAPSearch2: A fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28, 125-126 (2012).
    • (2012) Bioinformatics , vol.28 , pp. 125-126
    • Zhao, Y.1    Tang, H.2    Ye, Y.3
  • 16
    • 85016161203
    • Uniclust databases of clustered and deeply annotated protein sequences and alignments
    • Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170-D176 (2017).
    • (2017) Nucleic Acids Res. , vol.45 , Issue.1 , pp. D170-D176
    • Mirdita, M.1
  • 17
    • 84946735654
    • Gene ontology consortium: Going forward
    • Gene Ontology Consortium
    • Gene Ontology Consortium. Gene ontology consortium: going forward. Nucleic Acids Res. 43, D1049-D1056 (2015).
    • (2015) Nucleic Acids Res. , vol.43 , Issue.1 , pp. D1049-D1056
  • 18
    • 84976865403
    • The pfam protein families database: Towards a more sustainable future
    • Finn, R. D. et al. The pfam protein families database: Towards a more sustainable future. Nucleic Acids Res. 44, D279-D285 (2016).
    • (2016) Nucleic Acids Res. , vol.44 , Issue.1 , pp. D279-D285
    • Finn, R.D.1
  • 19
    • 77952299957
    • Prodigal: Prokaryotic gene recognition and translation initiation site identification
    • Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).
    • (2010) BMC Bioinforma. , vol.11 , Issue.1
    • Hyatt, D.1
  • 20
    • 84862198590
    • The sequence read archive: Explosive growth of sequencing data
    • Kodama, Y., Shumway, M. & Leinonen, R. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res. 40, D54-D56 (2012).
    • (2012) Nucleic Acids Res. , vol.40 , Issue.1 , pp. D54-D56
    • Kodama, Y.1    Shumway, M.2    Leinonen, R.3
  • 21
    • 84929992013
    • Structure and function of the global ocean microbiome
    • Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
    • (2015) Science , vol.348 , Issue.6237 , pp. 1261359
    • Sunagawa, S.1
  • 22
    • 85010073180
    • Protein structure determination using metagenome sequence data
    • Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294-298 (2017).
    • (2017) Science , vol.355 , pp. 294-298
    • Ovchinnikov, S.1
  • 23
    • 85011967183
    • Mutation effects predicted from sequence co-variation
    • Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128-135 (2017).
    • (2017) Nat. Biotechnol. , vol.35 , pp. 128-135
    • Hopf, T.A.1
  • 24
    • 0002546287
    • Efficient algorithms for agglomerative hierarchical clustering methods
    • Day, W. H. & Edelsbrunner, H. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classif. 1, 7-24 (1984).
    • (1984) J. Classif. , vol.1 , Issue.1 , pp. 7-24
    • Day, W.H.1    Edelsbrunner, H.2
  • 26
    • 33745634395
    • Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences
    • Li, W. & Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659 (2006).
    • (2006) Bioinformatics , vol.22 , pp. 1658-1659
    • Li, W.1    Godzik, A.2
  • 27
    • 84883410459
    • kClust: Fast and sensitive clustering of large protein sequence databases
    • Hauser, M., Mayer, C. & Söding, J. kClust: fast and sensitive clustering of large protein sequence databases. BMC Bioinforma. 14, 248 (2013).
    • (2013) BMC Bioinforma. , vol.14
    • Hauser, M.1    Mayer, C.2    Söding, J.3
  • 30
    • 84879678051
    • Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes
    • Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533-538 (2013).
    • (2013) Nat. Biotechnol. , vol.31 , Issue.6 , pp. 533-538
    • Albertsen, M.1
  • 31
    • 13444273448
    • The Universal Protein Resource (UniProt)
    • Bairoch, A. et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 33(Suppl. 1), D154-D159 (2005).
    • (2005) Nucleic Acids Res. , vol.33 , pp. D154-D159
    • Bairoch, A.1
  • 32
    • 84891755297
    • SSW Library: An SIMD Smith-Waterman C/C++ library for use in genomic applications
    • Zhao, M., Lee, W.-P., Garrison, E. P. & Marth, G. T. SSW Library: An SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS ONE 8, e82138 (2013).
    • (2013) PLoS ONE , vol.8 , pp. e82138
    • Zhao, M.1    Lee, W.-P.2    Garrison, E.P.3    Marth, G.T.4
  • 33
    • 84959923638
    • ALP & FALP: C++ libraries for pairwise local alignment E-values
    • Sheetlin, S., Park, Y., Frith, M. C. & Spouge, J. L. ALP & FALP: C++ libraries for pairwise local alignment E-values. Bioinformatics 32, 304-305 (2015).
    • (2015) Bioinformatics , vol.32 , Issue.2 , pp. 304-305
    • Sheetlin, S.1    Park, Y.2    Frith, M.C.3    Spouge, J.L.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.