Volume 3, Issue 10, 2008, Pages

Probing metagenomics by rapid cluster analysis of very large datasets

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; AMINO ACID SEQUENCE; ARTICLE; CLUSTER ANALYSIS; GENE SEQUENCE; GENETIC DATABASE; GENETIC VARIABILITY; GENOMICS; INTERNET; METAGENOMICS; OPEN READING FRAME; PROTEIN FAMILY; PROTEIN LOCALIZATION; PROTEIN STRUCTURE; SEQUENCE ANALYSIS; SEQUENCE HOMOLOGY; CHEMISTRY; CLASSIFICATION; COMPUTER PROGRAM; GENETICS; GENOME; METHODOLOGY; MICROBIOLOGY; PROTEIN DATABASE; SEA;

EID: 54449096251     PISSN: None     EISSN: 19326203     Source Type: Journal    
DOI: 10.1371/journal.pone.0003375     Document Type: Article
Times cited : (32)

References (40)
  • 1
    • 33947238287
    • The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific
    • Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77.
    • (2007) PLoS Biol , vol.5
    • Rusch, D.B.1    Halpern, A.L.2    Sutton, G.3    Heidelberg, K.B.4    Williamson, S.5
  • 2
    • 33947235074
    • The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families
    • Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families. PLoS Biol 5: e16.
    • (2007) PLoS Biol , vol.5
    • Yooseph, S.1    Sutton, G.2    Rusch, D.B.3    Halpern, A.L.4    Williamson, S.J.5
  • 3
  • 4
    • 31544483932
    • Community genomics among stratified microbial assemblages in the ocean's interior
    • DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, et al. (2006) Community genomics among stratified microbial assemblages in the ocean's interior. Science 311: 496-503.
    • (2006) Science , vol.311 , pp. 496-503
    • DeLong, E.F.1    Preston, C.M.2    Mincer, T.3    Rich, V.4    Hallam, S.J.5
  • 7
    • 11144354360
    • Environmental genome shotgun sequencing of the Sargasso Sea
    • Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
  • 8
    • 1542377296
    • Community structure and metabolism through reconstruction of microbial genomes from the environment
    • Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37-43.
    • (2004) Nature , vol.428 , pp. 37-43
    • Tyson, G.W.1    Chapman, J.2    Hugenholtz, P.3    Allen, E.E.4    Ram, R.J.5
  • 9
    • 34249794257
    • Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
    • Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, et al. (2007) Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods 4: 495-500.
    • (2007) Nat Methods , vol.4 , pp. 495-500
    • Mavromatis, K.1    Ivanova, N.2    Barry, K.3    Shapiro, H.4    Goltsman, E.5
  • 10
    • 33746106844
    • An analysis of the Sargasso Sea resource and the consequences for database composition
    • Tress ML, Cozzetto D, Tramontano A, Valencia A (2006) An analysis of the Sargasso Sea resource and the consequences for database composition. BMC Bioinformatics 7: 213.
    • (2006) BMC Bioinformatics , vol.7 , pp. 213
    • Tress, M.L.1    Cozzetto, D.2    Tramontano, A.3    Valencia, A.4
  • 14
    • 39049092823
    • A statistical toolbox for metagenomics: Assessing functional diversity in microbial communities
    • Schloss PD, Handelsman J (2008) A statistical toolbox for metagenomics: assessing functional diversity in microbial communities. BMC Bioinformatics 9: 34.
    • (2008) BMC Bioinformatics , vol.9 , pp. 34
    • Schloss, P.D.1    Handelsman, J.2
  • 17
    • 0031829372
    • Removing near-neighbour redundancy from large protein sequence collections
    • Holm L, Sander C (1998) Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics 14: 423-429.
    • (1998) Bioinformatics , vol.14 , pp. 423-429
    • Holm, L.1    Sander, C.2
  • 18
    • 0032726692
    • ProtoMap: Automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space
    • Yona G, Linial N, Linial M (1999) ProtoMap: automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space. Proteins 37: 360-378.
    • (1999) Proteins , vol.37 , pp. 360-378
    • Yona, G.1    Linial, N.2    Linial, M.3
  • 19
    • 0033944826
    • GeneRAGE: A robust algorithm for sequence clustering and domain detection
    • Enright AJ, Ouzounis CA (2000) GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16: 451-457.
    • (2000) Bioinformatics , vol.16 , pp. 451-457
    • Enright, A.J.1    Ouzounis, C.A.2
  • 20
    • 0033940118
    • RSDB: Representative protein sequence databases have high information content
    • Park J, Holm L, Heger A, Chothia C (2000) RSDB: representative protein sequence databases have high information content. Bioinformatics 16: 458-464.
    • (2000) Bioinformatics , vol.16 , pp. 458-464
    • Park, J.1    Holm, L.2    Heger, A.3    Chothia, C.4
  • 21
    • 0036529479
    • An efficient algorithm for large-scale detection of protein families
    • Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584.
    • (2002) Nucleic Acids Res , vol.30 , pp. 1575-1584
    • Enright, A.J.1    Van Dongen, S.2    Ouzounis, C.A.3
  • 22
    • 0346652457
    • ProClust: Improved clustering of protein sequences with an extended graph-based approach
    • Pipenbacher P, Schliep A, Schneckener S, Schonhuth A, Schomburg D, et al. (2002) ProClust: improved clustering of protein sequences with an extended graph-based approach. Bioinformatics 18 Suppl 2: S182-191.
    • (2002) Bioinformatics , vol.18 , Issue.SUPPL. 2
    • Pipenbacher, P.1    Schliep, A.2    Schneckener, S.3    Schonhuth, A.4    Schomburg, D.5
  • 23
    • 0001899680
    • The metric space of proteins-comparative study of clustering algorithms
    • Sasson O, Linial N, Linial M (2002) The metric space of proteins-comparative study of clustering algorithms. Bioinformatics 18 Suppl 1: S14-21.
    • (2002) Bioinformatics , vol.18 , Issue.SUPPL. 1
    • Sasson, O.1    Linial, N.2    Linial, M.3
  • 24
    • 0043122933
    • UniqueProt: Creating representative protein sequence sets
    • Mika S, Rost B (2003) UniqueProt: Creating representative protein sequence sets. Nucleic Acids Res 31: 3789-3791.
    • (2003) Nucleic Acids Res , vol.31 , pp. 3789-3791
    • Mika, S.1    Rost, B.2
  • 25
    • 0035072551
    • Clustering of highly homologous sequences to reduce the size of large protein databases
    • Li W, Jaroszewski L, Godzik A (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17: 282-283.
    • (2001) Bioinformatics , vol.17 , pp. 282-283
    • Li, W.1    Jaroszewski, L.2    Godzik, A.3
  • 26
    • 0036169928
    • Tolerating some redundancy significantly speeds up clustering of large protein databases
    • Li W, Jaroszewski L, Godzik A (2002) Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18: 77-82.
    • (2002) Bioinformatics , vol.18 , pp. 77-82
    • Li, W.1    Jaroszewski, L.2    Godzik, A.3
  • 27
    • 33745634395
    • Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences
    • Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658-1659.
    • (2006) Bioinformatics , vol.22 , pp. 1658-1659
    • Li, W.1    Godzik, A.2
  • 28
    • 34347388470
    • UniRef: Comprehensive and non-redundant UniProt reference clusters
    • Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23: 1282-1288.
    • (2007) Bioinformatics , vol.23 , pp. 1282-1288
    • Suzek, B.E.1    Huang, H.2    McGarvey, P.3    Mazumder, R.4    Wu, C.H.5
  • 30
    • 0030801002
    • Gapped BLAST and PSI-BLAST: A new generation of protein database search programs
    • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
    • (1997) Nucleic Acids Res , vol.25 , pp. 3389-3402
    • Altschul, S.F.1    Madden, T.L.2    Schaffer, A.A.3    Zhang, J.4    Zhang, Z.5
  • 31
    • 0036699189
    • Sequence clustering strategies improve remote homology recognitions while reducing search times
    • Li W, Jaroszewski L, Godzik A (2002) Sequence clustering strategies improve remote homology recognitions while reducing search times. Protein Eng 15: 643-649.
    • (2002) Protein Eng , vol.15 , pp. 643-649
    • Li, W.1    Jaroszewski, L.2    Godzik, A.3
  • 32
    • 0031743421
    • Profile hidden Markov models
    • Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14: 755-763.
    • (1998) Bioinformatics , vol.14 , pp. 755-763
    • Eddy, S.R.1
  • 33
    • 0033997036
    • Comparison of sequence profiles. Strategies for structural predictions using sequence information
    • Rychlewski L, Jaroszewski L, Li W, Godzik A (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 9: 232-241.
    • (2000) Protein Sci , vol.9 , pp. 232-241
    • Rychlewski, L.1    Jaroszewski, L.2    Li, W.3    Godzik, A.4
  • 34
    • 0030660581
    • A genomic perspective on protein families
    • Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278: 631-637.
    • (1997) Science , vol.278 , pp. 631-637
    • Tatusov, R.L.1    Koonin, E.V.2    Lipman, D.J.3
  • 35
    • 0035910270
    • Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes
    • Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305: 567-580.
    • (2001) J Mol Biol , vol.305 , pp. 567-580
    • Krogh, A.1    Larsson, B.2    von Heijne, G.3    Sonnhammer, E.L.4
  • 37
    • 0033106244
    • Evaluation and improvement of multiple sequence methods for protein secondary structure prediction
    • Cuff JA, Barton GJ (1999) Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34: 508-519.
    • (1999) Proteins , vol.34 , pp. 508-519
    • Cuff, J.A.1    Barton, G.J.2
  • 38
    • 0026356891
    • Predicting coiled coils from protein sequences
    • Lupas A, Van Dyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 252: 1162-1164.
    • (1991) Science , vol.252 , pp. 1162-1164
    • Lupas, A.1    Van Dyke, M.2    Stock, J.3
  • 39
    • 0027968068
    • CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
    • Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
    • (1994) Nucleic Acids Res , vol.22 , pp. 4673-4680
    • Thompson, J.D.1    Higgins, D.G.2    Gibson, T.J.3
  • 40
    • 0027136282
    • Comparative protein modelling by satisfaction of spatial restraints
    • Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234: 779-815.
    • (1993) J Mol Biol , vol.234 , pp. 779-815
    • Sali, A.1    Blundell, T.L.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.