



Volume 278, Issue 5338, 1997, Pages 631-637

A genomic perspective on protein families

Author keywords

[No Author keywords available]

Indexed keywords

ARTICLE; FUNGUS; GENOME; GRAM NEGATIVE BACTERIUM; GRAM POSITIVE BACTERIUM; MULTIGENE FAMILY; NONHUMAN; PHYLOGENY; PRIORITY JOURNAL; PROTEIN ANALYSIS; TAXONOMY;

EID: 0030660581     PISSN: 00368075     EISSN: None     Source Type: Journal    
DOI: 10.1126/science.278.5338.631     Document Type: Article
Times cited: 2911

References (67)
  • 2
    • 0028829125
    • C. M. Fraser et al., ibid. 270, 397 (1995); R. Himmelreich et al., Nucleic Acids Res. 24, 4420 (1996); T. Kaneko et al., DNA Res. 3, 109 (1996); F. R. Blattner et al., Science 277, 1453 (1997).
    • (1995) Science , vol.270 , pp. 397
    • Fraser, C.M.1
  • 3
    • 10544255079
    • C. M. Fraser et al., ibid. 270, 397 (1995); R. Himmelreich et al., Nucleic Acids Res. 24, 4420 (1996); T. Kaneko et al., DNA Res. 3, 109 (1996); F. R. Blattner et al., Science 277, 1453 (1997).
    • (1996) Nucleic Acids Res. , vol.24 , pp. 4420
    • Himmelreich, R.1
  • 4
    • 0030606607
    • C. M. Fraser et al., ibid. 270, 397 (1995); R. Himmelreich et al., Nucleic Acids Res. 24, 4420 (1996); T. Kaneko et al., DNA Res. 3, 109 (1996); F. R. Blattner et al., Science 277, 1453 (1997).
    • (1996) DNA Res. , vol.3 , pp. 109
    • Kaneko, T.1
  • 5
    • 15444350252
    • C. M. Fraser et al., ibid. 270, 397 (1995); R. Himmelreich et al., Nucleic Acids Res. 24, 4420 (1996); T. Kaneko et al., DNA Res. 3, 109 (1996); F. R. Blattner et al., Science 277, 1453 (1997).
    • (1997) Science , vol.277 , pp. 1453
    • Blattner, F.R.1
  • 6
    • 16044367245
    • C. J. Bult et al., Science 273, 1058 (1996).
    • (1996) Science , vol.273 , pp. 1058
    • Bult, C.J.1
  • 7
    • 10244239321
    • A. Goffeau et al., ibid. 274, 546 (1996); H. W. Mewes et al., Nature 387, 7 (1997).
    • (1996) Science , vol.274 , pp. 546
    • Goffeau, A.1
  • 8
    • 8544240102
    • A. Goffeau et al., ibid. 274, 546 (1996); H. W. Mewes et al., Nature 387, 7 (1997).
    • (1997) Nature , vol.387 , pp. 7
    • Mewes, H.W.1
  • 9
    • 0030230748
    • C. R. Woese, Curr. Biol. 6, 1060 (1996); G. J. Olsen and C. R. Woese, Cell 89, 991 (1997) ; E. V. Koonin, Genome Res. 7, 418 (1997).
    • (1996) Curr. Biol. , vol.6 , pp. 1060
    • Woese, C.R.1
  • 10
    • 0031587820
    • C. R. Woese, Curr. Biol. 6, 1060 (1996); G. J. Olsen and C. R. Woese, Cell 89, 991 (1997) ; E. V. Koonin, Genome Res. 7, 418 (1997).
    • (1997) Cell , vol.89 , pp. 991
    • Olsen, G.J.1    Woese, C.R.2
  • 11
    • 0030907680
    • C. R. Woese, Curr. Biol. 6, 1060 (1996); G. J. Olsen and C. R. Woese, Cell 89, 991 (1997) ; E. V. Koonin, Genome Res. 7, 418 (1997).
    • (1997) Genome Res. , vol.7 , pp. 418
    • Koonin, E.V.1
  • 14
    • 0014800108
    • W. M. Fitch, Syst. Zool. 19, 99 (1970). This definition may not embrace all of the complexity of relationships between genes in different genomes. For example, if genes A and B are paralogs encoded in genome 1, and A' and B' are their respective orthologs in genome 2, what is the appropriate description of the relationship between A and B'? They formally are not paralogs, even though a generalized definition might include such cases. Furthermore, one-to-many and many-to-many orthologous relationships evidently exist.
    • (1970) Syst. Zool. , vol.19 , pp. 99
    • Fitch, W.M.1
  • 18
    • 0030004228
    • The protein sequences were from the original references (1-4), with modifications (for example, tentative correction of frame-shift errors) and additions (previously unreported predicted genes) made for E. coli (E. V. Koonin and R. L. Tatusov, unpublished observations; K. E. Rudd, personal communication), H. influenzae (9), M. genitalium and M. jannaschii (10), and S. cerevisiae (T. J. Wolfsberg and D. Landsman, personal communication). The list of systematic names for all E. coli genes was provided by K. Rudd, and the names for all yeast genes were provided by T. Wolfsberg and D. Landsman; the H. influenzae genes were renamed as previously described (9); the gene names for the other species were from the original publications. The resulting protein database from complete genomes used in all comparisons contained 4283 sequences from E. coli, 1703 sequences from H. influenzae, 468 sequences from M. genitalium, 677 sequences from M. pneumoniae, 3168 sequences from Synechocystis sp., 1736 sequences from M. jannaschii, and 5932 sequences from S. cerevisiae, totaling 17,967 sequences. This sequence set is available on the World Wide Web at http://www.ncbi.nlm.nih.gov/COG. All pairwise comparisons between these sequences were performed using the BLASTPGP program, which is based on an enhanced version of the BLAST algorithm and includes analysis of local alignments with gaps (26). Predicted coiled coil regions in protein sequences were masked before the comparison using the batch version of the COILS2 program [A. Lupas, Methods Enzymol. 266, 513 (1996); D. R. Walker and E. V. Koonin, ISMB 5, 333 (1997)], and additionally, regions of low complexity were masked using the SEG program with default parameters [J. C. Wootton and S. Federhen, Methods Enzymol. 266, 554 (1996)]. Before the detection of triangles of BeTs, paralogs were identified as those proteins from the same lineage that showed greater similarity to each other than to any protein from another lineage. For the purpose of triangle formation, paralogs were treated as a group. The algorithm further included verification that the BeTs included in a triangle formed a consistent multiple alignment; triangles that did not contain a conserved motif were disregarded.
    • (1996) Methods Enzymol. , vol.266 , pp. 513
    • Lupas, A.1
  • 19
    • 0030623948
    • The protein sequences were from the original references (1-4), with modifications (for example, tentative correction of frame-shift errors) and additions (previously unreported predicted genes) made for E. coli (E. V. Koonin and R. L. Tatusov, unpublished observations; K. E. Rudd, personal communication), H. influenzae (9), M. genitalium and M. jannaschii (10), and S. cerevisiae (T. J. Wolfsberg and D. Landsman, personal communication). The list of systematic names for all E. coli genes was provided by K. Rudd, and the names for all yeast genes were provided by T. Wolfsberg and D. Landsman; the H. influenzae genes were renamed as previously described (9); the gene names for the other species were from the original publications. The resulting protein database from complete genomes used in all comparisons contained 4283 sequences from E. coli, 1703 sequences from H. influenzae, 468 sequences from M. genitalium, 677 sequences from M. pneumoniae, 3168 sequences from Synechocystis sp., 1736 sequences from M. jannaschii, and 5932 sequences from S. cerevisiae, totaling 17,967 sequences. This sequence set is available on the World Wide Web at http://www.ncbi.nlm.nih.gov/COG. All pairwise comparisons between these sequences were performed using the BLASTPGP program, which is based on an enhanced version of the BLAST algorithm and includes analysis of local alignments with gaps (26). Predicted coiled coil regions in protein sequences were masked before the comparison using the batch version of the COILS2 program [A. Lupas, Methods Enzymol. 266, 513 (1996); D. R. Walker and E. V. Koonin, ISMB 5, 333 (1997)], and additionally, regions of low complexity were masked using the SEG program with default parameters [J. C. Wootton and S. Federhen, Methods Enzymol. 266, 554 (1996)]. Before the detection of triangles of BeTs, paralogs were identified as those proteins from the same lineage that showed greater similarity to each other than to any protein from another lineage. For the purpose of triangle formation, paralogs were treated as a group. The algorithm further included verification that the BeTs included in a triangle formed a consistent multiple alignment; triangles that did not contain a conserved motif were disregarded.
    • (1997) ISMB , vol.5 , pp. 333
    • Walker, D.R.1    Koonin, E.V.2
  • 20
    • 0029901640
    • The protein sequences were from the original references (1-4), with modifications (for example, tentative correction of frame-shift errors) and additions (previously unreported predicted genes) made for E. coli (E. V. Koonin and R. L. Tatusov, unpublished observations; K. E. Rudd, personal communication), H. influenzae (9), M. genitalium and M. jannaschii (10), and S. cerevisiae (T. J. Wolfsberg and D. Landsman, personal communication). The list of systematic names for all E. coli genes was provided by K. Rudd, and the names for all yeast genes were provided by T. Wolfsberg and D. Landsman; the H. influenzae genes were renamed as previously described (9); the gene names for the other species were from the original publications. The resulting protein database from complete genomes used in all comparisons contained 4283 sequences from E. coli, 1703 sequences from H. influenzae, 468 sequences from M. genitalium, 677 sequences from M. pneumoniae, 3168 sequences from Synechocystis sp., 1736 sequences from M. jannaschii, and 5932 sequences from S. cerevisiae, totaling 17,967 sequences. This sequence set is available on the World Wide Web at http://www.ncbi.nlm.nih.gov/COG. All pairwise comparisons between these sequences were performed using the BLASTPGP program, which is based on an enhanced version of the BLAST algorithm and includes analysis of local alignments with gaps (26). Predicted coiled coil regions in protein sequences were masked before the comparison using the batch version of the COILS2 program [A. Lupas, Methods Enzymol. 266, 513 (1996); D. R. Walker and E. V. Koonin, ISMB 5, 333 (1997)], and additionally, regions of low complexity were masked using the SEG program with default parameters [J. C. Wootton and S. Federhen, Methods Enzymol. 266, 554 (1996)]. Before the detection of triangles of BeTs, paralogs were identified as those proteins from the same lineage that showed greater similarity to each other than to any protein from another lineage. For the purpose of triangle formation, paralogs were treated as a group. The algorithm further included verification that the BeTs included in a triangle formed a consistent multiple alignment; triangles that did not contain a conserved motif were disregarded.
    • (1996) Methods Enzymol. , vol.266 , pp. 554
    • Wootton, J.C.1    Federhen, S.2
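The detection of triangles of BeTs (genome-specific best hits) described in the note for references 18-20 can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `scores` input format, the `(genome, name)` protein tuples, and the function names `best_hits` and `find_triangles` are hypothetical. The sketch also assumes, as a simplification, that every edge of a triangle is a reciprocal best hit, and it omits the paralog grouping and the multiple-alignment consistency check mentioned in the note.

```python
from itertools import combinations


def best_hits(scores):
    """Map (query protein, target genome) -> best-scoring protein in that genome.

    `scores` maps (query, subject) -> similarity score; each protein is a
    (genome, name) tuple. This input format is an assumption of the sketch.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # hits within one genome are not genome-specific best hits
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def find_triangles(scores):
    """Return triples of proteins from three genomes whose best hits all agree."""
    bet = best_hits(scores)
    proteins = {p for pair in scores for p in pair}
    triangles = set()
    for a, b, c in combinations(sorted(proteins), 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # a triangle needs three different genomes
        edges = [(a, b), (b, a), (a, c), (c, a), (b, c), (c, b)]
        # Simplifying assumption: each protein's best hit in the other two
        # genomes must be the corresponding member of the triple.
        if all(bet.get((x, y[0])) == y for x, y in edges):
            triangles.add(frozenset((a, b, c)))
    return triangles
```

Treating paralogs as a group, as the note describes, would amount to collapsing such proteins into one node before the triangle search; that step is left out here.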
  • 21
    • 1842295597
    • note
    • Although the exact solution depends on the amino acid composition and size of the particular proteins, under zero approximation, if B (from genome b) is the BeT for A (from genome a), and C (from genome c) is the BeT for B, the probability that C is the BeT for A by chance is close to 1/N, where N is the number of genes in genome c, or ∼0.001.
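The estimate in this note can be written out explicitly. The value N ≈ 1000 is an assumption made here only to reproduce the stated figure of about 0.001; the second line uses the smallest and largest genome sizes given in the sequence-set note (468 and 5932 proteins) to show the range.

```latex
P(\text{C is the BeT for A by chance}) \approx \frac{1}{N},
\qquad N \approx 10^{3} \;\Rightarrow\; P \approx 10^{-3}

\frac{1}{468} \approx 2.1 \times 10^{-3},
\qquad
\frac{1}{5932} \approx 1.7 \times 10^{-4}
```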
  • 22
    • 0023184331
    • C. R. Woese, Microbiol. Rev. 51, 221 (1987); C. R. Woese, R. Overbeek, G. J. Olsen, J. Bacteriol. 176, 1 (1994); N. R. Pace, Science 276, 734 (1997). A BeT to a given clade was registered if detected in any of the constituent species, for example, in E. coli or H. influenzae for the Gram-negative bacteria.
    • (1987) Microbiol. Rev. , vol.51 , pp. 221
    • Woese, C.R.1
  • 23
    • 0023184331
    • C. R. Woese, Microbiol. Rev. 51, 221 (1987); C. R. Woese, R. Overbeek, G. J. Olsen, J. Bacteriol. 176, 1 (1994); N. R. Pace, Science 276, 734 (1997). A BeT to a given clade was registered if detected in any of the constituent species, for example, in E. coli or H. influenzae for the Gram-negative bacteria.
    • (1994) J. Bacteriol. , vol.176 , pp. 1
    • Overbeek, R.1    Olsen, G.J.2
  • 24
    • 0030982247
    • C. R. Woese, Microbiol. Rev. 51, 221 (1987); C. R. Woese, R. Overbeek, G. J. Olsen, J. Bacteriol. 176, 1 (1994); N. R. Pace, Science 276, 734 (1997). A BeT to a given clade was registered if detected in any of the constituent species, for example, in E. coli or H. influenzae for the Gram-negative bacteria.
    • (1997) Science , vol.276 , pp. 734
    • Pace, N.R.1
  • 25
    • 0029074668
    • H. Watanabe and J. Otsuka, Comput. Appl. Biosci. 11, 159 (1995); E. V. Koonin, R. L. Tatusov, K. E. Rudd, Methods Enzymol. 266, 295 (1996).
    • (1995) Comput. Appl. Biosci. , vol.11 , pp. 159
    • Watanabe, H.1    Otsuka, J.2
  • 28
    • 1842404646
    • note
    • A single-linkage clustering procedure was used with random match probability, P < 0.001, as the cutoff (14).
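Single-linkage clustering with a probability cutoff, as mentioned in this note, can be illustrated with a small union-find sketch. The function name `single_linkage`, the `pair_pvalues` input format, and the example identifiers are hypothetical; only the cutoff P < 0.001 comes from the note. Two items end up in the same cluster whenever a chain of pairs, each below the cutoff, connects them.

```python
def single_linkage(items, pair_pvalues, cutoff=1e-3):
    """Group items so that any chain of pairs with P < cutoff joins them.

    `pair_pvalues` maps (item_a, item_b) -> random match probability.
    The input format is an assumption of this sketch.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in pair_pvalues.items():
        if p < cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())
```

For example, with pairs (p1, p2) at P = 1e-10 and (p2, p3) at P = 1e-5, the items p1, p2, and p3 fall into one cluster even if p1 and p3 were never compared directly, which is the defining property of single linkage.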
  • 29
    • 1842404044
    • note
    • A searchable database of COGs is available at http://www.ncbi.nlm.nih.gov/COG. Each COG was assigned a unique identification number, which includes a letter for the functional category (19) and a number (see examples in Fig. 1 and Tables 1 and 2).
  • 31
    • 0027133165
    • The broad functional categories of proteins were as defined previously (9), except that transcription was separated from replication, recombination, and repair. This classification is a modification of the system originally developed for E. coli proteins [M. Riley, Microbiol. Rev. 57, 862 (1993)].
    • (1993) Microbiol. Rev. , vol.57 , pp. 862
    • Riley, M.1
  • 32
    • 0031000180
    • A partially similar representation of some of the protein families from complete genomes has been recently published [R. A. Clayton, O. White, K. A. Ketchum, J. C. Venter, Nature 387, 459 (1997)].
    • (1997) Nature , vol.387 , pp. 459
    • Clayton, R.A.1    White, O.2    Ketchum, K.A.3    Venter, J.C.4
  • 38
    • 0024722874
    • J. P. Gogarten et al., Proc. Natl. Acad. Sci. U.S.A. 86, 6661 (1989); N. Iwabe et al., ibid., p. 9355; J. P. Gogarten, E. Hilario, L. Olendzewski, in Evolution of Microbial Life, D. McL. Roberts, P. Sharp, G. Alderson, M. Collins, Eds. (Cambridge Univ. Press, Cambridge, 1996), pp. 267-292.
    • (1989) Proc. Natl. Acad. Sci. U.S.A. , vol.86 , pp. 6661
    • Gogarten, J.P.1
  • 39
    • 24544459277
    • J. P. Gogarten et al., Proc. Natl. Acad. Sci. U.S.A. 86, 6661 (1989); N. Iwabe et al., ibid., p. 9355; J. P. Gogarten, E. Hilario, L. Olendzewski, in Evolution of Microbial Life, D. McL. Roberts, P. Sharp, G. Alderson, M. Collins, Eds. (Cambridge Univ. Press, Cambridge, 1996), pp. 267-292.
    • Proc. Natl. Acad. Sci. U.S.A. , pp. 9355
    • Iwabe, N.1
  • 40
    • 0002375347
    • D. McL. Roberts, P. Sharp, G. Alderson, M. Collins, Eds. Cambridge Univ. Press, Cambridge
    • J. P. Gogarten et al., Proc. Natl. Acad. Sci. U.S.A. 86, 6661 (1989); N. Iwabe et al., ibid., p. 9355; J. P. Gogarten, E. Hilario, L. Olendzewski, in Evolution of Microbial Life, D. McL. Roberts, P. Sharp, G. Alderson, M. Collins, Eds. (Cambridge Univ. Press, Cambridge, 1996), pp. 267-292.
    • (1996) Evolution of Microbial Life , pp. 267-292
    • Gogarten, J.P.1    Hilario, E.2    Olendzewski, L.3
  • 41
    • 0030801002
    • S. F. Altschul et al., Nucleic Acids Res. 25, 3389 (1997). The probability of a random match, P < 0.001, was used in all PSI-BLAST searches.
    • (1997) Nucleic Acids Res. , vol.25 , pp. 3389
    • Altschul, S.F.1
  • 42
    • 0001607723
    • J. E. Walker, M. Saraste, M. J. Runswick, N. J. Gay, EMBO J. 1, 945 (1982); A. E. Gorbalenya and E. V. Koonin, Nucleic Acids Res. 17, 8413 (1989); M. Saraste, P. R. Sibbald, A. Wittinghofer, Trends Biochem. Sci. 15, 430 (1990).
    • (1982) EMBO J. , vol.1 , pp. 945
    • Walker, J.E.1    Saraste, M.2    Runswick, M.J.3    Gay, N.J.4
  • 43
    • 0024462161
    • J. E. Walker, M. Saraste, M. J. Runswick, N. J. Gay, EMBO J. 1, 945 (1982); A. E. Gorbalenya and E. V. Koonin, Nucleic Acids Res. 17, 8413 (1989); M. Saraste, P. R. Sibbald, A. Wittinghofer, Trends Biochem. Sci. 15, 430 (1990).
    • (1989) Nucleic Acids Res. , vol.17 , pp. 8413
    • Gorbalenya, A.E.1    Koonin, E.V.2
  • 44
    • 0025048136
    • J. E. Walker, M. Saraste, M. J. Runswick, N. J. Gay, EMBO J. 1, 945 (1982); A. E. Gorbalenya and E. V. Koonin, Nucleic Acids Res. 17, 8413 (1989); M. Saraste, P. R. Sibbald, A. Wittinghofer, Trends Biochem. Sci. 15, 430 (1990).
    • (1990) Trends Biochem. Sci. , vol.15 , pp. 430
    • Saraste, M.1    Sibbald, P.R.2    Wittinghofer, A.3
  • 45
    • 1842410612
    • Protein sequences can be submitted for searching against COGs at http://www.ncbi.nlm.nih.gov/COG/cognitor.html
  • 47
    • 0029807124
    • G. Chanfreau, S. M. Noble, C. Guthrie, Science 274, 1511 (1996); A. Jenny, L. Minvielle-Sebastia, P. J. Preker, W. Keller, ibid. 274, 1514 (1996); G. Stumpf and H. Domdey, ibid., p. 1517.
    • (1996) Science , vol.274 , pp. 1511
    • Chanfreau, G.1    Noble, S.M.2    Guthrie, C.3
  • 49
    • 0029804177
    • G. Chanfreau, S. M. Noble, C. Guthrie, Science 274, 1511 (1996); A. Jenny, L. Minvielle-Sebastia, P. J. Preker, W. Keller, ibid. 274, 1514 (1996); G. Stumpf and H. Domdey, ibid., p. 1517.
    • Science , pp. 1517
    • Stumpf, G.1    Domdey, H.2
  • 50
    • 0030835739
    • J.-F. Tomb et al., Nature 388, 539 (1997).
    • (1997) Nature , vol.388 , pp. 539
    • Tomb, J.-F.1
  • 52
    • 0027399170
    • P. Green et al., Science 259, 1711 (1993).
    • (1993) Science , vol.259 , pp. 1711
    • Green, P.1
  • 64
    • 0029874435
    • B. E. Alber and J. G. Ferry, Proc. Natl. Acad. Sci. U.S.A. 91, 6909 (1994); C. Kisker et al., EMBO J. 15, 2323 (1996).
    • (1996) EMBO J. , vol.15 , pp. 2323
    • Kisker, C.1
  • 65
    • 0029144599
    • E. V. Koonin, Protein Sci. 4, 1608 (1995); M. N. Rozanov and E. V. Koonin, unpublished observations.
    • (1995) Protein Sci. , vol.4 , pp. 1608
    • Koonin, E.V.1
  • 66
    • 0029144599
    • unpublished observations
    • E. V. Koonin, Protein Sci. 4, 1608 (1995); M. N. Rozanov and E. V. Koonin, unpublished observations.
    • Rozanov, M.N.1    Koonin, E.V.2
  • 67
    • 1842415310
    • note
    • We thank A. Schaffer for modifying the PSI-BLAST program; R. Walker, H. Watanabe, and M. Rozanov for valuable help with data analysis; K. Rudd, T. Wolfsberg, and D. Landsman for unpublished data; and P. Bork, M. Galperin, M. Gelfand, A. Mushegian, P. Pevzner, M. Roytberg, M. Rozanov, and R. Walker for helpful discussions.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.