SCOPUS information search platform

Volume 301, Issue 5629, 2003, Pages 71-76

Finding functional features in Saccharomyces genomes by phylogenetic footprinting

(9) Cliften, Paul (a); Sudarsanam, Priya (a); Desikan, Ashwin (a); Fulton, Lucinda (a); Fulton, Bob (a); Majors, John (a); Waterston, Robert (a); Cohen, Barak A (a); Johnston, Mark (a)

a WASHINGTON UNIVERSITY SCHOOL OF MEDICINE (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BIOTECHNOLOGY; DNA; YEAST;

PHYLOGENETIC FOOTPRINTS;

GENES;

FUNGAL DNA; TRANSCRIPTION FACTOR;

DNA; FOOTPRINT; GENOME; PHYLOGENETICS;

ARTICLE; CODON; DNA FOOTPRINTING; DNA SEQUENCE; FUNGAL GENE; GENE FUNCTION; NONHUMAN; PRIORITY JOURNAL; PROTEIN MOTIF; SACCHAROMYCES; SEQUENCE ALIGNMENT; SEQUENCE ANALYSIS; SPECIES DIFFERENCE; TATA BOX; TRANSCRIPTION REGULATION;

ALGORITHMS; BASE SEQUENCE; BINDING SITES; COMPUTATIONAL BIOLOGY; CONSERVED SEQUENCE; DNA, INTERGENIC; GENE EXPRESSION PROFILING; GENES, FUNGAL; GENOME, FUNGAL; MOLECULAR SEQUENCE DATA; PHYLOGENY; REGULATORY SEQUENCES, NUCLEIC ACID; SACCHAROMYCES; SACCHAROMYCES CEREVISIAE; SEQUENCE ALIGNMENT; SEQUENCE ANALYSIS, DNA; TRANSCRIPTION FACTORS;

PROKARYOTA; SACCHAROMYCES;

EID: 0038724989 PISSN: 00368075 EISSN: None Source Type: Journal
DOI: 10.1126/science.1084337 Document Type: Article

Times cited : (714)

References (63)

1
- 0030722935
- R. C. Hardison, J. Oeltjen, W. Miller, Genome Res. 7, 959 (1997).
- (1997) Genome Res. , vol.7 , pp. 959
- Hardison, R.C.¹ Oeltjen, J.² Miller, W.³

2
- 0023821768
- D. A. Tagle et al., J. Mol. Biol. 203, 439 (1988).
- (1988) J. Mol. Biol. , vol.203 , pp. 439
- Tagle, D.A.¹

3
- 0031593199
- R. Hardison et al., Gene 205, 73 (1997).
- (1997) Gene , vol.205 , pp. 73
- Hardison, R.¹

4
- 0034919548
- P. F. Cliften et al., Genome Res. 11, 1175 (2001).
- (2001) Genome Res. , vol.11 , pp. 1175
- Cliften, P.F.¹

5
- 0038063894
- note
- The strains are S. bayanus (623-6c), S. mikatae (IFO 1815), S. castellii (NRRL Y-12630), S. kluyveri (NRRL Y-12651), and S. kudriavzevii (IFO 1802).

6
- 0032707372
- R. F. Petersen, T. Nilsson-Tillgren, J. Piskur, Int. J. Syst. Bacteriol. 49 Pt 4, 1925 (1999).
- (1999) Int. J. Syst. Bacteriol. , vol.49 , Issue.PART 4 , pp. 1925
- Petersen, R.F.¹ Nilsson-Tillgren, T.² Piskur, J.³

7
- 0037137922
- K. Moller et al., Biotechnol. Bioeng. 77, 186 (2002).
- (2002) Biotechnol. Bioeng. , vol.77 , pp. 186
- Moller, K.¹

8
- 0034704325
- B. Llorente et al., FEBS Lett. 487, 101 (2000).
- (2000) FEBS Lett. , vol.487 , pp. 101
- Llorente, B.¹

9
- 0034713383
- G. Fischer, S. A. James, I. N. Roberts, S. G. Oliver, E. J. Louis, Nature 405, 451 (2000).
- (2000) Nature , vol.405 , pp. 451
- Fischer, G.¹ James, S.A.² Roberts, I.N.³ Oliver, S.G.⁴ Louis, E.J.⁵

10
- 0037726239
- note
- Libraries of genomic DNA fragments of the species were constructed in plasmid pOT4. Plating and sequencing of plasmid library subclones as well as sample loading, data collection, and processing were done as described at http://genome.wustl.edu/tools/protocols/. The sequences were assembled with PHRAP (phragment assembly program; www.phrap.org) (51) using the following parameters: forcelevel 1, minmatch 17, minscore 40, new_ace, view.

11
- 0023988195
- E. S. Lander, M. S. Waterman, Genomics 2, 231 (1988).
- (1988) Genomics , vol.2 , pp. 231
- Lander, E.S.¹ Waterman, M.S.²

12
- 0035054380
- D. Gordon, C. Desmarais, P. Green, Genome Res. 11, 614 (2001).
- (2001) Genome Res. , vol.11 , pp. 614
- Gordon, D.¹ Desmarais, C.² Green, P.³

13
- 0038063893
- note
- -20) to translated intergenic regions ("NotFeatures DNA," obtained from the SGD) of S. cerevisiae were inspected manually. Finally, multiple DNA sequence alignments of the small ORFs were used to make accurate predictions of start and stop codons.

14
- 0037726237
- note
- Genomic sequences of the five species were compared with annotated genes in S. cerevisiae using TBLASTN to identify those that are likely to be false. ORFs were termed false if most (≥50%) of the top scoring alignments contain frameshift or stop codons (for overlapping ORFs, only one species had to have a frameshift or stop codon to be called false). Most of the false ORFs contain frameshifts or multiple stop codons in two or more species.

15
- 0038740328
- note
- S. cerevisiae genomic sequences and annotations were obtained from the SGD (www.yeastgenome.org; July 24, 2002 version).

16
- 0038740321
- note
- An additional 82 annotated S. cerevisiae ORFs do not have significant similarity to sequences in any of the Saccharomyces species, and are therefore likely not to be genes, but because we do not have complete genome sequences of the other species we cannot be certain of this.

17
- 0037726238
- note
- All changes to the S. cerevisiae genome annotation were submitted to the SGD.

18
- 0027968068
- J. D. Thompson, D. G. Higgins, T. J. Gibson, Nucleic Acids Res. 22, 4673 (1994).
- (1994) Nucleic Acids Res. , vol.22 , pp. 4673
- Thompson, J.D.¹ Higgins, D.G.² Gibson, T.J.³

19
- 0025183708
- S. F. Altschul, W. Gish, W. Miller, E. W. Myers, D. J. Lipman, J. Mol. Biol. 215, 403 (1990).
- (1990) J. Mol. Biol. , vol.215 , pp. 403
- Altschul, S.F.¹ Gish, W.² Miller, W.³ Myers, E.W.⁴ Lipman, D.J.⁵

20
- 0038063885
- note
- -10) were recorded. Promoter sequences from the -1 position, relative to the ATG start codon, up to 1000 bp (if available) upstream, or to the next gene, were extracted. [S. cerevisiae genes overlapping or immediately adjacent to other genomic features (692 of the 6359 total genes listed in SGD) were not included in the analysis.] The orthologous promoter sequences of the sensu stricto species were aligned using CLUSTALW (15). Low-scoring alignments (less than 20% identity) and alignments lacking at least one run of six exact nucleotides were manually inspected. When appropriate, sequences were removed from the alignments to improve the quality (and accuracy) of the alignment. For example, we removed sequences if there were multiple sequences from one species (perhaps duplication of an orthologous gene in the other species) or if there was one sequence not similar enough to align with the rest of the sequences (perhaps a miscalled ortholog). After manually editing the sensu stricto species alignments, sequences from the more distantly related species were added to the files. Because these sequences are too diverged from the other sequences to align accurately we could not use alignments to inform ambiguous orthology calls. Multiple orthologs were retained for a given species unless one of them clearly had a more significant BLASTX hit. In many cases these decisions had to be made manually because of incomplete sequence coverage (that is, partial gene coverage and potential frame shifts or sequencing errors that affect the BLAST score).

21
- 0038402171
- note
- Sixty-two percent of intergenic regions (3523) are available from all three sensu stricto species; 86% (4908) are available from two of the three sensu stricto species; 2292 intergenic regions (40% of the total) are available from all six species.

22
- 0038740319
- note
- The sequences are available at GenBank (project accession numbers: S. kluyveri, AACE00000000; S. castellii, AACF00000000; S. bayanus, AACG00000000; S. mikatae, AACH00000000; S. kudriavzevii, AACI00000000). The sequences are also available on the SGD Web site (www.yeastgenome.org) and at www.genetics.wustl.edu/saccharomycesgenomes/.

23
- 0038740327
- note
- The number of identical residues in multiple alignments of orthologous promoters of the sensu stricto species (after removing terminal gaps in the alignments caused by differences in sequence length) was tabulated for each consecutive 25-bp window from the start of translation backward to the 5′ end of the promoter sequence. Essentially identical results were obtained with shorter window lengths.
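The windowed tabulation described in note 23 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `trim_terminal_gaps` and `identical_per_window` are hypothetical, the alignment is assumed to be a list of equal-length gapped strings oriented 5′ to 3′ and ending at the position just upstream of the start of translation, and trimming every leading or trailing column that contains a gap is one assumed reading of "removing terminal gaps".

```python
def trim_terminal_gaps(alignment):
    """Drop leading and trailing columns that contain a gap in any row.

    Assumption: this is one way to remove terminal gaps caused by
    differences in sequence length; the source does not give the rule.
    """
    cols = list(zip(*alignment))
    start, end = 0, len(cols)
    while start < end and "-" in cols[start]:
        start += 1
    while end > start and "-" in cols[end - 1]:
        end -= 1
    return [row[start:end] for row in alignment]


def identical_per_window(alignment, window=25):
    """Count fully identical columns in consecutive windows.

    Windows are taken from the 3' end of the alignment (nearest the
    start of translation) backward to the 5' end, so counts[0] is the
    window closest to the start codon. The last window may be shorter.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    # A column is identical when every row carries the same non-gap base.
    identical = [
        len(set(col)) == 1 and col[0] != "-"
        for col in zip(*alignment)
    ]
    counts = []
    for end in range(length, 0, -window):
        start = max(0, end - window)
        counts.append(sum(identical[start:end]))
    return counts
```

Calling `identical_per_window(trim_terminal_gaps(rows), window=25)` gives one count per 25-bp window; passing a smaller `window` corresponds to the shorter window lengths that the note reports gave essentially identical results.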

24
- 0038402169
- Information is available at www.genetics.wustl.edu/ saccharomycesgenomes/promoter_significance.html.

25
- 0029559463
- K. Struhl, Annu. Rev. Genet. 29, 651 (1995).
- (1995) Annu. Rev. Genet. , vol.29 , pp. 651
- Struhl, K.¹

26
- 0038740320
- note
- The CLUSTALW alignments of 59 genes were at least 75% identical in the 25 nt upstream of the ATG codon (fig. S2).

27
- 0031000833
- J. Vilardell, J. R. Warner, Mol. Cell Biol. 17, 1959 (1997).
- (1997) Mol. Cell Biol. , vol.17 , pp. 1959
- Vilardell, J.¹ Warner, J.R.²

28
- 0009407476
- A. G. Hinnebusch, Proc. Natl. Acad. Sci. U.S.A. 81, 6442 (1984).
- (1984) Proc. Natl. Acad. Sci. U.S.A. , vol.81 , pp. 6442
- Hinnebusch, A.G.¹

29
- 0038063886
- note
- Sixty percent of alignments of orthologous sequences upstream of genes encoding ribosomal proteins, but only 3% of alignments of sequences upstream of other genes, have 70% or greater identity over the 30 nt adjacent to the ATG codon.

30
- 0036180002
- N. Rajewsky, N. D. Socci, M. Zapotocky, E. D. Siggia, Genome Res. 12, 298 (2002).
- (2002) Genome Res. , vol.12 , pp. 298
- Rajewsky, N.¹ Socci, N.D.² Zapotocky, M.³ Siggia, E.D.⁴

31
- 0037726230
- note
- Of 3523 intergenic regions, 390 are greater than 50% identical; an additional 379 are greater than 45% identical.

32
- 0038063889
- note
- Known transcription factor binding sites included Gcn4, Hap2, Mbf1, Ndt80, Pho4, Reb1, Rpn4, SCB, Ste12, Upc2, Mac1, and Gln3.

33
- 0034628901
- J. D. Hughes, P. W. Estep, S. Tavazoie, G. M. Church, J. Mol. Biol. 296, 1205 (2000).
- (2000) J. Mol. Biol. , vol.296 , pp. 1205
- Hughes, J.D.¹ Estep, P.W.² Tavazoie, S.³ Church, G.M.⁴

34
- 0032826179
- G. Z. Hertz, G. D. Stormo, Bioinformatics 15, 563 (1999).
- (1999) Bioinformatics , vol.15 , pp. 563
- Hertz, G.Z.¹ Stormo, G.D.²

35
- 0027912333
- C. E. Lawrence et al., Science 262, 208 (1993).
- (1993) Science , vol.262 , pp. 208
- Lawrence, C.E.¹

36
- 0029197507
- T. L. Bailey, C. Elkan, Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 21 (1995).
- (1995) Proc. Int. Conf. Intell. Syst. Mol. Biol. , vol.3 , pp. 21
- Bailey, T.L.¹ Elkan, C.²

37
- 0038063888
- note
- The nine known gapped sequence motifs are the binding sites for Gal4, Abf1, Lys14, Leu3, Cha4, Put3, Uga3, Hap1, and Ppr1. The characterized motifs that we miss are long and strictly conserved (six of the seven are 8 nt or greater in length), probably because the motifs have not been sufficiently characterized to define their essential positions.

38
- 0037726231
- note
- Four-way CLUSTALW promoter alignments with 40% or less identity were used in the analysis (2377 of 3523 alignments). Forty percent was arbitrarily chosen as a cutoff to remove alignments with too much similarity. The CLUSTALW alignments were first modified to remove terminal unaligned sequence from the output. The alignments were shuffled 10,000 independent times using the shuffle utility program in SQUID (www.genetics.wustl.edu/eddy/software/#squid). Runs of identical sequence aligned in all four species in the real alignments and the shuffled alignments were extracted and counted.
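The shuffle-and-count comparison in note 38 can be illustrated with the sketch below. It does not call SQUID; it assumes that shuffling an alignment means permuting whole columns, and it uses Python's `random` module instead. The names `conserved_runs`, `shuffle_columns`, and `run_length_counts`, the minimum run length of 6, and the default of 100 shuffles are all illustrative choices, not values confirmed by the source beyond the six-exact-nucleotide runs mentioned elsewhere in the notes.

```python
import random
from collections import Counter


def conserved_runs(alignment, min_len=6):
    """Return runs of consecutive columns identical in every row."""
    runs, current = [], []
    for col in zip(*alignment):
        if len(set(col)) == 1 and col[0] != "-":
            current.append(col[0])
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


def shuffle_columns(alignment, rng):
    """Permute alignment columns, keeping each column intact.

    Assumption: column shuffling stands in for the SQUID shuffle step.
    """
    cols = list(zip(*alignment))
    rng.shuffle(cols)
    return ["".join(row) for row in zip(*cols)]


def run_length_counts(alignment, n_shuffles=100, min_len=6, seed=0):
    """Tabulate run lengths in the real alignment and in shuffled copies."""
    rng = random.Random(seed)
    real = Counter(len(r) for r in conserved_runs(alignment, min_len))
    shuffled = Counter()
    for _ in range(n_shuffles):
        permuted = shuffle_columns(alignment, rng)
        shuffled.update(len(r) for r in conserved_runs(permuted, min_len))
    return real, shuffled
```

Dividing each shuffled count by `n_shuffles` gives the number of runs of that length expected by chance per alignment, which can then be compared with the real count.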

39
- 0037726229
- note
- Six-mers are not statistically significant in the six-way comparisons and were therefore not tabulated; n-mers longer than 10 nt are quite rare in these comparisons.

40
- 0038063887
- note
- Sequences of 100 shuffled multiple intergenic sequence alignments of sensu stricto species were extracted and combined with intergenic sequences from the two distantly related species. n-oligomers present in all species were identified in the real promoter sets and compared with those present in the shuffled data sets.

41
- 0038740322
- note
- For example, essentially all of the 10-mers conserved in the sensu stricto species' sequences (considered because there is a high degree of confidence that they are not chance occurrences) that occur frequently in the genome (considered because those are likely to be functional) are known: 28 of 32 (present in alignments of sequences upstream of at least seven different genes) are accounted for by the following previously identified sequence motifs: eight variations of the PAC motif, seven variations of the RRPE motif, seven variations of a Ume6 binding site, two variations of the PACE motif (Rpn4 binding site), two variations of an Mbp1 binding site, and one each of the binding sites for Ndt80 and Reb1; the remaining four most frequent n-mers were simple A+T-rich sequences. Similarly, 94 of the 160 conserved 10-mers identified in the six-way sequence comparisons correspond to known sequence motifs (table S4), all of which occur upstream of genes known or likely to be regulated by the factor that binds to them. Of the remaining 66 conserved sequence motifs, 46 are A+T-rich, and 11 are immediately upstream of the translation initiation codon of genes encoding ribosomal proteins and thus may be translational regulatory sequences. This leaves only nine that are reasonable candidates for new regulatory sequences, four of which are conserved in the sequences upstream of several genes of similar function (the motifs marked with an asterisk in table S4).

42
- 0038740323
- note
- We were surprised to find that only 15.7% of the 3523 multiple alignments of sensu stricto species promoters contain one of the seven sequences that have TATA element function (TATAA, TATATA, TATTTA, CATTTA, TTTAAT, TAATAA, TATAA) (52) conserved and aligned within 250 bp of the translational start codon. Even if the stringency of the search criteria is relaxed to allow for unaligned TATA boxes, promoters containing this sequence element are still in the minority: only 42.8% of the sensu stricto species' promoters contain any one of the seven TATA sequences (52) anywhere within 250 bp of the translational start codon in all four orthologs. Furthermore, 142 promoters (4%) do not contain a TATA element in any of the four sensu stricto species. Thus, it appears that TATA-containing promoters are the minority in S. cerevisiae.
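The relaxed (unaligned) search in note 42 amounts to a substring test on each ortholog, sketched below. The helper names `has_tata` and `tata_in_all_orthologs` are hypothetical. Each promoter is assumed to be a string oriented 5′ to 3′ whose last base is position -1 relative to the start codon. The note's list repeats TATAA, so this sketch uses only the six distinct sequences as printed and does not guess at the seventh.

```python
# Distinct TATA-like sequences as printed in the note; the seventh entry
# there duplicates TATAA and is left unresolved here.
TATA_LIKE = ("TATAA", "TATATA", "TATTTA", "CATTTA", "TTTAAT", "TAATAA")


def has_tata(promoter, window=250, motifs=TATA_LIKE):
    """True if any motif lies within `window` bp upstream of the start codon.

    Assumption: the promoter string ends at position -1, so the last
    `window` characters are the region to search.
    """
    region = promoter[-window:].upper()
    return any(motif in region for motif in motifs)


def tata_in_all_orthologs(promoters, window=250):
    """True only if every orthologous promoter contains a TATA-like motif."""
    return all(has_tata(p, window) for p in promoters)
```

For example, `tata_in_all_orthologs([p_cer, p_bay, p_mik, p_kud])` applies the "in all four orthologs" criterion to one gene, where the four variables are placeholder promoter strings; the stricter aligned criterion would additionally require the motifs to sit in the same alignment columns.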

43
- 0038402170
- note
- Because many of the conserved n-mers are longer than the typical 6 to 8 bp that are required for a transcription factor to bind to DNA, we extracted all unique 6- to 8-mers from the longer conserved n-mers to test these for functional enrichment and coherent expression. Each unique n-mer had to be present in at least five different intergenic regions to be tested for functional enrichment or coherent expression.

44
- 0038063891
- note
- Functional enrichment was based on the Munich Information Center for Protein Sequences (MIPS) classification of S. cerevisiae genes. Functional enrichment and the associated P values were calculated as in (53).

45
- 0026429182
- N. F. Lowndes, A. L. Johnson, L. H. Johnston, Nature 350, 247 (1991).
- (1991) Nature , vol.350 , pp. 247
- Lowndes, N.F.¹ Johnson, A.L.² Johnston, L.H.³

46
- 0026026933
- E. M. McIntosh, T. Atkinson, R. K. Storms, M. Smith, Mol. Cell Biol. 11, 329 (1991).
- (1991) Mol. Cell Biol. , vol.11 , pp. 329
- McIntosh, E.M.¹ Atkinson, T.² Storms, R.K.³ Smith, M.⁴

47
- 0038402172
- note
- Expression coherence was calculated as described previously (54). Expression coherence was calculated for cell cycle (55), meiosis (56), methyl methanesulfonate (MMS) damage (57), sporulation (58), stress response (59), DNA damage (60), mitogen-activated protein kinase (MAPK) (61), and mitochondrial dysfunction (62) data sets.

48
- 0037174671
- T. I. Lee et al., Science 298, 799 (2002).
- (2002) Science , vol.298 , pp. 799
- Lee, T.I.¹

49
- 0037726233
- note
- Using high-quality weight matrices for the binding sites of 23 transcription factors whose consensus binding sites are known, we identified: (i) all the occurrences of a particular binding site in the intergenic regions of S. cerevisiae using Patser (34), and then (ii) those occurrences that are conserved in the orthologous promoters in the other Saccharomyces species and/or are aligned in the CLUSTALW alignments of intergenic sequences of sensu stricto species. Sets of intergenic regions that bind to a particular DNA binding protein come from the data of Lee et al. at P value < 0.001 (48). The motif alignments for known transcription factor binding sites were generated by applying AlignACE (33) on the appropriate MIPS functional category.

50
- 0038402177
- note
- Thirty-six n-mers are upstream of genes that are functionally enriched [18 from the sensu stricto sequence alignments and 18 from the six-way sequence comparisons (Table 1)], 52 n-mers are identified by coherent expression [39 from the sensu stricto sequence alignments and 13 from the six-way sequence comparisons (Table 2)], and 13 are from upstream of genes bound by a particular transcription factor [nine from the sensu stricto sequence alignments and four from the six-way sequence comparisons (Table 3)]. Twenty-two n-mers are found in more than one data set, leaving 79 conserved sequence motifs linked to a function.

51
- 0038402173
- P. Green, PHRAP, Department of Genome Sciences, University of Washington, Seattle, WA

52
- 0025228931
- V. L. Singer, C. R. Wobbe, K. Struhl, Genes Dev. 4, 636 (1990).
- (1990) Genes Dev. , vol.4 , pp. 636
- Singer, V.L.¹ Wobbe, C.R.² Struhl, K.³

53
- 0033028596
- S. Tavazoie, J. D. Hughes, M. J. Campbell, R. J. Cho, G. M. Church, Nature Genet. 22, 281 (1999).
- (1999) Nature Genet. , vol.22 , pp. 281
- Tavazoie, S.¹ Hughes, J.D.² Campbell, M.J.³ Cho, R.J.⁴ Church, G.M.⁵

54
- 0034785443
- Y. Pilpel, P. Sudarsanam, G. M. Church, Nature Genet. 29, 153 (2001).
- (2001) Nature Genet. , vol.29 , pp. 153
- Pilpel, Y.¹ Sudarsanam, P.² Church, G.M.³

55
- 0032112293
- R. J. Cho et al., Mol. Cell 2, 65 (1998).
- (1998) Mol. Cell , vol.2 , pp. 65
- Cho, R.J.¹

56
- 0033672594
- M. Primig et al., Nature Genet. 26, 415 (2000).
- (2000) Nature Genet. , vol.26 , pp. 415
- Primig, M.¹

57
- 0033772765
- S. A. Jelinsky, P. Estep, G. M. Church, L. D. Samson, Mol. Cell Biol. 20, 8157 (2000).
- (2000) Mol. Cell Biol. , vol.20 , pp. 8157
- Jelinsky, S.A.¹ Estep, P.² Church, G.M.³ Samson, L.D.⁴

58
- 0032561246
- S. Chu et al., Science 282, 699 (1998).
- (1998) Science , vol.282 , pp. 699
- Chu, S.¹

59
- 0033637153
- A. P. Gasch et al., Mol. Biol. Cell 11, 4241 (2000).
- (2000) Mol. Biol. Cell , vol.11 , pp. 4241
- Gasch, A.P.¹

60
- 0035162698
- A. P. Gasch et al., Mol. Biol. Cell 12, 2987 (2001).
- (2001) Mol. Biol. Cell , vol.12 , pp. 2987
- Gasch, A.P.¹

61
- 0034603061
- C. J. Roberts et al., Science 287, 873 (2000).
- (2000) Science , vol.287 , pp. 873
- Roberts, C.J.¹

62
- 0035157544
- C. B. Epstein et al., Mol. Biol. Cell 12, 297 (2001).
- (2001) Mol. Biol. Cell , vol.12 , pp. 297
- Epstein, C.B.¹

63
- 0037726232
- note
- We thank E. Louis (University of Leicester) for invaluable advice on the Saccharomyces phylogeny, and for providing yeast strains; our Washington University colleagues M. Brent, J. Buhler, S. Eddy and members of his lab, S.-W. Ho, and G. Stormo, as well as E. Siggia (Rockefeller University) and R. Young (MIT) for advice and insightful comments on the manuscript. This project was funded by a grant from NIH (RO1 GM63803).

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.