



Volume 301, Issue 5629, 2003, Pages 71-76

Finding functional features in Saccharomyces genomes by phylogenetic footprinting

Author keywords

[No Author keywords available]

Indexed keywords

BIOTECHNOLOGY; DNA; YEAST;

EID: 0038724989     PISSN: 00368075     EISSN: None     Source Type: Journal    
DOI: 10.1126/science.1084337     Document Type: Article
Times cited : (714)

References (63)
  • 3
    • 0031593199
    • R. Hardison et al., Gene 205, 73 (1997).
    • (1997) Gene , vol.205 , pp. 73
    • Hardison, R.1
  • 5
    • 0038063894
    • note
    • The strains are S. bayanus (623-6c), S. mikatae (IFO 1815), S. castellii (NRRL Y-12630), S. kluyveri (NRRL Y-12651), and S. kudriavzevii (IFO 1802).
  • 10
    • 0037726239
    • note
    • Libraries of genomic DNA fragments of the species were constructed in plasmid pOT4. Plating and sequencing of plasmid library subclones, as well as sample loading, data collection, and processing, were done as described at http://genome.wustl.edu/tools/protocols/. The sequences were assembled with PHRAP (phragment assembly program; www.phrap.org) (51) using the following parameters: forcelevel 1, minmatch 17, minscore 40, new_ace, view.
  • 13
    • 0038063893
    • note
    • -20) to translated intergenic regions ("NotFeatures DNA," obtained from the SGD) of S. cerevisiae were inspected manually. Finally, multiple DNA sequence alignments of the small ORFs were used to make accurate predictions of start and stop codons.
  • 14
    • 0037726237
    • note
    • Genomic sequences of the five species were compared with annotated genes in S. cerevisiae using TBLASTN to identify those that are likely to be false. ORFs were termed false if most (≥50%) of the top-scoring alignments contained frameshifts or stop codons (for overlapping ORFs, only one species had to have a frameshift or stop codon for the ORF to be called false). Most of the false ORFs contain frameshifts or multiple stop codons in two or more species.
  • 15
    • 0038740328
    • note
    • S. cerevisiae genomic sequences and annotations were obtained from the SGD (www.yeastgenome.org; July 24, 2002 version).
  • 16
    • 0038740321
    • note
    • An additional 82 annotated S. cerevisiae ORFs do not have significant similarity to sequences in any of the Saccharomyces species, and are therefore likely not to be genes, but because we do not have complete genome sequences of the other species we cannot be certain of this.
  • 17
    • 0037726238
    • note
    • All changes to the S. cerevisiae genome annotation were submitted to the SGD.
  • 20
    • 0038063885
    • note
    • -10) were recorded. Promoter sequences from the -1 position, relative to the ATG start codon, up to 1000 bp (if available) upstream, or to the next gene, were extracted. [S. cerevisiae genes overlapping or immediately adjacent to other genomic features (692 of the 6359 total genes listed in SGD) were not included in the analysis.] The orthologous promoter sequences of the sensu stricto species were aligned using CLUSTALW (15). Low-scoring alignments (less than 20% identity) and alignments lacking at least one run of six exact nucleotides were manually inspected. When appropriate, sequences were removed from the alignments to improve their quality and accuracy. For example, we removed sequences if there were multiple sequences from one species (perhaps duplication of an orthologous gene in the other species) or if there was one sequence not similar enough to align with the rest of the sequences (perhaps a miscalled ortholog). After manually editing the sensu stricto species alignments, sequences from the more distantly related species were added to the files. Because these sequences are too diverged from the other sequences to align accurately, we could not use alignments to inform ambiguous orthology calls. Multiple orthologs were retained for a given species unless one of them clearly had a more significant BLASTX hit. In many cases these decisions had to be made manually because of incomplete sequence coverage (that is, partial gene coverage and potential frameshifts or sequencing errors that affect the BLAST score).
  • 21
    • 0038402171
    • note
    • Sixty-two percent of intergenic regions (3523) are available from all three sensu stricto species; 86% (4908) are available from two of the three sensu stricto species; 2292 intergenic regions (40% of the total) are available from all six species.
  • 22
    • 0038740319
    • note
    • The sequences are available at GenBank (project accession numbers: S. kluyveri, AACE00000000; S. castellii, AACF00000000; S. bayanus, AACG00000000; S. mikatae, AACH00000000; S. kudriavzevii, AACI00000000). The sequences are also available on the SGD Web site (www.yeastgenome.org) and at www.genetics.wustl.edu/saccharomycesgenomes/.
  • 23
    • 0038740327
    • note
    • The number of identical residues in multiple alignments of orthologous promoters of the sensu stricto species (after removing terminal gaps in the alignments caused by differences in sequence length) was tabulated for each consecutive 25-bp window from the start of translation backward to the 5′ end of the promoter sequence. Essentially identical results were obtained with shorter window lengths.
  • 24
    • 0038402169
    • Information is available at www.genetics.wustl.edu/saccharomycesgenomes/promoter_significance.html.
  • 26
    • 0038740320
    • note
    • The CLUSTALW alignments of 59 genes were at least 75% identical in the 25 nt upstream of the ATG codon (fig. S2).
  • 29
    • 0038063886
    • note
    • Sixty percent of alignments of orthologous sequences upstream of genes encoding ribosomal proteins, but only 3% of alignments of sequence upstream of other genes, have 70% or greater identity over the 30 nt adjacent to the ATG codon.
  • 31
    • 0037726230
    • note
    • Of 3523 intergenic regions, 390 are greater than 50% identical; an additional 379 are greater than 45% identical.
  • 32
    • 0038063889
    • note
    • Known transcription factor binding sites included Gcn4, Hap2, Mbf1, Ndt80, Pho4, Reb1, Rpn4, SCB, Ste12, Upc2, Mac1, and Gln3.
  • 37
    • 0038063888
    • note
    • The nine known gapped sequence motifs are the binding sites for Gal4, Abf1, Lys14, Leu3, Cha4, Put3, Uga3, Hap1, and PQr1. The characterized motifs that we miss are long and strictly conserved (six of the seven are 8 nt or greater in length), probably because the motifs have not been sufficiently characterized to define their essential positions.
  • 38
    • 0037726231
    • note
    • Four-way CLUSTALW promoter alignments with 40% or less identity were used in the analysis (2377 of 3523 alignments). Forty percent was arbitrarily chosen as a cutoff to remove alignments with too much similarity. The CLUSTALW alignments were first modified to remove terminal unaligned sequence from the output. The alignments were shuffled 10,000 independent times using the shuffle utility program in SQUID (www.genetics.wustl.edu/eddy/software/#squid). Runs of identical sequence aligned in all four species in the real alignments and the shuffled alignments were extracted and counted.
  • 39
    • 0037726229
    • note
    • Six-mers are not statistically significant in the six-way comparisons and were therefore not tabulated; n-mers longer than 10 nt are quite rare in these comparisons.
  • 40
    • 0038063887
    • note
    • Sequences of 100 shuffled multiple intergenic sequence alignments of sensu stricto species were extracted and combined with intergenic sequences from the two distantly related species; n-oligomers present in all species were identified in the real promoter sets and compared with those present in the shuffled data sets.
  • 41
    • 0038740322
    • note
    • For example, essentially all of the 10-mers conserved in the sensu stricto species' sequences (considered because there is a high degree of confidence that they are not chance occurrences) that occur frequently in the genome (considered because those are likely to be functional) are known: 28 of 32 (present in alignments of sequences upstream of at least seven different genes) are accounted for by the following previously identified sequence motifs: eight variations of the PAC motif, seven variations of the RRPE motif, seven variations of a Ume6 binding site, two variations of the PACE motif (Rpn4 binding site), two variations of an Mbp1 binding site, and one each of the binding sites for Ndt80 and Reb1; the remaining four most frequent n-mers were simple A+T-rich sequences. Similarly, 94 of the 160 conserved 10-mers identified in the six-way sequence comparisons correspond to known sequence motifs (table S4), all of which occur upstream of genes known or likely to be regulated by the factor that binds to them. Of the remaining 66 conserved sequence motifs, 46 are A+T-rich, and 11 are immediately upstream of the translation initiation codon of genes encoding ribosomal proteins and thus may be translational regulatory sequences. This leaves only nine that are reasonable candidates for new regulatory sequences, four of which are conserved in the sequences upstream of several genes of similar function (the motifs marked with an asterisk in table S4).
  • 42
    • 0038740323
    • note
    • We were surprised to find that only 15.7% of the 3523 multiple alignments of sensu stricto species promoters contain one of the seven sequences that have TATA element function (TATAA, TATATA, TATTTA, CATTTA, TTTAAT, TAATAA, TATAA) (52) conserved and aligned within 250 bp of the translational start codon. Even if the stringency of the search criteria is relaxed to allow for unaligned TATA boxes, promoters containing this sequence element are still in the minority: only 42.8% of the sensu stricto species' promoters contain any one of the seven TATA sequences (52) anywhere within 250 bp of the translational start codon in all four orthologs. Furthermore, 142 promoters (4%) do not contain a TATA element in any of the four sensu stricto species. Thus, it appears that TATA-containing promoters are the minority in S. cerevisiae.
  • 43
    • 0038402170
    • note
    • Because many of the conserved n-mers are longer than the typical 6 to 8 bp that are required for a transcription factor to bind to DNA, we extracted all unique 6- to 8-mers from the longer conserved n-mers to test these for functional enrichment and coherent expression. Each unique n-mer had to be present in at least five different intergenic regions to be tested for functional enrichment or coherent expression.
  • 44
    • 0038063891
    • note
    • Functional enrichment was based on the Munich Information Center for Protein Sequences (MIPS) classification of S. cerevisiae genes. Functional enrichment and the associated P values were calculated as in (53).
  • 47
    • 0038402172
    • note
    • Expression coherence was calculated as described previously (54). Expression coherence was calculated for cell cycle (55), meiosis (56), methyl methanesulfonate (MMS) damage (57), sporulation (58), stress response (59), DNA damage (60), mitogen-activated protein kinase (MAPK) (61), and mitochondrial dysfunction (62) data sets.
  • 48
    • 0037174671
    • T. I. Lee et al., Science 298, 799 (2002).
    • (2002) Science , vol.298 , pp. 799
    • Lee, T.I.1
  • 49
    • 0037726233
    • note
    • Using high-quality weight matrices for the binding sites of 23 transcription factors whose consensus binding sites are known, we identified (i) all the occurrences of a particular binding site in the intergenic regions of S. cerevisiae using Patser (34), and then (ii) those occurrences that are conserved in the orthologous promoters in the other Saccharomyces species and/or are aligned in the CLUSTALW alignments of intergenic sequences of sensu stricto species. Sets of intergenic regions that bind to a particular DNA binding protein come from the data of Lee et al. at P value < 0.001 (48). The motif alignments for known transcription factor binding sites were generated by applying AlignACE (33) to the appropriate MIPS functional category.
  • 50
    • 0038402177
    • note
    • Thirty-six n-mers are upstream of genes that are functionally enriched [18 from the sensu stricto sequence alignments and 18 from the six-way sequence comparisons (Table 1)], 52 n-mers are identified by coherent expression [39 from the sensu stricto sequence alignments and 13 from the six-way sequence comparisons (Table 2)], and 13 are from upstream of genes bound by a particular transcription factor [nine from the sensu stricto sequence alignments and four from the six-way sequence comparisons (Table 3)]. Twenty-two n-mers are found in more than one data set, leaving 79 conserved sequence motifs linked to a function.
  • 51
    • 0038402173
    • P. Green, PHRAP, Department of Genome Sciences, University of Washington, Seattle, WA.
  • 55
    • 0032112293
    • R. J. Cho et al., Mol. Cell 2, 65 (1998).
    • (1998) Mol. Cell , vol.2 , pp. 65
    • Cho, R.J.1
  • 58
    • 0032561246
    • S. Chu et al., Science 282, 699 (1998).
    • (1998) Science , vol.282 , pp. 699
    • Chu, S.1
  • 61
    • 0034603061
    • C. J. Roberts et al., Science 287, 873 (2000).
    • (2000) Science , vol.287 , pp. 873
    • Roberts, C.J.1
  • 63
    • 0037726232
    • note
    • We thank E. Louis (University of Leicester) for invaluable advice on the Saccharomyces phylogeny, and for providing yeast strains; our Washington University colleagues M. Brent, J. Buhler, S. Eddy and members of his lab, S.-W. Ho, and G. Stormo, as well as E. Siggia (Rockefeller University) and R. Young (MIT) for advice and insightful comments on the manuscript. This project was funded by a grant from NIH (RO1 GM63803).
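
Note 23 describes tabulating identical residues in consecutive 25-bp windows, walking from the start of translation back toward the 5′ end of each promoter alignment. The following is a minimal Python sketch of that tabulation, not the authors' code. The function names, the list-of-gapped-strings input, and the choice to drop every column that lies inside any sequence's terminal gap run are assumptions made for illustration.

```python
def trim_terminal_gaps(alignment, gap="-"):
    """Drop leading/trailing columns that fall inside any row's terminal gap run.

    `alignment` is a list of equal-length gapped strings (hypothetical input format).
    """
    start = max(len(s) - len(s.lstrip(gap)) for s in alignment)
    end = min(len(s.rstrip(gap)) for s in alignment)
    return [s[start:end] for s in alignment]


def windowed_identity(alignment, window=25, gap="-"):
    """Count identical columns per window, starting next to the ATG (3' end).

    The last window (nearest the 5' end) may be shorter than `window`.
    """
    rows = trim_terminal_gaps(alignment, gap)
    # A column counts as identical when all rows share the same non-gap residue.
    identical = [len(set(col)) == 1 and col[0] != gap for col in zip(*rows)]
    # Reverse so index 0 is the column adjacent to the start codon.
    identical.reverse()
    return [sum(identical[i:i + window]) for i in range(0, len(identical), window)]
```

Passing a smaller `window` makes it easy to check the note's remark that shorter window lengths gave essentially the same picture.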

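
Note 38 compares runs of identical sequence in real four-way alignments against the same alignments after shuffling. The sketch below shows one way such a comparison could be set up in Python. It does not call SQUID; permuting whole alignment columns is an assumed stand-in for its shuffle utility, and `min_len`, `seed`, and all function names are hypothetical.

```python
import random


def conserved_runs(alignment, min_len=6, gap="-"):
    """Return maximal runs of columns identical in every row, as strings."""
    runs, current = [], []
    for col in zip(*alignment):
        if len(set(col)) == 1 and col[0] != gap:
            current.append(col[0])
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


def shuffle_columns(alignment, rng):
    """Permute alignment columns (assumed shuffling model, not SQUID itself)."""
    cols = list(zip(*alignment))
    if not cols:
        return list(alignment)
    rng.shuffle(cols)
    return ["".join(row) for row in zip(*cols)]


def shuffled_run_counts(alignment, n_shuffles=100, min_len=6, seed=0):
    """Number of conserved runs found in each of `n_shuffles` shuffled copies."""
    rng = random.Random(seed)
    return [
        len(conserved_runs(shuffle_columns(alignment, rng), min_len))
        for _ in range(n_shuffles)
    ]
```

Comparing `len(conserved_runs(alignment))` with the distribution returned by `shuffled_run_counts` gives a rough sense of how many runs would be expected by chance under this shuffling model.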

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.