-
1
-
-
0030218799
-
Finding genes by computer: The state of the art
-
Fickett JW. Finding genes by computer: the state of the art. Trends Genet. 12:1996;316-320.
-
(1996)
Trends Genet
, vol.12
, pp. 316-320
-
-
Fickett, J.W.1
-
2
-
-
0030768930
-
Computational methods for the identification of genes in vertebrate genomic sequences
-
of special interest. This well-written review gives a brief history of the various methods that have been applied to computational gene identification, summarizes the methods used by current programs, and includes web addresses for most available gene finding software as well as fairly extensive references. The author also points out some of the limitations of the current methods, for example, the inability of current algorithms to handle the complexities of overlapping genes and alternative transcription or splicing patterns, and the difficulties in predicting the beginning and end of genes.
-
Claverie J-M. Computational methods for the identification of genes in vertebrate genomic sequences. Hum Mol Genet. 6:1997;1735-1744.
-
(1997)
Hum Mol Genet
, vol.6
, pp. 1735-1744
-
-
Claverie, J.-M.1
-
3
-
-
0030585734
-
Evaluation of gene structure prediction programs
-
of outstanding interest. This landmark paper provided the first large-scale, systematic, unbiased comparison of available gene-finding methods. The authors describe the construction of a large reference data set of 570 vertebrate gene sequences, critically evaluate the usefulness of a variety of predictive accuracy measures proposed previously, and introduce some new accuracy measures. They also provide the results of a systematic test of all available exon and gene prediction programs and assess the current (as of 1996) state of the gene finding problem.
-
Burset M, Guigo R. Evaluation of gene structure prediction programs. Genomics. 34:1996;353-367.
-
(1996)
Genomics
, vol.34
, pp. 353-367
-
-
Burset, M.1
Guigo, R.2
-
4
-
-
0029258948
-
Prediction of function in DNA sequence analysis
-
Gelfand MS. Prediction of function in DNA sequence analysis. J Comput Biol. 2:1995;87-115.
-
(1995)
J Comput Biol
, vol.2
, pp. 87-115
-
-
Gelfand, M.S.1
-
5
-
-
0028826042
-
FANS-REF: A bibliography on statistics and functional analysis of nucleotide sequences
-
Gelfand MS. FANS-REF: a bibliography on statistics and functional analysis of nucleotide sequences. Comput Appl Biosci. 11:1995;541.
-
(1995)
Comput Appl Biosci
, vol.11
, pp. 541
-
-
Gelfand, M.S.1
-
6
-
-
0027944605
-
A hidden Markov model that finds genes in E. coli DNA
-
Krogh A, Mian IS, Haussler D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 22:1994;4768-4778.
-
(1994)
Nucleic Acids Res
, vol.22
, pp. 4768-4778
-
-
Krogh, A.1
Mian, I.S.2
Haussler, D.3
-
7
-
-
0000241874
-
GENMARK: Parallel gene recognition for both DNA strands
-
Borodovsky M, McIninch J. GENMARK: parallel gene recognition for both DNA strands. Comput Chem. 17:1993;123-133.
-
(1993)
Comput Chem
, vol.17
, pp. 123-133
-
-
Borodovsky, M.1
McIninch, J.2
-
9
-
-
0028136885
-
Bacterial gene transfer by genetic transformation in the environment
-
Lorenz MG, Wackernagel W. Bacterial gene transfer by genetic transformation in the environment. Microbiol Rev. 58:1994;563-602.
-
(1994)
Microbiol Rev
, vol.58
, pp. 563-602
-
-
Lorenz, M.G.1
Wackernagel, W.2
-
10
-
-
0026332291
-
Evidence for horizontal gene transfer in Escherichia coli speciation
-
Medigue C, Rouxel T, Vigier P, Henaut A, Danchin A. Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol. 222:1991;851-856.
-
(1991)
J Mol Biol
, vol.222
, pp. 851-856
-
-
Medigue, C.1
Rouxel, T.2
Vigier, P.3
Henaut, A.4
Danchin, A.5
-
11
-
-
16044367245
-
Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
-
Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RM, Gocayne JD, et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science. 273:1996;1058-1073.
-
(1996)
Science
, vol.273
, pp. 1058-1073
-
-
Bult, C.J.1
White, O.2
Olsen, G.J.3
Zhou, L.4
Fleischmann, R.D.5
Sutton, G.G.6
Blake, J.A.7
FitzGerald, L.M.8
Clayton, R.M.9
Gocayne, J.D.10
-
12
-
-
0030569016
-
What drives codon choices in human genes?
-
Karlin S, Mrazek J. What drives codon choices in human genes? J Mol Biol. 262:1996;459-472.
-
(1996)
J Mol Biol
, vol.262
, pp. 459-472
-
-
Karlin, S.1
Mrazek, J.2
-
13
-
-
0025278259
-
Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences
-
Bucher P. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J Mol Biol. 212:1990;563-578.
-
(1990)
J Mol Biol
, vol.212
, pp. 563-578
-
-
Bucher, P.1
-
15
-
-
0029038960
-
Predicting pol II promoter sequences using transcription factor binding sites
-
Prestridge DS. Predicting pol II promoter sequences using transcription factor binding sites. J Mol Biol. 249:1995;923-932.
-
(1995)
J Mol Biol
, vol.249
, pp. 923-932
-
-
Prestridge, D.S.1
-
16
-
-
0030213227
-
Interpreting cDNA sequences: Some insights from studies on translation
-
Kozak M. Interpreting cDNA sequences: some insights from studies on translation. Mamm Genome. 7:1996;563-574.
-
(1996)
Mamm Genome
, vol.7
, pp. 563-574
-
-
Kozak, M.1
-
17
-
-
0031586003
-
Prediction of complete gene structures in human genomic DNA
-
of outstanding interest. The authors introduce a probabilistic model for the structural and sequence compositional properties of genes in human genomic DNA and describe the application of this model to gene finding using the program GENSCAN. The model architecture employed is quite general, allowing for multiple complete or partial gene structures occurring on either or both DNA strands. The model also captures sequence properties of some of the most important cis elements involved in transcription, translation, and pre-mRNA splicing, as well as the length distributions of gene components such as exons and introns. The results show significant improvements in predictive accuracy over other available gene-finding methods, as measured on standard test sets of human and vertebrate genomic sequences.
-
Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 268:1997;78-94.
-
(1997)
J Mol Biol
, vol.268
, pp. 78-94
-
-
Burge, C.1
Karlin, S.2
-
18
-
-
0001877802
-
Splicing of precursors to mRNAs by the spliceosome
-
R.F. Gesteland, J.F. Atkins. Plainview, New York: Cold Spring Harbor Laboratory Press
-
Moore MJ, Query CC, Sharp PA. Splicing of precursors to mRNAs by the spliceosome. Gesteland RF, Atkins JF. RNA World. 1993;305-358 Cold Spring Harbor Laboratory Press, Plainview, New York.
-
(1993)
RNA World
, pp. 305-358
-
-
Moore, M.J.1
Query, C.C.2
Sharp, P.A.3
-
19
-
-
0031456850
-
Classification of introns: U2-type or U12-type
-
of special interest. The authors summarize recent research showing that a very small fraction of nuclear pre-mRNA introns have AT and AC dinucleotides at their 5′ and 3′ termini rather than the more common GT and AG; that two distinct types of spliceosome, termed U2-type and U12-type, are present in both animal and plant cells; that individual introns are apparently spliced by only one type of spliceosome or the other; and that, contrary to what was initially thought, the type of spliceosome used is not determined simply by the terminal dinucleotides but instead depends on the presence or absence of specific internal consensus sequences at both the 5′ splice site and branch site of the intron. Known U12-type AT→AC introns, U2-type AT→AC introns, and U12-type GT→AG introns are tabulated and consensus patterns are described.
-
Sharp PA, Burge CB. Classification of introns: U2-type or U12-type. Cell. 91:1997;875-879.
-
(1997)
Cell
, vol.91
, pp. 875-879
-
-
Sharp, P.A.1
Burge, C.B.2
-
23
-
-
0000093674
-
Modeling dependencies in pre-mRNA splicing signals
-
S. Salzberg, D.B. Searls, S. Kasif. Amsterdam: Elsevier Science
-
Burge C. Modeling dependencies in pre-mRNA splicing signals. Salzberg S, Searls DB, Kasif S. Computational Methods in Molecular Biology. 1998;127-163 Elsevier Science, Amsterdam.
-
(1998)
Computational Methods in Molecular Biology
, pp. 127-163
-
-
Burge, C.1
-
24
-
-
0028895417
-
Exon recognition in vertebrate splicing
-
Berget SM. Exon recognition in vertebrate splicing. J Biol Chem. 270:1995;2411-2414.
-
(1995)
J Biol Chem
, vol.270
, pp. 2411-2414
-
-
Berget, S.M.1
-
25
-
-
0031027525
-
Identification of protein coding regions in the human genome by quadratic discriminant analysis
-
of special interest. This paper describes a program called MZEF for the prediction of internal coding exons in genomic sequences using a weighted combination of factors related to splice sites and the composition of exons and introns. The method, using quadratic discriminant analysis, is a generalization of the linear discriminant analysis approach to exon prediction used by Solovyev et al. [34] with the widely used HEXON/FGENEH program.
-
Zhang MQ. Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc Natl Acad Sci USA. 94:1997;565-568.
-
(1997)
Proc Natl Acad Sci USA
, vol.94
, pp. 565-568
-
-
Zhang, M.Q.1
-
26
-
-
0027059264
-
Assessment of protein coding measures
-
Fickett JW, Tung C-S. Assessment of protein coding measures. Nucleic Acids Res. 20:1992;6441-6450.
-
(1992)
Nucleic Acids Res
, vol.20
, pp. 6441-6450
-
-
Fickett, J.W.1
Tung, C.-S.2
-
27
-
-
0029587471
-
The human genome: Organization and evolutionary history
-
Bernardi G. The human genome: organization and evolutionary history. Annu Rev Genet. 29:1995;445-476.
-
(1995)
Annu Rev Genet
, vol.29
, pp. 445-476
-
-
Bernardi, G.1
-
28
-
-
0030560766
-
Base composition and gene distribution: Critical patterns in mammalian genome organization
-
Gardiner K. Base composition and gene distribution: critical patterns in mammalian genome organization. Trends Genet. 12:1996;519-524.
-
(1996)
Trends Genet
, vol.12
, pp. 519-524
-
-
Gardiner, K.1
-
29
-
-
0024610919
-
A tutorial on hidden Markov models and selected applications in speech recognition
-
Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 77:1989;257-285.
-
(1989)
Proc IEEE
, vol.77
, pp. 257-285
-
-
Rabiner, L.R.1
-
32
-
-
0030333286
-
A generalized hidden Markov model for the recognition of human genes in DNA
-
of special interest. This paper pioneered the use of 'generalized' HMMs, in which gene structure is modeled by an underlying Markov process of generalized hidden states, each of which can emit one or more nucleotides, possibly according to probabilities derived from an internal model structure such as an HMM or neural network. Menlo Park: AAAI Press
-
Kulp D, Haussler D, Reese MG, Eeckman FH. A generalized hidden Markov model for the recognition of human genes in DNA. Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. 1996; AAAI Press, Menlo Park.
-
(1996)
Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology
-
-
Kulp, D.1
Haussler, D.2
Reese, M.G.3
Eeckman, F.H.4
-
33
-
-
0003415703
-
Identification of genes in human genomic DNA
-
Stanford: Stanford University
-
Burge C. Identification of genes in human genomic DNA. PhD thesis. 1997;Stanford University, Stanford.
-
(1997)
PhD Thesis
-
-
Burge, C.1
-
34
-
-
0029814920
-
A segment-based dynamic programming algorithm for predicting gene structure
-
Wu T. A segment-based dynamic programming algorithm for predicting gene structure. J Comput Biol. 3:1996;375-394.
-
(1996)
J Comput Biol
, vol.3
, pp. 375-394
-
-
Wu, T.1
-
35
-
-
0028618270
-
Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames
-
Solovyev VV, Salamov AA, Lawrence CB. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 22:1994;5156-5163.
-
(1994)
Nucleic Acids Res
, vol.22
, pp. 5156-5163
-
-
Solovyev, V.V.1
Salamov, A.A.2
Lawrence, C.B.3
-
36
-
-
0030801002
-
Gapped BLAST and PSI-BLAST: A new generation of protein database search programs
-
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:1997;3389-3402.
-
(1997)
Nucleic Acids Res
, vol.25
, pp. 3389-3402
-
-
Altschul, S.F.1
Madden, T.L.2
Schäffer, A.A.3
Zhang, J.4
Zhang, Z.5
Miller, W.6
Lipman, D.J.7
-
39
-
-
0027968068
-
CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
-
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:1994;4673-4680.
-
(1994)
Nucleic Acids Res
, vol.22
, pp. 4673-4680
-
-
Thompson, J.D.1
Higgins, D.G.2
Gibson, T.J.3
-
40
-
-
0028605559
-
Constructing gene models from accurately predicted exons: An application of dynamic programming
-
Xu Y, Mural RJ, Uberbacher EC. Constructing gene models from accurately predicted exons: an application of dynamic programming. Comput Appl Biosci. 10:1994;613-623.
-
(1994)
Comput Appl Biosci
, vol.10
, pp. 613-623
-
-
Xu, Y.1
Mural, R.J.2
Uberbacher, E.C.3
-
41
-
-
0030786487
-
Las Vegas algorithms for gene recognition: Suboptimal and error-tolerant spliced alignment
-
of special interest. The authors describe some variations and extensions to the 'spliced alignment' algorithm introduced by Gelfand et al. [38] and implemented in the PROCRUSTES program. The basic idea of PROCRUSTES is to identify the exon and intron structure of a gene in a genomic sequence by searching for the set of genomic segments (predicted exons) that maximize a global similarity measure to a pre-specified homologous protein. These homology-based methods may be extremely accurate when a sufficiently similar protein is available (e.g. human genomic DNA versus orthologous mouse protein).
-
Sze S-H, Pevzner PA. Las Vegas algorithms for gene recognition: suboptimal and error-tolerant spliced alignment. J Comput Biol. 4:1997;297-309.
-
(1997)
J Comput Biol
, vol.4
, pp. 297-309
-
-
Sze, S.-H.1
Pevzner, P.A.2
-
42
-
-
0030479536
-
The origin of interspersed repeats in the human genome
-
Smit AFA. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 6:1996;743-748.
-
(1996)
Curr Opin Genet Dev
, vol.6
, pp. 743-748
-
-
Smit, A.F.A.1
-
43
-
-
0030094146
-
CENSOR - A program for identification and elimination of repetitive elements from DNA sequences
-
Jurka J, Klonowski P, Dagman V, Pelton P. CENSOR - a program for identification and elimination of repetitive elements from DNA sequences. Comput Chem. 20:1996;119-122.
-
(1996)
Comput Chem
, vol.20
, pp. 119-122
-
-
Jurka, J.1
Klonowski, P.2
Dagman, V.3
Pelton, P.4
-
44
-
-
0030854739
-
tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence
-
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:1997;955-964.
-
(1997)
Nucleic Acids Res
, vol.25
, pp. 955-964
-
-
Lowe, T.M.1
Eddy, S.R.2
-
45
-
-
0026456701
-
The human XIST gene: Analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus
-
Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, Lawrence J, Willard HF. The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell. 71:1992;527-542.
-
(1992)
Cell
, vol.71
, pp. 527-542
-
-
Brown, C.J.1
Hendrich, B.D.2
Rupert, J.L.3
Lafreniere, R.G.4
Xing, Y.5
Lawrence, J.6
Willard, H.F.7
-
46
-
-
0026749475
-
DNA sequence analysis of 66 kb of human MHC class II region encoding a cluster of genes for antigen processing
-
Beck S, Kelly A, Radley E, Khurshid F, Alderton RP, Trowsdale J. DNA sequence analysis of 66 kb of human MHC class II region encoding a cluster of genes for antigen processing. J Mol Biol. 228:1992;433-441.
-
(1992)
J Mol Biol
, vol.228
, pp. 433-441
-
-
Beck, S.1
Kelly, A.2
Radley, E.3
Khurshid, F.4
Alderton, R.P.5
Trowsdale, J.6
-
47
-
-
0028782111
-
Statistical studies of biomolecular sequences: Score-based methods
-
Karlin S. Statistical studies of biomolecular sequences: score-based methods. Phil Trans R Soc Lond Biol. 344:1994;391-402.
-
(1994)
Phil Trans R Soc Lond Biol
, vol.344
, pp. 391-402
-
-
Karlin, S.1
-
48
-
-
0030786488
-
Automated gene identification in large-scale genomic sequences
-
Xu Y, Uberbacher EC. Automated gene identification in large-scale genomic sequences. J Comput Biol. 4:1997;325-338.
-
(1997)
J Comput Biol
, vol.4
, pp. 325-338
-
-
Xu, Y.1
Uberbacher, E.C.2
-
49
-
-
0031972648
-
Regulation of sex-specific selection of fruitless 5′ splice sites by transformer and transformer-2
-
Heinrichs V, Ryner LC, Baker BS. Regulation of sex-specific selection of fruitless 5′ splice sites by transformer and transformer-2. Mol Cell Biol. 18:1998;450-458.
-
(1998)
Mol Cell Biol
, vol.18
, pp. 450-458
-
-
Heinrichs, V.1
Ryner, L.C.2
Baker, B.S.3