Volume 46, Issue 6, 2003, Pages 479-503

Words in DNA sequences: Some case studies based on their frequency statistics

Author keywords

Average linkage clustering; Chernoff's faces; Dendrograms; DNA words; F ranks of words; F ratios of words; l1 distance; Phylogenetic relationships; Rank correlation; Single linkage clustering

Indexed keywords

EUKARYOTA; PROKARYOTA; UNIDENTIFIED BACTERIOPHAGE

EID: 1542540910     PISSN: 03036812     EISSN: None     Source Type: Journal    
DOI: 10.1007/s00285-002-0185-3     Document Type: Article
Times cited: 16

References (48)
  • 1
    • 84972496740
    • Poisson approximation and the Chen-Stein method
    • Arratia, R., Goldstein, L., Gordon, L.: Poisson approximation and the Chen-Stein method. Stat. Sci. 5, 403-434 (1990)
    • (1990) Stat. Sci. , vol.5 , pp. 403-434
    • Arratia, R.1    Goldstein, L.2    Gordon, L.3
  • 2
    • 0029792949
    • Poisson approximation for long repeats in a random sequence with application to sequencing by hybridization
    • Arratia, R., Martin, D., Reinert, G., Waterman, M.S.: Poisson approximation for long repeats in a random sequence with application to sequencing by hybridization. J. Comput. Biol. 3, 425-463 (1996)
    • (1996) J. Comput. Biol. , vol.3 , pp. 425-463
    • Arratia, R.1    Martin, D.2    Reinert, G.3    Waterman, M.S.4
  • 4
    • 0012358091
    • Statistical analysis of large DNA sequences using distribution of DNA words
    • Chaudhuri, P., Das, S.: Statistical analysis of large DNA sequences using distribution of DNA words. Curr. Sci. 80, 1161-1166 (2001)
    • (2001) Curr. Sci. , vol.80 , pp. 1161-1166
    • Chaudhuri, P.1    Das, S.2
  • 5
    • 0036211742
    • SWORDS: A statistical tool for analyzing large DNA sequences
    • Chaudhuri, P., Das, S.: SWORDS: a statistical tool for analyzing large DNA sequences. J. Biosci. 27, 1-6 (2002)
    • (2002) J. Biosci. , vol.27 , pp. 1-6
    • Chaudhuri, P.1    Das, S.2
  • 6
    • 84949331034
    • The use of faces to represent points in k-dimensional space graphically
    • Chernoff, H.: The use of faces to represent points in k-dimensional space graphically. J. Amer. Statist. Assoc. 68, 361-368 (1973)
    • (1973) J. Amer. Statist. Assoc. , vol.68 , pp. 361-368
    • Chernoff, H.1
  • 7
    • 0024557590
    • Stochastic models for heterogeneous DNA sequences
    • Churchill, G.A.: Stochastic models for heterogeneous DNA sequences. Bull. Math. Biol. 51, 79-94 (1989)
    • (1989) Bull. Math. Biol. , vol.51 , pp. 79-94
    • Churchill, G.A.1
  • 8
    • 0021760092
    • A comprehensive set of sequence analysis programs for the VAX
    • Devereux, J.P., Haeberli, P., Smithies, O.: A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395 (1984)
    • (1984) Nucleic Acids Res. , vol.12 , pp. 387-395
    • Devereux, J.P.1    Haeberli, P.2    Smithies, O.3
  • 9
    • 0025307624
    • Molecular evolution: Computer analysis of protein and nucleic acid sequences
    • Doolittle, R.F.: Molecular evolution: computer analysis of protein and nucleic acid sequences. Meth. Enzymol. 183, 1-735 (1990)
    • (1990) Meth. Enzymol. , vol.183 , pp. 1-735
    • Doolittle, R.F.1
  • 10
    • 0008737165
    • Molecular evolution: Computer methods for macromolecular sequence analysis
    • Doolittle, R.F.: Molecular evolution: computer methods for macromolecular sequence analysis. Meth. Enzymol. 266, 1-711 (1996)
    • (1996) Meth. Enzymol. , vol.266 , pp. 1-711
    • Doolittle, R.F.1
  • 12
    • 0001027766
    • Statistical inference of phylogenies
    • Felsenstein, J.: Statistical inference of phylogenies. J. R. Statist. Soc. (A) 146, 246-272 (1983)
    • (1983) J. R. Statist. Soc. (A) , vol.146 , pp. 246-272
    • Felsenstein, J.1
  • 13
    • 0024152983
    • Phylogenies from molecular sequences: Inference and reliability
    • Felsenstein, J.: Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. 22, 521-565 (1988)
    • (1988) Annu. Rev. Genet. , vol.22 , pp. 521-565
    • Felsenstein, J.1
  • 14
    • 0000122573
    • PHYLIP - Phylogeny inference package (Version 3.4)
    • Felsenstein, J.: PHYLIP - phylogeny inference package (Version 3.4). Cladistics 5, 164-166 (1989)
    • (1989) Cladistics , vol.5 , pp. 164-166
    • Felsenstein, J.1
  • 15
    • 0002845889
    • Improved Poisson approximations for word patterns
    • Godbole, A.P., Schaffner, A.A.: Improved Poisson approximations for word patterns. Adv. Appl. Prob. 25, 334-347 (1993)
    • (1993) Adv. Appl. Prob. , vol.25 , pp. 334-347
    • Godbole, A.P.1    Schaffner, A.A.2
  • 17
    • 0027458843
    • Patchiness and correlations in DNA sequences
    • Karlin, S., Brendel, V.: Patchiness and correlations in DNA sequences. Science 259(5095), 677-680 (1993)
    • (1993) Science , vol.259 , Issue.5095 , pp. 677-680
    • Karlin, S.1    Brendel, V.2
  • 18
    • 0028558694
    • Which bacterium is the ancestor of the animal mitochondrial genome?
    • Karlin, S., Campbell, A.M.: Which bacterium is the ancestor of the animal mitochondrial genome? Proc. Natl. Acad. Sci. USA 91, 12842-12846 (1994)
    • (1994) Proc. Natl. Acad. Sci. USA , vol.91 , pp. 12842-12846
    • Karlin, S.1    Campbell, A.M.2
  • 19
    • 0028028028
    • Computational DNA sequence analysis
    • Karlin, S., Cardon, L.R.: Computational DNA sequence analysis. Annu. Rev. Microbiol. 48, 619-654 (1994)
    • (1994) Annu. Rev. Microbiol. , vol.48 , pp. 619-654
    • Karlin, S.1    Cardon, L.R.2
  • 20
    • 0028606501
    • Comparisons of eukaryotic genomic sequences
    • Karlin, S., Ladunga, I.: Comparisons of eukaryotic genomic sequences. Proc. Natl. Acad. Sci. USA 91, 12832-12836 (1994)
    • (1994) Proc. Natl. Acad. Sci. USA , vol.91 , pp. 12832-12836
    • Karlin, S.1    Ladunga, I.2
  • 24
    • 0029795578
    • Over- and underrepresentation of short DNA words in herpesvirus genomes
    • Leung, M.-Y., Marsh, G.M., Speed, T.P.: Over- and underrepresentation of short DNA words in herpesvirus genomes. J. Comput. Biol. 3, 345-360 (1996)
    • (1996) J. Comput. Biol. , vol.3 , pp. 345-360
    • Leung, M.-Y.1    Marsh, G.M.2    Speed, T.P.3
  • 25
    • 0001320216
    • Oligonucleotide frequencies in DNA follow a Yule distribution
    • Martindale, C., Konopka, A.K.: Oligonucleotide frequencies in DNA follow a Yule distribution. Comp. Chem. 20, 35-38 (1996)
    • (1996) Comp. Chem. , vol.20 , pp. 35-38
    • Martindale, C.1    Konopka, A.K.2
  • 26
    • 0003034099
    • Modeling bacterial genomes using Hidden Markov Models
    • R. Payne, P.J. Green (eds), Physica-Verlag, Heidelberg
    • Muri, F.: Modeling bacterial genomes using Hidden Markov Models. In: Compstat'98 Proceedings in Computational Statistics, R. Payne, P.J. Green (eds), pp. 89-100. Physica-Verlag, Heidelberg, 1998
    • (1998) Compstat'98 Proceedings in Computational Statistics , pp. 89-100
    • Muri, F.1
  • 27
    • 0014757386
    • A general method applicable to the search for similarities in the amino acid sequence for two proteins
    • Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence for two proteins. J. Mol. Biol. 48, 443-453 (1970)
    • (1970) J. Mol. Biol. , vol.48 , pp. 443-453
    • Needleman, S.B.1    Wunsch, C.D.2
  • 28
    • 0030458552
    • Phylogenetic analysis in molecular evolutionary genetics
    • Nei, M.: Phylogenetic analysis in molecular evolutionary genetics. Annu. Rev. Genet. 30, 371-403 (1996)
    • (1996) Annu. Rev. Genet. , vol.30 , pp. 371-403
    • Nei, M.1
  • 29
    • 0019321678
    • Some rules in the ordering of nucleotides in the DNA
    • Nussinov, R.: Some rules in the ordering of nucleotides in the DNA. Nucleic Acids Res. 8, 4545-4562 (1980)
    • (1980) Nucleic Acids Res. , vol.8 , pp. 4545-4562
    • Nussinov, R.1
  • 30
    • 0019888313
    • Nearest neighbor nucleotide patterns: Structural and biological implications
    • Nussinov, R.: Nearest neighbor nucleotide patterns: structural and biological implications. J. Biol. Chem. 256, 8458-8462 (1981)
    • (1981) J. Biol. Chem. , vol.256 , pp. 8458-8462
    • Nussinov, R.1
  • 31
    • 0019952514
    • Some indications for inverse DNA duplication
    • Nussinov, R.: Some indications for inverse DNA duplication. J. Theor. Biol. 95, 783-793 (1982)
    • (1982) J. Theor. Biol. , vol.95 , pp. 783-793
    • Nussinov, R.1
  • 32
    • 0021759169
    • Doublet frequencies in evolutionary distinct groups
    • Nussinov, R.: Doublet frequencies in evolutionary distinct groups. Nucleic Acids Res. 12, 1749-1763 (1984a)
    • (1984) Nucleic Acids Res. , vol.12 , pp. 1749-1763
    • Nussinov, R.1
  • 33
    • 0021137397
    • Strong doublet preferences in nucleotide sequences and DNA geometry
    • Nussinov, R.: Strong doublet preferences in nucleotide sequences and DNA geometry. J. Mol. Evol. 20, 111-119 (1984b)
    • (1984) J. Mol. Evol. , vol.20 , pp. 111-119
    • Nussinov, R.1
  • 34
    • 38249015539
    • Nucleotide sequences versus Markov models
    • Pevzner, P.A.: Nucleotide sequences versus Markov models. Comput. Chem. 16, 103-106 (1992)
    • (1992) Comput. Chem. , vol.16 , pp. 103-106
    • Pevzner, P.A.1
  • 35
    • 0024514063
    • Linguistics of nucleotide sequences I: The significance of deviations from mean statistical characteristics and prediction of the frequencies of occurrence of words
    • Pevzner, P.A., Borodovsky, M.Y., Mironov, A.A.: Linguistics of nucleotide sequences I: the significance of deviations from mean statistical characteristics and prediction of the frequencies of occurrence of words. J. Biomol. Struct. Dyn. 6(5), 1013-1026 (1989a)
    • (1989) J. Biomol. Struct. Dyn. , vol.6 , Issue.5 , pp. 1013-1026
    • Pevzner, P.A.1    Borodovsky, M.Y.2    Mironov, A.A.3
  • 36
    • 0024593127
    • Linguistics of nucleotide sequences II: Stationary words in genetic texts and the zonal structure of DNA
    • Pevzner, P.A., Borodovsky, M.Y., Mironov, A.A.: Linguistics of nucleotide sequences II: stationary words in genetic texts and the zonal structure of DNA. J. Biomol. Struct. Dyn. 6(5), 1027-1038 (1989b)
    • (1989) J. Biomol. Struct. Dyn. , vol.6 , Issue.5 , pp. 1027-1038
    • Pevzner, P.A.1    Borodovsky, M.Y.2    Mironov, A.A.3
  • 37
    • 0023664154
    • Mono- through hexa-nucleotide composition of the Escherichia coli genome: A Markov chain analysis
    • Phillips, G., Arnold, J., Ivarie, R.: Mono- through hexa-nucleotide composition of the Escherichia coli genome: a Markov chain analysis. Nucleic Acids Res. 15, 2611-2626 (1987a)
    • (1987) Nucleic Acids Res. , vol.15 , pp. 2611-2626
    • Phillips, G.1    Arnold, J.2    Ivarie, R.3
  • 38
    • 0023664163
    • The effect of codon usage on the oligonucleotide composition of the E. coli genome and identification of over- and underrepresented sequences by Markov chain analysis
    • Phillips, G., Arnold, J., Ivarie, R.: The effect of codon usage on the oligonucleotide composition of the E. coli genome and identification of over- and underrepresented sequences by Markov chain analysis. Nucleic Acids Res. 15, 2627-2638 (1987b)
    • (1987) Nucleic Acids Res. , vol.15 , pp. 2627-2638
    • Phillips, G.1    Arnold, J.2    Ivarie, R.3
  • 39
    • 0000375303
    • Finding words with unexpected frequencies in deoxyribonucleic acid sequences
    • Prum, B., Rodolphe, F., de Turckheim, E.: Finding words with unexpected frequencies in deoxyribonucleic acid sequences. J. R. Statist. Soc. (B) 57, 205-220 (1995)
    • (1995) J. R. Statist. Soc. (B) , vol.57 , pp. 205-220
    • Prum, B.1    Rodolphe, F.2    De Turckheim, E.3
  • 41
    • 0031902984
    • Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains
    • Reinert, G., Schbath, S.: Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains. J. Comput. Biol. 5, 223-253 (1998)
    • (1998) J. Comput. Biol. , vol.5 , pp. 223-253
    • Reinert, G.1    Schbath, S.2
  • 42
    • 0008724278
    • Large compound Poisson approximations for occurrences of multiple words
    • Statistics in Molecular Biology and Genetics, F. Seillier-Moiseiwitsch (ed.), IMS, Hayward, California
    • Reinert, G., Schbath, S.: Large compound Poisson approximations for occurrences of multiple words. In: Statistics in Molecular Biology and Genetics, F. Seillier-Moiseiwitsch (ed.), IMS Lecture Notes and Monograph Series, vol. 33, pp. 257-275. IMS, Hayward, California, 1999
    • (1999) IMS Lecture Notes and Monograph Series , vol.33 , pp. 257-275
    • Reinert, G.1    Schbath, S.2
  • 43
    • 0034125366
    • Probabilistic and statistical properties of words: An overview
    • Reinert, G., Schbath, S., Waterman, M.S.: Probabilistic and statistical properties of words: an overview. J. Comput. Biol. 7, 1-46 (2000)
    • (2000) J. Comput. Biol. , vol.7 , pp. 1-46
    • Reinert, G.1    Schbath, S.2    Waterman, M.S.3
  • 44
    • 0029365207
    • Exceptional motifs in different Markov chain models for statistical analysis of DNA sequences
    • Schbath, S., Prum, B., de Turckheim, E.: Exceptional motifs in different Markov chain models for statistical analysis of DNA sequences. J. Comput. Biol. 2, 417-437 (1995)
    • (1995) J. Comput. Biol. , vol.2 , pp. 417-437
    • Schbath, S.1    Prum, B.2    De Turckheim, E.3
  • 45
    • 0029836454
    • Quartet puzzling: A quartet maximum likelihood method for reconstructing tree topologies
    • Strimmer, K., von Haeseler, A.: Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13, 964-969 (1996)
    • (1996) Mol. Biol. Evol. , vol.13 , pp. 964-969
    • Strimmer, K.1    Von Haeseler, A.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.