Volume 7, Issue 1, 2008, Pages 192-201

Data augmentation algorithms for detecting conserved domains in protein sequences: A comparative study

Author keywords

Data augmentation; Expectation maximization (EM); Gibbs sampling; Markov chain Monte Carlo; Motif discovery; Multiple local alignment; Protein sequence analysis

Indexed keywords

HELIX LOOP HELIX PROTEIN

EID: 38649088564     PISSN: 15353893     EISSN: None     Source Type: Journal    
DOI: 10.1021/pr070475q     Document Type: Article
Times cited : (9)

References (39)
  • 1
    • 0027912333
    • Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment
    • Lawrence, C. E.; Altschul, S. F.; Boguski, M. S.; Liu, J. S.; Neuwald, A. F.; Wootton, J. C. Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 1993, 262, 208-214.
    • (1993) Science , vol.262 , pp. 208-214
    • Lawrence, C.E.1    Altschul, S.F.2    Boguski, M.S.3    Liu, J.S.4    Neuwald, A.F.5    Wootton, J.C.6
  • 2
    • 0029144601
    • Gibbs motif sampling: Detection of bacterial outer membrane protein repeats
    • Neuwald, A. F.; Liu, J. S.; Lawrence, C. E. Gibbs motif sampling: Detection of bacterial outer membrane protein repeats. Protein Sci. 1995, 4, 1618-1632.
    • (1995) Protein Sci , vol.4 , pp. 1618-1632
    • Neuwald, A.F.1    Liu, J.S.2    Lawrence, C.E.3
  • 3
    • 0027764344
    • Protein sequence alignments: A strategy for the hierarchical analysis of residue conservation
    • Livingstone, C. D.; Barton, G. J. Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput. Appl. Biosci. 1993, 9, 745-756.
    • (1993) Comput. Appl. Biosci , vol.9 , pp. 745-756
    • Livingstone, C.D.1    Barton, G.J.2
  • 4
    • 0028300741
    • Combining evolutionary information and neural networks to predict protein secondary structure
    • Rost, B.; Sander, C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 1994, 19, 55-72.
    • (1994) Proteins , vol.19 , pp. 55-72
    • Rost, B.1    Sander, C.2
  • 5
    • 0035812694
    • Protein structure prediction and structural genomics
    • Baker, D.; Sali, A. Protein structure prediction and structural genomics. Science 2001, 294, 93-96.
    • (2001) Science , vol.294 , pp. 93-96
    • Baker, D.1    Sali, A.2
  • 6
    • 0033168097
    • A comprehensive comparison of multiple sequence alignment programs
    • Thompson, J. D.; Plewniak, F.; Poch, O. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 1999, 27, 2682-2690.
    • (1999) Nucleic Acids Res , vol.27 , pp. 2682-2690
    • Thompson, J.D.1    Plewniak, F.2    Poch, O.3
  • 7
    • 0002629270
    • Maximum Likelihood from Incomplete Data via the EM Algorithm (with Discussion)
    • Dempster, A. P.; Laird, N. M.; Rubin, D. B. Maximum Likelihood from Incomplete Data via the EM Algorithm (with Discussion). J. Royal Stat. Soc. B 1977, 39, 1-38.
    • (1977) J. Royal Stat. Soc. B , vol.39 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3
  • 8
    • 0025320805
    • An expectation maximization algorithm for the identification and characterization of common sites in unaligned biopolymer sequences
    • Lawrence, C. E.; Reilly, A. A. An expectation maximization algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Struct., Funct., Genet. 1990, 7, 41-51.
    • (1990) Proteins: Struct., Funct., Genet , vol.7 , pp. 41-51
    • Lawrence, C.E.1    Reilly, A.A.2
  • 9
    • 0002759539
    • Unsupervised learning of multiple motifs in biopolymers using expectation maximization
    • Bailey, T. L.; Elkan, C. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 1995, 21, 51-80.
    • (1995) Machine Learning , vol.21 , pp. 51-80
    • Bailey, T.L.1    Elkan, C.2
  • 10
    • 0021518209
    • Stochastic relaxation, Gibbs distribution and Bayesian restoration of images
    • Geman, S.; Geman, D. Stochastic relaxation, Gibbs distribution and Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 721-741.
    • (1984) IEEE Trans. Pattern Anal. Mach. Intell , vol.6 , pp. 721-741
    • Geman, S.1    Geman, D.2
  • 11
    • 84950424966
    • Bayesian models for multiple local sequence alignment and Gibbs sampling strategies
    • Liu, J. S.; Neuwald, A. F.; Lawrence, C. E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 1995, 90, 1156-1170.
    • (1995) J. Am. Stat. Assoc , vol.90 , pp. 1156-1170
    • Liu, J.S.1    Neuwald, A.F.2    Lawrence, C.E.3
  • 13
    • 33646229522
    • Practical strategies for discovering regulatory DNA sequence motifs
    • MacIsaac, K. D.; Fraenkel, E. Practical strategies for discovering regulatory DNA sequence motifs. PLoS Comput. Biol. 2006, 2, e26.
    • (2006) PLoS Comput. Biol , vol.2
    • MacIsaac, K.D.1    Fraenkel, E.2
  • 14
    • 33748765218
    • Computational biology: Towards deciphering gene regulatory information in mammalian genomes
    • Ji, H.; Wong, W. H. Computational biology: Towards deciphering gene regulatory information in mammalian genomes. Biometrics 2006, 62, 645-663.
    • (2006) Biometrics , vol.62 , pp. 645-663
    • Ji, H.1    Wong, W.H.2
  • 16
    • 14644404134
    • Bayesian modeling and computation in bioinformatics research
    • Jiang, T., Xu, Y., Zhang, M., Eds.; MIT Press: Cambridge, MA
    • Liu, J. S. Bayesian modeling and computation in bioinformatics research. In Current Topics in Computational Biology; Jiang, T., Xu, Y., Zhang, M., Eds.; MIT Press: Cambridge, MA, 2002; pp 11-44.
    • (2002) Current Topics in Computational Biology , pp. 11-44
    • Liu, J.S.1
  • 19
    • 34248191269
    • A stochastic EM-type algorithm for motif-finding in biopolymer sequences
    • Bi, C.-P. SEAM: A stochastic EM-type algorithm for motif-finding in biopolymer sequences. J. Bioinf. Comput. Biol. 2007, 5, 47-77.
    • (2007) J. Bioinf. Comput. Biol , vol.5 , pp. 47-77
    • Bi, C.-P.1
  • 20
    • 84950758368
    • The calculation of posterior distributions by data augmentation
    • Tanner, M. A.; Wong, W. H. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 1987, 82, 528-550.
    • (1987) J. Am. Stat. Assoc , vol.82 , pp. 528-550
    • Tanner, M.A.1    Wong, W.H.2
  • 21
    • 33747349191
    • Nonuniversal critical dynamics in Monte Carlo simulations
    • Swendsen, R. H.; Wang, J. S. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. 1987, 58, 86-88.
    • (1987) Phys. Rev. Lett , vol.58 , pp. 86-88
    • Swendsen, R.H.1    Wang, J.S.2
  • 22
    • 0000251971
    • Maximum likelihood estimation via the ECM algorithm: A general framework
    • Meng, X. L.; Rubin, D. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 1993, 80, 267-278.
    • (1993) Biometrika , vol.80 , pp. 267-278
    • Meng, X.L.1    Rubin, D.2
  • 23
    • 84950432017
    • A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms
    • Wei, G. C. G.; Tanner, M. A. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Am. Stat. Assoc. 1990, 85, 699-704.
  • 24
    • 34547433608
    • Multiple sequence local alignment using Monte Carlo EM algorithm
    • Mandoiu, I., Zelikovsky, A., Eds.; Springer-Verlag: Berlin Heidelberg
    • Bi, C.-P. Multiple sequence local alignment using Monte Carlo EM algorithm. In ISBRA 2007 Lecture Notes in Bioinformatics; Mandoiu, I., Zelikovsky, A., Eds.; Springer-Verlag: Berlin Heidelberg, 2007; Vol. 4463, pp 465-476.
    • (2007) ISBRA 2007 Lecture Notes in Bioinformatics , vol.4463 , pp. 465-476
    • Bi, C.-P.1
  • 26
    • 38649085151
    • A view of the EM algorithm that justifies incremental, sparse, and other variants
    • Neal, R. M.; Hinton, G. E. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models; Jordan, M., Ed.; NATO Science Series Kluwer: Norwell, MA, 1998; pp 355-368.
  • 27
    • 0035998835
    • Model-based clustering, discriminant analysis, and density estimation
    • Fraley, C.; Raftery, A. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 2002, 97, 611-631.
    • (2002) J. Am. Stat. Assoc , vol.97 , pp. 611-631
    • Fraley, C.1    Raftery, A.2
  • 28
    • 0034072450
    • DNA binding sites: Representation and discovery
    • Stormo, G. D. DNA binding sites: representation and discovery. Bioinformatics 2000, 16, 16-23.
    • (2000) Bioinformatics , vol.16 , pp. 16-23
    • Stormo, G.D.1
  • 29
    • 0023147228
    • Selection of DNA binding sites by regulatory proteins: Statistical-mechanical theory and application to operators and promoters
    • Berg, O. G.; von Hippel, P. H. Selection of DNA binding sites by regulatory proteins: Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 1987, 193, 723-750.
    • (1987) J. Mol. Biol , vol.193 , pp. 723-750
    • Berg, O.G.1    von Hippel, P.H.2
  • 30
    • 0028679709
    • On the complexity of multiple sequence alignment
    • Wang, L.; Jiang, T. On the complexity of multiple sequence alignment. J. Comput. Biol. 1994, 1, 337-348.
    • (1994) J. Comput. Biol , vol.1 , pp. 337-348
    • Wang, L.1    Jiang, T.2
  • 31
    • 21144439147
    • Assessing computational tools for the discovery of transcription factor binding sites
    • Tompa, M.; Li, N.; Bailey, T. L.; Church, G. M.; De Moor, B.; Eskin, E.; Favorov, A. V.; Frith, M. C.; Fu, Y.; Kent, W. J.; Makeev, V. J.; Mironov, A. A.; Noble, W. S.; Pavesi, G.; Pesole, G.; Regnier, M.; Simonis, N.; Sinha, S.; Thijs, G.; van Helden, J.; Vandenbogaert, M.; Weng, Z.; Workman, C.; Ye, C.; Zhu, Z. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 2005, 23, 137-144.
  • 32
    • 84886053679
    • A genetic-based EM motif-finding algorithm for biological sequence analysis
    • Bi, C.-P. A genetic-based EM motif-finding algorithm for biological sequence analysis. Proc. of IEEE Symp. on Computat. Intell. Bioinf. Computat. Biol. 2007, 07, 275-282.
  • 33
    • 33847263536
    • Supervised detection of conserved motifs in DNA sequences with cosmo
    • Bembom, O.; Keles, S.; van der Laan, M. J. Supervised detection of conserved motifs in DNA sequences with cosmo. Stat. Appl. Genet. Mol. Biol. 2006, 6, Article 8.
    • (2006) Stat. Appl. Genet. Mol. Biol , vol.6 , pp. 8
    • Bembom, O.1    Keles, S.2    van der Laan, M.J.3
  • 34
    • 33646129914
    • The IclR family of transcriptional activators and repressors can be defined by a single profile
    • Krell, T.; Molina-Henares, A. J.; Ramos, J. L. The IclR family of transcriptional activators and repressors can be defined by a single profile. Protein Sci. 2006, 15, 1207-1213.
    • (2006) Protein Sci , vol.15 , pp. 1207-1213
    • Krell, T.1    Molina-Henares, A.J.2    Ramos, J.L.3
  • 36
    • 24644457706
    • BAliBASE 3.0: Latest developments of the multiple sequence alignment benchmark
    • Thompson, J. D.; Koehl, P.; Ripp, R.; Poch, O. BAliBASE 3.0: Latest developments of the multiple sequence alignment benchmark. Proteins 2005, 61, 127-136.
    • (2005) Proteins , vol.61 , pp. 127-136
    • Thompson, J.D.1    Koehl, P.2    Ripp, R.3    Poch, O.4
  • 37
    • 0034565285
    • Combinatorial approaches to finding subtle signals in DNA sequences
    • Pevzner, P. A.; Sze, S.-H. Combinatorial approaches to finding subtle signals in DNA sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2000, 8, 269-278.
    • (2000) Proc. Int. Conf. Intell. Syst. Mol. Biol , vol.8 , pp. 269-278
    • Pevzner, P.A.1    Sze, S.-H.2
  • 38
    • 3042666256
    • MUSCLE: Multiple sequence alignment with high accuracy and high throughput
    • Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792-1797.
    • (2004) Nucleic Acids Res , vol.32 , pp. 1792-1797
    • Edgar, R.C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.