Volume 12, 2011, Pages 2335-2382

Producing power-law distributions and damping word frequencies with two-stage language models

Author keywords

Language model; Nonparametric Bayes; Pitman Yor process; Unsupervised

Indexed keywords

BAYESIAN MODEL; DIRICHLET PROCESS; ESTIMATION PROCEDURES; FORMAL ANALYSIS; LANGUAGE MODEL; NATURAL LANGUAGES; NON-PARAMETRIC; PITMAN-YOR PROCESS; POWER LAW; POWER LAW DISTRIBUTION; PROBABILISTIC MODELS; STATISTICAL MODELS; TWO STAGE; UNSUPERVISED; WORD FREQUENCIES;

EID: 80052252167     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited : (64)

References (91)
  • 1
    • 0242353111
    • Rules vs. analogy in English past tenses: A computational/experimental study
    • Adam Albright and Bruce Hayes. Rules vs. analogy in English past tenses: a computational/experimental study. Cognition, 90:118-161, 2003.
    • (2003) Cognition , vol.90 , pp. 118-161
    • Albright, A.1    Hayes, B.2
  • 3
    • 0003008572
    • Frequency effects and the representational status of regular inflections
    • Maria Alegre and Peter Gordon. Frequency effects and the representational status of regular inflections. Journal of Memory and Language, 40(1):41-61, 1999.
    • (1999) Journal of Memory and Language , vol.40 , Issue.1 , pp. 41-61
    • Alegre, M.1    Gordon, P.2
  • 4
    • 0000708831
    • Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems
    • Charles Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2:1152-1174, 1974.
    • (1974) The Annals of Statistics , vol.2 , pp. 1152-1174
    • Antoniak, C.1
  • 5
    • 0000038791
    • Poisson process approximations for the Ewens sampling formula
    • Richard Arratia, A. D. Barbour, and Simon Tavaré. Poisson process approximations for the Ewens sampling formula. Annals of Applied Probability, 2:519-535, 1992.
    • (1992) Annals of Applied Probability , vol.2 , pp. 519-535
    • Arratia, R.1    Barbour, A.D.2    Tavaré, S.3
  • 8
    • 0002617436
    • Ferguson distributions via Pólya urn schemes
    • David Blackwell and James B. MacQueen. Ferguson distributions via Pólya urn schemes. Annals of Statistics, 1:353-355, 1973.
    • (1973) Annals of Statistics , vol.1 , pp. 353-355
    • Blackwell, D.1    MacQueen, J.B.2
  • 9
    • 84898995847
    • Latent Dirichlet allocation
    • T. Dietterich, S. Becker, and Z. Ghahramani, editors Cambridge, MA MIT Press
    • David Blei, Andrew Ng, and Michael Jordan. Latent Dirichlet allocation. In T. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
    • (2002) Advances in Neural Information Processing Systems , vol.14
    • Blei, D.1    Ng, A.2    Jordan, M.3
  • 11
    • 8644225400
    • Hierarchical topic models and the nested Chinese restaurant process
    • S. Thrun, L. Saul, and B. Schölkopf, editors Cambridge, MA MIT Press
    • David Blei, Thomas L. Griffiths, Michael Jordan, and Joshua B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, Cambridge, MA, 2004. MIT Press.
    • (2004) Advances in Neural Information Processing Systems , vol.16
    • Blei, D.1    Griffiths, T.L.2    Jordan, M.3    Tenenbaum, J.B.4
  • 12
    • 0032677683
    • An efficient probabilistically sound algorithm for segmentation and word discovery
    • Michael Brent. An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning, 34:71-105, 1999.
    • (1999) Machine Learning , vol.34 , pp. 71-105
    • Brent, M.1
  • 16
    • 84937343492
    • Cambridge University Press, Cambridge, UK
    • Joan Bybee. Phonology and Language Use. Cambridge University Press, Cambridge, UK, 2001.
    • (2001) Phonology and Language Use
    • Bybee, J.1
  • 26
    • 33749257142
    • Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution
    • Pittsburgh, Pennsylvania
    • Charles Elkan. Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution. In Proceedings of the 23rd International Conference on Machine Learning, pages 289-296, Pittsburgh, Pennsylvania, 2006.
    • (2006) Proceedings of the 23rd International Conference on Machine Learning , pp. 289-296
    • Elkan, C.1
  • 28
  • 29
    • 0001120413
    • A Bayesian analysis of some nonparametric problems
    • Thomas S. Ferguson. A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1:209-230, 1973.
    • (1973) Annals of Statistics , vol.1 , pp. 209-230
    • Ferguson, T.S.1
  • 31
    • 0003860037
    • Walter R. Gilks, Sylvia Richardson, and David J. Spiegelhalter, editors Chapman and Hall, Suffolk
    • Walter R. Gilks, Sylvia Richardson, and David J. Spiegelhalter, editors. Markov Chain Monte Carlo in Practice. Chapman and Hall, Suffolk, 1996.
    • (1996) Markov Chain Monte Carlo in Practice
  • 32
    • 0041079008
    • Unsupervised learning of the morphology of a natural language
    • John Goldsmith. Unsupervised learning of the morphology of a natural language. Computational Linguistics, 27:153-198, 2001.
    • (2001) Computational Linguistics , vol.27 , pp. 153-198
    • Goldsmith, J.1
  • 33
    • 33751175157
    • An algorithm for the unsupervised learning of morphology
    • DOI 10.1017/S1351324905004055, PII S1351324905004055
    • John Goldsmith. An algorithm for the unsupervised learning of morphology. Natural Language Engineering, 12:353-371, 2006. (Pubitemid 44773861)
    • (2006) Natural Language Engineering , vol.12 , Issue.4 , pp. 353-371
    • Goldsmith, J.1
  • 36
    • 67349278780
    • A Bayesian framework for word segmentation: Exploring the effects of context
    • Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112(1):21-54, 2009.
    • (2009) Cognition , vol.112 , Issue.1 , pp. 21-54
    • Goldwater, S.1    Griffiths, T.L.2    Johnson, M.3
  • 37
    • 0004055734
    • Language universals
    • T. A. Sebeok, editor Mouton, The Hague
    • Joseph Greenberg. Language universals. In T. A. Sebeok, editor, Current Trends in Linguistics III. Mouton, The Hague, 1966.
    • (1966) Current Trends in Linguistics III
    • Greenberg, J.1
  • 40
    • 21144436663
    • Statistical morphological disambiguation for agglutinative languages
    • DOI 10.1023/A:1020271707826
    • Dilek Z. Hakkani-Tür, Kemal Oflazer, and Gökhan Tür. Statistical morphological disambiguation for agglutinative languages. Computers and the Humanities, 36(4):381-410, 2002. (Pubitemid 44947889)
    • (2002) Computers and the Humanities , vol.36 , Issue.4 , pp. 381-410
    • Hakkani-Tür, D.Z.1    Oflazer, K.2    Tür, G.3
  • 41
    • 0001074490
    • From phoneme to morpheme
    • Zelig Harris. From phoneme to morpheme. Language, 31:190-222, 1955.
    • (1955) Language , vol.31 , pp. 190-222
    • Harris, Z.1
  • 42
    • 0040663141
    • Lexical frequency in morphology: Is everything relative?
    • Jennifer Hay. Lexical frequency in morphology: Is everything relative? Linguistics, 39(6):1041-1070, 2001. (Pubitemid 33620691)
    • (2001) Linguistics , vol.39 , Issue.376 , pp. 1041-1070
    • Hay, J.1
  • 43
    • 21344444397
    • Shifting paradigms: Gradient structure in morphology
    • DOI 10.1016/j.tics.2005.04.002, PII S1364661305001038
    • Jennifer Hay and R. Harald Baayen. Shifting paradigms: gradient structure in morphology. Trends in Cognitive Sciences, 9(7):342-348, 2005. (Pubitemid 40910333)
    • (2005) Trends in Cognitive Sciences , vol.9 , Issue.7 , pp. 342-348
    • Hay, J.B.1    Baayen, R.H.2
  • 45
    • 0346338291
    • Generalized weighted Chinese restaurant processes for species sampling mixture models
    • Hemant Ishwaran and Lancelot James. Generalized weighted Chinese restaurant processes for species sampling mixture models. Statistica Sinica, 13:1211-1235, 2003. (Pubitemid 38042014)
    • (2003) Statistica Sinica , vol.13 , Issue.4 , pp. 1211-1235
    • Ishwaran, H.1    James, L.F.2
  • 47
    • 84858427431
    • Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure
    • Mark Johnson. Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure. In Proceedings of ACL-08: HLT, Columbus, Ohio, 2008a.
    • (2008) Proceedings of ACL-08: HLT, Columbus, Ohio
    • Johnson, M.1
  • 49
    • 80052248446
    • PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names
    • Uppsala, Sweden
    • Mark Johnson. PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. In Proceedings of the 48th Annual Meeting of the Association of Computational Linguistics, pages 1148-1157, Uppsala, Sweden, 2010.
    • (2010) Proceedings of the 48th Annual Meeting of the Association of Computational Linguistics , pp. 1148-1157
    • Johnson, M.1
  • 51
    • 67349214856
    • Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models
    • B. Schölkopf, J. Platt, and T. Hoffman, editors Cambridge, MA MIT Press
    • Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, Cambridge, MA, 2007. MIT Press.
    • (2007) Advances in Neural Information Processing Systems , vol.19
    • Johnson, M.1    Griffiths, T.L.2    Goldwater, S.3
  • 52
    • 46749133617
    • Tree adjoining grammars
    • Ruslan Mitkov, editor Oxford University Press, Oxford, England
    • Aravind Joshi. Tree adjoining grammars. In Ruslan Mitkov, editor, The Oxford Handbook of Computational Linguistics, pages 483-501. Oxford University Press, Oxford, England, 2003.
    • (2003) The Oxford Handbook of Computational Linguistics , pp. 483-501
    • Joshi, A.1
  • 57
    • 0001876649
    • On a class of Bayesian nonparametric estimates
    • Albert Lo. On a class of Bayesian nonparametric estimates. Annals of Statistics, 12:351-357, 1984.
    • (1984) Annals of Statistics , vol.12 , pp. 351-357
    • Lo, A.1
  • 59
    • 0022083299
    • The child language data exchange system
    • Brian MacWhinney and Catharine Snow. The child language data exchange system. Journal of Child Language, 12:271-296, 1985.
    • (1985) Journal of Child Language , vol.12 , pp. 271-296
    • MacWhinney, B.1    Snow, C.2
  • 61
    • 34249852033
    • Building a large annotated corpus of English: The Penn treebank
    • Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn treebank. Computational Linguistics, 19(2):313-330, 1993.
    • (1993) Computational Linguistics , vol.19 , Issue.2 , pp. 313-330
    • Marcus, M.P.1    Santorini, B.2    Marcinkiewicz, M.A.3
  • 62
    • 79960601554
    • A brief history of generative models for power law and lognormal distributions
    • Michael Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2):226-251, 2004.
    • (2004) Internet Mathematics , vol.1 , Issue.2 , pp. 226-251
    • Mitzenmacher, M.1
  • 64
    • 77950032550
    • Markov chain sampling methods for Dirichlet process mixture models
    • Radford Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9:249-265, 2000.
    • (2000) Journal of Computational and Graphical Statistics , vol.9 , pp. 249-265
    • Neal, R.1
  • 66
    • 0027929445
    • On structuring probabilistic dependencies in stochastic language modeling
    • Hermann Ney, Ute Essen, and Reinhard Kneser. On structuring probabilistic dependencies in stochastic language modeling. Computer Speech and Language, 8:1-38, 1994.
    • (1994) Computer Speech and Language , vol.8 , pp. 1-38
    • Ney, H.1    Essen, U.2    Kneser, R.3
  • 69
    • 2142729392
    • Probabilistic phonology: Discrimination and robustness
    • R. Bod, J. Hay, and S. Jannedy, editors MIT Press, Cambridge, MA
    • Janet Pierrehumbert. Probabilistic phonology: Discrimination and robustness. In R. Bod, J. Hay, and S. Jannedy, editors, Probabilistic Linguistics. MIT Press, Cambridge, MA, 2003.
    • (2003) Probabilistic Linguistics
    • Pierrehumbert, J.1
  • 70
    • 21844504293
    • Exchangeable and partially exchangeable random partitions
    • Jim Pitman. Exchangeable and partially exchangeable random partitions. Probability Theory and Related Fields, 102:145-158, 1995.
    • (1995) Probability Theory and Related Fields , vol.102 , pp. 145-158
    • Pitman, J.1
  • 72
    • 0031534984
    • The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator
    • Jim Pitman and Marc Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855-900, 1997.
    • (1997) Annals of Probability , vol.25 , pp. 855-900
    • Pitman, J.1    Yor, M.2
  • 73
    • 0033815969
    • Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing?
    • David Plaut and Laura Gonnerman. Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes, 15:445-485, 2000.
    • (2000) Language and Cognitive Processes , vol.15 , pp. 445-485
    • Plaut, D.1    Gonnerman, L.2
  • 78
    • 0000720609
    • A constructive definition of Dirichlet priors
    • Jayaram Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639-650, 1994.
    • (1994) Statistica Sinica , vol.4 , pp. 639-650
    • Sethuraman, J.1
  • 79
    • 0000570212
    • On a class of skew distribution functions
    • Herbert Simon. On a class of skew distribution functions. Biometrika, 42(3/4):425-440, 1955.
    • (1955) Biometrika , vol.42 , Issue.3-4 , pp. 425-440
    • Simon, H.1
  • 80
    • 84899019621
    • A probabilistic model for learning concatenative morphology
    • Cambridge, MA MIT Press
    • Matthew Snover and Michael Brent. A probabilistic model for learning concatenative morphology. In Advances in Neural Information Processing Systems 15, Cambridge, MA, 2003. MIT Press.
    • (2003) Advances in Neural Information Processing Systems , vol.15
    • Snover, M.1    Brent, M.2
  • 81
    • 84859904364
    • Unsupervised multilingual learning for morphological segmentation
    • Columbus, Ohio
    • Benjamin Snyder and Regina Barzilay. Unsupervised multilingual learning for morphological segmentation. In Proceedings of ACL-08: HLT, pages 737-745, Columbus, Ohio, 2008.
    • (2008) Proceedings of ACL-08: HLT , pp. 737-745
    • Snyder, B.1    Barzilay, R.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.