ACM Transactions on Speech and Language Processing, Volume 7, Issue 3, 2011

Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs

Author keywords

POMDP; Reinforcement learning; Spoken dialogue systems

Indexed keywords

DIALOGUE MANAGER; DIALOGUE MODELS; DIALOGUE SYSTEMS; INFORMATION DOMAINS; LEARNING PARAMETERS; MAIN COMPONENT; MODEL PARAMETERS; NATURAL GRADIENT; NOVEL ALGORITHM; OPTIMAL MODEL; OPTIMAL POLICIES; OPTIMIZATION TECHNIQUES; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; POLICY GRADIENT METHODS; POMDP; PRIOR DISTRIBUTION; RANDOM SEARCH ALGORITHM; REINFORCEMENT ALGORITHMS; REWARD FUNCTION; SPOKEN DIALOGUE SYSTEM; STATE INFORMATION;

EID: 80052051092     PISSN: 15504875     EISSN: 15504883     Source Type: Journal    
DOI: 10.1145/1966407.1966411     Document Type: Article
Times cited: 38

References (39)
  • 2. Amari, S.-I. 1998. Natural gradient works efficiently in learning. Neural Computation 10, 2, 251-276.
  • 4. Bui, T. H., Poel, M., Nijholt, A., and Zwiers, J. 2009. A tractable hybrid DDN-POMDP approach to affective dialogue modeling for probabilistic frame-based dialogue systems. Natural Language Engineering 15, 2, 273-307.
  • 9. Hansen, N. and Ostermeier, A. 2001. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9, 2, 159-195.
  • 10. Hoerl, A. E. and Kennard, R. W. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55-67.
  • 12. Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 1-2, 99-134.
  • 16. Oliphant, T. E. 2007. Python for scientific computing. Computing in Science and Engineering 9, 3, 10-20.
  • 17. Peters, J. and Schaal, S. 2008a. Natural actor-critic. Neurocomputing 71, 7-9, 1180-1190.
  • 18. Peters, J. and Schaal, S. 2008b. Reinforcement learning of motor skills with policy gradients. Neural Networks 21, 4, 682-697.
  • 21. Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 257-286.
  • 30. Thomson, B. and Young, S. 2010. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language 24, 4, 562-588.
  • 36. Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8.
  • 37. Young, S. 2007. CUED standard dialogue acts. Tech. rep., Engineering Department, Cambridge University. http://mi.eng.cam.ac.uk/research/dialogue/LocalDocs/dastd.pdf
  • 38. Young, S., Gašić, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., and Yu, K. 2010. The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management. Computer Speech and Language 24, 2, 150-174.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.