ACM Transactions on Speech and Language Processing, Volume 7, Issue 3, 2011

Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs

Author keywords

POMDP; Reinforcement learning; Spoken dialogue systems

Indexed keywords

DIALOGUE MANAGER; DIALOGUE MODELS; DIALOGUE SYSTEMS; INFORMATION DOMAINS; LEARNING PARAMETERS; MAIN COMPONENT; MODEL PARAMETERS; NATURAL GRADIENT; NOVEL ALGORITHM; OPTIMAL MODEL; OPTIMAL POLICIES; OPTIMIZATION TECHNIQUES; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; POLICY GRADIENT METHODS; POMDP; PRIOR DISTRIBUTION; RANDOM SEARCH ALGORITHM; REINFORCEMENT ALGORITHMS; REWARD FUNCTION; SPOKEN DIALOGUE SYSTEM; STATE INFORMATION;

EID: 80052051092     PISSN: 15504875     EISSN: 15504883     Source Type: Journal    
DOI: 10.1145/1966407.1966411     Document Type: Article
Times cited: 38

References (39)
  • 2. Amari, S.-I. 1998. Natural gradient works efficiently in learning. Neural Computation 10, 2, 251-276.
  • 4. Bui, T. H., Poel, M., Nijholt, A., and Zwiers, J. 2009. A tractable hybrid DDN-POMDP approach to affective dialogue modeling for probabilistic frame-based dialogue systems. Natural Language Engineering 15, 2, 273-307.
  • 9. Hansen, N. and Ostermeier, A. 2001. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9, 2, 159-195.
  • 10. Hoerl, A. E. and Kennard, R. W. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55-67.
  • 12. Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 1-2, 99-134.
  • 16. Oliphant, T. E. 2007. Python for scientific computing. Computing in Science and Engineering 9, 3, 10-20.
  • 17. Peters, J. and Schaal, S. 2008a. Natural actor-critic. Neurocomputing 71, 7-9, 1180-1190.
  • 18. Peters, J. and Schaal, S. 2008b. Reinforcement learning of motor skills with policy gradients. Neural Networks 21, 4, 682-697.
  • 21. Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 257-286.
  • 30. Thomson, B. and Young, S. 2010. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language 24, 4, 562-588.
  • 36. Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8.
  • 37. Young, S. 2007. CUED standard dialogue acts. Tech. rep., Engineering Department, Cambridge University. http://mi.eng.cam.ac.uk/research/dialogue/LocalDocs/dastd.pdf
  • 38. Young, S., Gašić, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., and Yu, K. 2010. The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management. Computer Speech and Language 24, 2, 150-174.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.