SCOPUS Information Search Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 6, Issue 8, 2012, Pages 891-902

A comprehensive reinforcement learning framework for dialogue management optimization

(4) Daubigney, Lucie a,b Geist, Matthieu a Chandramohan, Senthilkumar a,c Pietquin, Olivier a,d

a UMI Georgia Tech CNRS 2958 (France)

b INRIA (France)

c UNIVERSITY OF AVIGNON (France)

d CNRS (France)

Author keywords

Dialogue management; Reinforcement learning; Spoken dialogue system

Indexed keywords

DIALOGUE MANAGEMENT; DIALOGUE STRATEGY; DIALOGUE SYSTEMS; GOAL-ORIENTED; INTERACTION STRATEGY; NON-STATIONARITIES; SPOKEN DIALOGUE SYSTEM; TEMPORAL DIFFERENCES;

ALGORITHMS; OPTIMIZATION; SPEECH PROCESSING;

REINFORCEMENT LEARNING;

EID: 84872138024 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2012.2229257 Document Type: Article

Times cited : (46)

References (48)

1
- 0030635367
- Learning dialogue strategies within the Markov decision process framework
- E. Levin, R. Pieraccini, and W. Eckert, "Learning dialogue strategies within the Markov decision process framework," in Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'97), 1997.
- (1997) Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'97)
- Levin, E.¹ Pieraccini, R.² Eckert, W.³

2
- 0001700171
- A Markovian decision process
- R. Bellman, "A Markovian Decision Process," J. Math. Mech., vol. 6, pp. 679-684, 1957.
- (1957) J. Math. Mech. , vol.6 , pp. 679-684
- Bellman, R.¹

3
- 0004102479
- Cambridge MA: MIT Press
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

4
- 0031624616
- Using Markov decision process for learning dialogue strategies
- E. Levin and R. Pieraccini, "Using Markov decision process for learning dialogue strategies," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP'98), 1998, pp. 201-204.
- (1998) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP'98) , pp. 201-204
- Levin, E.¹ Pieraccini, R.²

5
- 0033894474
- Stochastic model of human-machine interaction for learning dialog strategies
- DOI 10.1109/89.817450
- E. Levin, R. Pieraccini, and W. Eckert, "A stochastic model of human-machine interaction for learning dialog strategies," IEEE Trans. Speech Audio Process., vol. 8, no. 1, pp. 11-23, Jan. 2000. (Pubitemid 30540744)
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.1 , pp. 11-23
- Levin Esther¹ Pieraccini Roberto² Eckert Wieland³

6
- 0030638118
- User modeling for spoken dialogue system evaluation
- W. Eckert, E. Levin, and R. Pieraccini, "User modeling for spoken dialogue system evaluation," in Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'97), 1997.
- (1997) Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'97)
- Eckert, W.¹ Levin, E.² Pieraccini, R.³

7
- 33750253118
- A probabilistic framework for dialog simulation and optimal strategy learning
- DOI 10.1109/TSA.2005.855836
- O. Pietquin and T. Dutoit, "A probabilistic framework for dialog simulation and optimal strategy learning," IEEE Trans. Speech Audio Process., vol. 14, no. 2, pp. 589-599, Mar. 2006. (Pubitemid 46405357)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.2 , pp. 589-599
- Pietquin, O.¹ Dutoit, T.²

8
- 33747607273
- A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies
- DOI 10.1017/S0269888906000944
- J. Schatzmann, K. Weilhammer, M. Stuttle, and S. Young, "A survey of statistical user simulation techniques for RL of dialogue management strategies," Knowl. Eng. Rev., vol. 21, no. 2, pp. 97-126, 2006. (Pubitemid 44266297)
- (2006) Knowledge Engineering Review , vol.21 , Issue.2 , pp. 97-126
- Schatzmann, J.¹ Weilhammer, K.² Stuttle, M.³ Young, S.⁴

9
- 33846257740
- Effects of the user model on simulation-based learning of dialogue strategies
- J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, "Effects of the user model on simulation-based learning of dialogue strategies," in Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'05), 2005.
- (2005) Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'05)
- Schatzmann, J.¹ Stuttle, M.N.² Weilhammer, K.³ Young, S.⁴

10
- 84865777012
- A survey on metrics for the evaluation of user simulations
- O. Pietquin and H. Hastie, "A survey on metrics for the evaluation of user simulations," Knowledge Eng. Rev., 2011.
- (2011) Knowledge Eng. Rev.
- Pietquin, O.¹ Hastie, H.²

11
- 84870232008
- New York: Springer, 2011, Theory and Applications of Natural Language Processing
- V. Rieser and O. Lemon, Reinforcement learning for adaptive dialogue systems: A data-driven methodology for dialogue management and natural language generation. New York: Springer, 2011, Theory and Applications of Natural Language Processing.
- Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven Methodology for Dialogue Management and Natural Language Generation
- Rieser, V.¹ Lemon, O.²

12
- 84898955256
- Reinforcement learning for spoken dialogue systems
- S. Singh, M. Kearns, D. Litman, and M. Walker, "Reinforcement learning for spoken dialogue systems," in Proc. Adv. Neural Inf. Process. Syst. (NIPS'99), 1999.
- (1999) Proc. Adv. Neural Inf. Process. Syst. (NIPS'99)
- Singh, S.¹ Kearns, M.² Litman, D.³ Walker, M.⁴

13
- 84872122412
- 6th ed. New York: Dover
- R. Bellman, Dynamic Programming, 6th ed. New York: Dover, 1957.
- (1957) Dynamic Programming
- Bellman, R.¹

14
- 70450186275
- Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection
- L. Li, S. Balakrishnan, and J. Williams, "Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection," in Proc. Interspeech'09, 2009.
- (2009) Proc. Interspeech'09
- Li, L.¹ Balakrishnan, S.² Williams, J.³

15
- 80052060715
- Sample-efficient batch reinforcement learning for dialogue management optimization
- O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, "Sample-efficient batch reinforcement learning for dialogue management optimization," ACM Trans. Speech Audio Process., vol. 7, no. 3, pp. 1-21, 2011.
- (2011) ACM Trans. Speech Audio Process. , vol.7 , Issue.3 , pp. 1-21
- Pietquin, O.¹ Geist, M.² Chandramohan, S.³ Frezza-Buet, H.⁴

16
- 51449120317
- Hybrid reinforcement/supervised learning of dialogue policies from fixed data sets
- J. Henderson, O. Lemon, and K. Georgila, "Hybrid reinforcement/supervised learning of dialogue policies from fixed data sets," Comput. Linguist., vol. 34, no. 4, pp. 487-511, 2008.
- (2008) Comput. Linguist. , vol.34 , Issue.4 , pp. 487-511
- Henderson, J.¹ Lemon, O.² Georgila, K.³

17
- 84867601978
- Managing uncertainty within the KTD framework
- Journal of Machine Learning Research C& WP
- M. Geist and O. Pietquin, "Managing uncertainty within the KTD framework," in Proc. AL&E Workshop, 2011, Journal of Machine Learning Research C& WP
- (2011) Proc. AL&E Workshop
- Geist, M.¹ Pietquin, O.²

18
- 84857755225
- Gaussian processes for fast policy optimisation of POMDP-based dialogue managers
- M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young, "Gaussian processes for fast policy optimisation of POMDP-based dialogue managers," in Proc. SIGdial'10, 2010.
- (2010) Proc. SIGdial'10
- Gašić, M.¹ Jurčíček, F.² Keizer, S.³ Mairesse, F.⁴ Thomson, B.⁵ Yu, K.⁶ Young, S.⁷

19
- 84867619228
- Off-policy learning in large-scale POMDP-based dialogue systems
- L. Daubigney, M. Geist, and O. Pietquin, "Off-policy learning in large-scale POMDP-based dialogue systems," in Proc. Int. Conf. Acoust., Speech Signal Process. (ICASSP'12), 2012, pp. 4989-4992.
- (2012) Proc. Int. Conf. Acoust., Speech Signal Process. (ICASSP'12) , pp. 4989-4992
- Daubigney, L.¹ Geist, M.² Pietquin, O.³

20
- 79959813974
- Natural belief-critic: A reinforcement algorithm for parameter estimation in statistical spoken dialogue systems
- F. Jurčíček, B. Thomson, S. Keizer, M. Gašić, F. Mairesse, K. Yu, and S. Young, "Natural belief-critic: A reinforcement algorithm for parameter estimation in statistical spoken dialogue systems," in Proc. Interspeech'10, 2010.
- (2010) Proc. Interspeech'10
- Jurčíček, F.¹ Thomson, B.² Keizer, S.³ Gašić, M.⁴ Mairesse, F.⁵ Yu, K.⁶ Young, S.⁷

21
- 78651465938
- Kalman temporal differences
- M. Geist and O. Pietquin, "Kalman temporal differences," J. Artif. Intell. Res. (JAIR), vol. 39, pp. 483-532, 2010.
- (2010) J. Artif. Intell. Res. , vol.39 , pp. 483-532
- Geist, M.¹ Pietquin, O.²

22
- 51349089807
- DIPPER: Description and formalisation of an information-state update dialogue system architecture
- J. Bos, E. Klein, O. Lemon, and T. Oka, "DIPPER: Description and formalisation of an information-state update dialogue system architecture," in Proc. SIGdial'03, 2003.
- (2003) Proc. SIGdial'03
- Bos, J.¹ Klein, E.² Lemon, O.³ Oka, T.⁴

23
- 84872163482
- The HIS dialogue manager
- S. Young, J. Schatzmann, B. Thomson, H. Ye, and K. Weilhammer, "The HIS dialogue manager," in Proc. IEEE/ACL Workshop Spoken Lang. Technol. (SLT'06), 2006.
- (2006) Proc. IEEE/ACL Workshop Spoken Lang. Technol. (SLT'06)
- Young, S.¹ Schatzmann, J.² Thomson, B.³ Ye, H.⁴ Weilhammer, K.⁵

24
- 0031143730
- An analysis of temporal-difference learning with function approximation
- PII S0018928697034375
- J. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, May 1997. (Pubitemid 127760263)
- (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

25
- 80051605697
- Bayesian reinforcement learning for POMDP-based dialogue systems
- S. Png and J. Pineau, "Bayesian reinforcement learning for POMDP-based dialogue systems," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP'11), 2011, pp. 2156-2159.
- (2011) Proc. Int. Conf. Acoust. , pp. 2156-2159
- Png, S.¹ Pineau, J.²

26
- 85009087667
- Information state and dialogue management in the trindi dialogue move engine toolkit
- S. Larsson and D. R. Traum, "Information state and dialogue management in the trindi dialogue move engine toolkit," Natural Lang. Eng., vol. 6, pp. 323-340, 2000.
- (2000) Natural Lang. Eng. , vol.6 , pp. 323-340
- Larsson, S.¹ Traum, D.R.²

27
- 33846220727
- Scaling up POMDPs for dialogue management: The summary POMDP method
- J. Williams and S. Young, "Scaling up POMDPs for dialogue management: The summary POMDP method," in Proc. Autom. Speech Recognit. Understanding Workshop (ASRU'05), 2005.
- (2005) Proc. Autom. Speech Recognit. Understanding Workshop (ASRU'05)
- Williams, J.¹ Young, S.²

28
- 85065183198
- PARADISE: A framework for evaluating spoken dialogue agents
- M. Walker, D. Litman, C. Kamm, and A. Abella, "PARADISE: A framework for evaluating spoken dialogue agents," in Proc. Meeting Assoc. Comput. Linguist. (ACL'97), 1997.
- (1997) Proc. Meeting Assoc. Comput. Linguist. (ACL'97)
- Walker, M.¹ Litman, D.² Kamm, C.³ Abella, A.⁴

29
- 84872159748
- Learning the reward model of dialogue POMDPs from data
- A. Boularias, H. Chinaei, and B. Chaib-draa, "Learning the reward model of dialogue POMDPs from data," in Proc. NIPS Workshop of Mach. Learn. for Assistive Tech., 2010.
- (2010) Proc. NIPS Workshop of Mach. Learn. for Assistive Tech.
- Boularias, A.¹ Chinaei, H.² Chaib-Draa, B.³

30
- 84879078248
- Reward function learning for dialogue management
- L. E. Asri, R. Laroche, and O. Pietquin, "Reward function learning for dialogue management," in Proc. Starting Artif. Intell. Res. Symp. (STAIRS'12), 2012, pp. 95-106.
- (2012) Proc. Starting Artif. Intell. Res. Symp. (STAIRS'12) , pp. 95-106
- Asri, L.E.¹ Laroche, R.² Pietquin, O.³

31
- 85024429815
- A new approach to linear filtering and prediction problems
- R. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME-J. Basic Eng., vol. 82, no. Series D, pp. 35-45, 1960.
- (1960) Trans. ASME-J. Basic Eng. , vol.82 , pp. 35-45
- Kalman, R.¹

32
- 84962432583
- The unscented Kalman filter for nonlinear estimation
- E. Wan and R. Van Der Merwe, "The unscented Kalman filter for nonlinear estimation," in Adaptive Syst. for Signal Process., Commun., Control Symp. (AS-SPCC'00), 2000, pp. 153-158.
- (2000) Adaptive Syst. for Signal Process., Commun., Control Symp. (AS-SPCC'00) , pp. 153-158
- Wan, E.¹ Van Der Merwe, R.²

33
- 84865703906
- Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system
- L. Daubigney, M. Gašić, S. Chandramohan, M. Geist, O. Pietquin, and S. Young, "Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system," in Proc. Interspeech'11, 2011.
- (2011) Proc. Interspeech'11
- Daubigney, L.¹ Gašić, M.² Chandramohan, S.³ Geist, M.⁴ Pietquin, O.⁵ Young, S.⁶

34
- 33745211240
- Learning user simulations for information state update dialogue systems
- K. Georgila, J. Henderson, and O. Lemon, "Learning user simulations for information state update dialogue systems," in Proc. Eur. Conf. Speech Commun. Technol. (Interspeech - Eurospeech'05), 2005.
- (2005) Proc. Eur. Conf. Speech Commun. Technol.
- Georgila, K.¹ Henderson, J.² Lemon, O.³

35
- 84859945481
- Adaptive information presentation for spoken dialogue systems: Evaluation with human subjects
- V. Rieser, S. Keizer, X. Liu, and O. Lemon, "Adaptive information presentation for spoken dialogue systems: Evaluation with human subjects," in Proc. Eur. Workshop Natural Lang. Generat. (ENLG'11), 2011.
- (2011) Proc. Eur. Workshop Natural Lang. Generat. (ENLG'11)
- Rieser, V.¹ Keizer, S.² Liu, X.³ Lemon, O.⁴

36
- 85048464801
- Agenda-based user simulation for bootstrapping a POMDP dialogue system
- J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young, "Agenda-based user simulation for bootstrapping a POMDP dialogue system," in Proc. NAACL Conf. Human Lang. Technol. (HLT/NAACL'07), 2007.
- (2007) Proc. NAACL Conf. Human Lang. Technol. (HLT/NAACL'07)
- Schatzmann, J.¹ Thomson, B.² Weilhammer, K.³ Ye, H.⁴ Young, S.⁵

37
- 84865718200
- Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk
- F. Jurčíček, S. Keizer, M. Gašić, F. Mairesse, B. Thomson, K. Yu, and S. Young, "Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk," in Proc. Interspeech'11, 2011.
- (2011) Proc. Interspeech'11
- Jurčíček, F.¹ Keizer, S.² Gašić, M.³ Mairesse, F.⁴ Thomson, B.⁵ Yu, K.⁶ Young, S.⁷

38
- 84858956984
- On-line policy optimisation of spoken dialogue systems via live interaction with human subjects
- M. Gašić, F. Jurčíček, B. Thomson, K. Yu, and S. Young, "On-line policy optimisation of spoken dialogue systems via live interaction with human subjects," in Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'11), 2011, pp. 312-317.
- (2011) Proc. Autom. Speech Recognit. Understand. Workshop (ASRU'11) , pp. 312-317
- Gašić, M.¹ Jurčíček, F.² Thomson, B.³ Yu, K.⁴ Young, S.⁵

39
- 70349231178
- The hidden information state model: A practical framework for POMDP-based spoken dialogue management
- S. Young, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson, and K. Yu, "The hidden information state model: A practical framework for POMDP-based spoken dialogue management," Comput. Speech Lang., vol. 24, no. 2, pp. 150-174, 2010.
- (2010) Comput. Speech Lang. , vol.24 , Issue.2 , pp. 150-174
- Young, S.¹ Gašić, M.² Keizer, S.³ Mairesse, F.⁴ Schatzmann, J.⁵ Thomson, B.⁶ Yu, K.⁷

40
- 84880694195
- Stable function approximation in dynamic programming
- G. Gordon, "Stable function approximation in dynamic programming," in Proc. Int. Conf. Mach. Learn. (ICML'95), 1995.
- (1995) Proc. Int. Conf. Mach. Learn. (ICML'95)
- Gordon, G.¹

41
- 4644323293
- Least-squares policy iteration
- M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res. (JMLR), vol. 4, pp. 1107-1149, 2003.
- (2003) J. Mach. Learn. Res. (JMLR) , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

42
- 84899834143
- Online exploration in least-squares policy iteration
- L. Li, M. L. Littman, and C. R. Mansley, "Online exploration in least-squares policy iteration," in Proc. Int. Conf. Autonom. Agents Multiagent Syst. (AAMAS'09), 2009, vol. 2, pp. 733-739.
- (2009) Proc. Int. Conf. Autonom. Agents Multiagent Syst. (AAMAS'09) , vol.2 , pp. 733-739
- Li, L.¹ Littman, M.L.² Mansley, C.R.³

43
- 31844451013
- Reinforcement learning with Gaussian processes
- Y. Engel, S. Mannor, and R. Meir, "Reinforcement learning with Gaussian processes," in Proc. Int. Conf. Mach. Learn. (ICML'05), 2005.
- (2005) Proc. Int. Conf. Mach. Learn. (ICML'05)
- Engel, Y.¹ Mannor, S.² Meir, R.³

44
- 84945284029
- Sparse online greedy support vector regression
- Y. Engel, S. Mannor, and R. Meir, "Sparse online greedy support vector regression," in Proc. Eur. Conf. Mach. Learn. (ECML'02), 2002, vol. 2430, pp. 84-96.
- (2002) Proc. Eur. Conf. Mach. Learn. (ECML'02) , vol.2430 , pp. 84-96
- Engel, Y.¹ Mannor, S.² Meir, R.³

45
- 84893350028
- An ISU dialogue system exhibiting reinforcement learning of dialogue policies: Generic slot-filling in the TALK in-car system
- O. Lemon, K. Georgila, J. Henderson, and M. Stuttle, "An ISU dialogue system exhibiting reinforcement learning of dialogue policies: Generic slot-filling in the TALK in-car system," in Proc. Conf. Eur. Chapter Assoc. for Comput. Linguist. (EACL'06), 2006, pp. 119-122.
- (2006) Proc. Conf. Eur. Chapter Assoc. for Comput. Linguist. (EACL'06) , pp. 119-122
- Lemon, O.¹ Georgila, K.² Henderson, J.³ Stuttle, M.⁴

46
- 84881039547
- Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences
- O. Pietquin, M. Geist, and S. Chandramohan, "Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI'11), 2011.
- (2011) Proc. Int. Joint Conf. Artif. Intell. (IJCAI'11)
- Pietquin, O.¹ Geist, M.² Chandramohan, S.³

47
- 0029276036
- Temporal difference learning and TD-Gammon
- G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. Assoc. for Comput. Mach. (ACM), vol. 38, no. 3, pp. 58-68, 1995.
- (1995) Commun. Assoc. for Comput. Mach. (ACM) , vol.38 , Issue.3 , pp. 58-68
- Tesauro, G.¹

48
- 79951499926
- Statistically linearized least-squares temporal differences
- M. Geist and O. Pietquin, "Statistically linearized least-squares temporal differences," in Proc. IEEE Int. Conf. UltraModern Control Syst (ICUMT'10), 2010, pp. 450-457.
- (2010) Proc. IEEE Int. Conf. UltraModern Control Syst (ICUMT'10) , pp. 450-457
- Geist, M.¹ Pietquin, O.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.