SCOPUS Information Retrieval Platform

Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011

Volume , Issue , 2011, Pages

Speedy Q-learning

(4) Gheshlaghi Azar, Mohammad (a); Munos, Remi (b); Ghavamzadeh, Mohammad (b); Kappen, Hilbert J. (a)

a RADBOUD UNIVERSITY NIJMEGEN (Netherlands)

b INRIA (France)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; LEARNING ALGORITHMS;

DISCOUNT FACTORS; HIGH PROBABILITY; MODEL FREE; OPTIMAL ACTIONS; PAC BOUNDS; PERFORMANCE; Q-LEARNING; Q-LEARNING ALGORITHMS; SLOW CONVERGENCES; VALUE FUNCTIONS;

ITERATIVE METHODS;

EID: 85162416897
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (131)

References (20)

1
- 85161978146
- Fitted Q-iteration in continuous action-space MDPs
- A. Antos, R. Munos, and Cs. Szepesvári. Fitted Q-iteration in continuous action-space MDPs. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 2007.
- (2007) Proceedings of the 21st Annual Conference on Neural Information Processing Systems
- Antos, A.¹ Munos, R.² Szepesvári, Cs.³

2
- 84860643935
- Technical Report inria-00636615 INRIA
- M. Gheshlaghi Azar, R. Munos, M. Ghavamzadeh, and H.J. Kappen. Reinforcement learning with a near optimal rate of convergence. Technical Report inria-00636615, INRIA, 2011.
- (2011) Reinforcement Learning with A Near Optimal Rate of Convergence
- Gheshlaghi Azar, M.¹ Munos, R.² Ghavamzadeh, M.³ Kappen, H.J.⁴

3
- 80053161827
- REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs
- P. L. Bartlett and A. Tewari. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
- (2009) Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence
- Bartlett, P.L.¹ Tewari, A.²

4
- 0003565783
- Athena Scientific, Belmont, Massachusetts, third edition
- D. P. Bertsekas. Dynamic Programming and Optimal Control, volume II. Athena Scientific, Belmont, Massachusetts, third edition, 2007.
- (2007) Dynamic Programming and Optimal Control , vol.2
- Bertsekas, D.P.¹

5
- 0003487482
- Athena Scientific, Belmont, Massachusetts
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

6
- 84926078662
- Cambridge University Press, New York, NY, USA
- N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.
- (2006) Prediction, Learning, and Games
- Cesa-Bianchi, N.¹ Lugosi, G.²

7
- 84937398609
- PAC bounds for multi-armed bandit and Markov decision processes
- E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In 15th Annual Conference on Computational Learning Theory, pages 255-270, 2002.
- (2002) 15th Annual Conference on Computational Learning Theory , pp. 255-270
- Even-Dar, E.¹ Mannor, S.² Mansour, Y.³

8
- 14344266002
- Learning rates for Q-learning
- E. Even-Dar and Y. Mansour. Learning rates for Q-learning. Journal of Machine Learning Research, 5:1-25, 2003.
- (2003) Journal of Machine Learning Research , vol.5 , pp. 1-25
- Even-Dar, E.¹ Mansour, Y.²

9
- 0003421261
- Wiley
- W. Feller. An Introduction to Probability Theory and Its Applications, volume 1. Wiley, 1968.
- (1968) An Introduction to Probability Theory and Its Applications , vol.1
- Feller, W.¹

10
- 0000439891
- On the convergence of stochastic iterative dynamic programming
- T. Jaakkola, M. I. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming. Neural Computation, 6(6):1185-1201, 1994.
- (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.³

11
- 77951952841
- Near-optimal regret bounds for reinforcement learning
- T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
- (2010) Journal of Machine Learning Research , vol.11 , pp. 1563-1600
- Jaksch, T.¹ Ortner, R.² Auer, P.³

12
- 84899026236
- Finite-sample convergence rates for Q-learning and indirect algorithms
- MIT Press
- M. Kearns and S. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.
- (1999) Advances in Neural Information Processing Systems , vol.12 , pp. 996-1002
- Kearns, M.¹ Singh, S.²

13
- 44649189852
- Finite-time bounds for fitted value iteration
- R. Munos and Cs. Szepesvári. Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9:815-857, 2008.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 815-857
- Munos, R.¹ Szepesvári, Cs.²

14
- 0000955979
- Incremental multi-step Q-learning
- J. Peng and R. J. Williams. Incremental multi-step Q-learning. Machine Learning, 22(1-3):283-290, 1996.
- (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 283-290
- Peng, J.¹ Williams, R.J.²

15
- 73549084301
- Reinforcement learning in finite MDPs: PAC analysis
- A. L. Strehl, L. Li, and M. L. Littman. Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413-2444, 2009.
- (2009) Journal of Machine Learning Research , vol.10 , pp. 2413-2444
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

16
- 0004102479
- MIT Press, Cambridge, Massachusetts
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

17
- 85064804512
- The asymptotic convergence-rate of Q-learning
- Cs. Szepesvári. The asymptotic convergence-rate of Q-learning. In Advances in Neural Information Processing Systems 10, Denver, Colorado, USA, 1997.
- (1997) Advances in Neural Information Processing Systems 10, Denver, Colorado, USA 1997
- Szepesvári, Cs.¹

18
- 77956520676
- Model-based reinforcement learning with nearly tight exploration complexity bounds
- Omnipress
- I. Szita and Cs. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the 27th International Conference on Machine Learning, pages 1031-1038. Omnipress, 2010.
- (2010) Proceedings of the 27th International Conference on Machine Learning , pp. 1031-1038
- Szita, I.¹ Szepesvári, Cs.²

19
- 85161998941
- Double Q-learning
- H. van Hasselt. Double Q-learning. In Advances in Neural Information Processing Systems 23, pages 2613-2621, 2010.
- (2010) Advances in Neural Information Processing Systems , vol.23 , pp. 2613-2621
- Van Hasselt, H.¹

20
- 0004049893
- PhD thesis, King's College, Cambridge, England
- C. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.