SCOPUS Information Retrieval Platform

Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010

Year: 2010 (volume, issue, and pages not listed)

Double Q-learning

(1) Van Hasselt, Hado (a)

(a) CWI (Netherlands)

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

EXPECTED VALUES; OPTIMAL POLICIES; POOR PERFORMANCE; POSITIVE BIAS; Q-LEARNING; REINFORCEMENT LEARNING ALGORITHMS; STOCHASTIC ENVIRONMENT;

STOCHASTIC SYSTEMS;

EID: 85161998941; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited: 1644

References (24)

1
- EID: 0004049893
- C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, 1989.

2
- EID: 34249833101
- C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8:279-292, 1992.

3
- EID: 85012688561
- R. Bellman. Dynamic Programming. Princeton University Press, 1957.

4
- EID: 0000439891
- T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185-1201, 1994.

5
- EID: 0028497630
- J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.

6
- EID: 0001961616
- M. L. Littman and C. Szepesvári. A generalized reinforcement-learning model: Convergence and applications. In L. Saitta, editor, Proceedings of the 13th International Conference on Machine Learning (ICML-96), pages 310-318, Bari, Italy, 1996. Morgan Kaufmann.

7
- EID: 85156187730
- R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017-1023, Cambridge MA, 1996. MIT Press.

8
- EID: 0036058423
- W. D. Smart and L. P. Kaelbling. Effective reinforcement learning for mobile robots. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA 2002), pages 3404-3410, Washington, DC, USA, 2002.

9
- EID: 49049105169
- M. A. Wiering and H. P. van Hasselt. Ensemble algorithms in reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 38(4):930-936, 2008.

10
- EID: 34250700033
- A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman. PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 881-888. ACM, 2006.

11
- EID: 84899026236
- M. J. Kearns and S. P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.

12
- EID: 21844465127
- D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6(1):503-556, 2005.

13
- EID: 84898998140
- C. Szepesvári. The asymptotic convergence-rate of Q-learning. In NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10, pages 1064-1070, Cambridge, MA, USA, 1998. MIT Press.

14
- EID: 14344266002
- E. Even-Dar and Y. Mansour. Learning rates for Q-learning. Journal of Machine Learning Research, 5:1-25, 2003.

15
- EID: 31344446857
- E. Van den Steen. Rational overoptimism (and other biases). American Economic Review, 94(4):1141-1151, September 2004.

16
- EID: 33644898597
- J. E. Smith and R. L. Winkler. The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52(3):311-322, 2006.

17
- EID: 0015071523
- E. Capen, R. Clapp, and T. Campbell. Bidding in high risk situations. Journal of Petroleum Technology, 23:641-653, 1971.

18
- EID: 0001520893
- R. H. Thaler. Anomalies: The winner's curse. Journal of Economic Perspectives, 2(1):191-202, Winter 1988.

19
- EID: 34250609333
- J. L. W. V. Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica, 30(1):175-193, 1906.

20
- EID: 0033901602
- S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.

21
- EID: 0029679044
- L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

22
- EID: 0004102479
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge MA, 1998.

23
- EID: 34547966991
- S. Pandey, D. Chakrabarti, and D. Agarwal. Multi-armed bandit problems with dependent arms. In Proceedings of the 24th International Conference on Machine Learning, pages 721-728. ACM, 2007.

24
- EID: 77956890234
- W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, pages 97-109, 1970.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.