SCOPUS Information Retrieval Platform

Proceedings of the International Joint Conference on Neural Networks

Volume , Issue , 2012, Pages

The divergence of reinforcement learning algorithms with value-iteration and function approximation

(2) Fairbank, Michael (a); Alonso, Eduardo (a)

(a) City University (United Kingdom)

Author keywords

Adaptive Dynamic Programming; Divergence; Greedy Policy; Reinforcement Learning; Value Iteration

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; DIVERGENCE; FUNCTION APPROXIMATION; FUNCTION APPROXIMATORS; GREEDY POLICY; VALUE FUNCTIONS; VALUE ITERATION;

DYNAMIC PROGRAMMING; LEARNING ALGORITHMS; NEURAL NETWORKS; REINFORCEMENT LEARNING;

ADAPTIVE ALGORITHMS;

EID: 84865066281 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IJCNN.2012.6252792 Document Type: Conference Paper

Times cited : (23)

References (21)

1
- 66449130966
- Adaptive dynamic programming: An introduction
- F.-Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Computational Intelligence Magazine, pp. 39-47, 2009.
- (2009) IEEE Computational Intelligence Magazine , pp. 39-47
- Wang, F.-Y.¹ Zhang, H.² Liu, D.³

2
- 0004102479
- Cambridge Massachusetts USA: The MIT Press
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, Massachusetts, USA: The MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

3
- 85012688561
- Princeton NJ USA: Princeton University Press
- R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
- (1957) Dynamic Programming
- Bellman, R.E.¹

4
- 0003636089
- On-line q-learning using connectionist systems
- Cambridge University Engineering Department
- G. Rummery and M. Niranjan, "On-line q-learning using connectionist systems," Tech. Rep. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
- (1994) Tech. Rep. Technical Report CUED/F-INFENG/TR 166
- Rummery, G.¹ Niranjan, M.²

5
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

6
- 0002031779
- Approximating dynamic programming for real-time control and neural modeling
- editors White and Sofge, Chapter 13
- P. J. Werbos, "Approximating dynamic programming for real-time control and neural modeling." Handbook of Intelligent Control, editors White and Sofge, Chapter 13, pp. 493-525, 1992.
- (1992) Handbook of Intelligent Control , pp. 493-525
- Werbos, P.J.¹

7
- 0031236002
- Adaptive critic designs
- September
- D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Transactions on Neural Networks, vol. September, pp. 997-1007, 1997.
- (1997) IEEE Transactions on Neural Networks , pp. 997-1007
- Prokhorov, D.¹ Wunsch, D.²

8
- 85032189594
- Model-based adaptive critic designs
- editors Jennie Si et al.
- S. Ferrari and R. F. Stengel, "Model-based adaptive critic designs," Handbook of learning and approximate dynamic programming, editors Jennie Si et al., pp. 65-96, 2004.
- (2004) Handbook of Learning and Approximate Dynamic Programming , pp. 65-96
- Ferrari, S.¹ Stengel, R.F.²

9
- 84865070696
- eprint arXiv: 1101.0428
- M. Fairbank and E. Alonso, "The local optimality of reinforcement learning by value gradients and its relationship to policy gradient learning," eprint arXiv:1101.0428, 2011.
- (2011) The Local Optimality of Reinforcement Learning by Value Gradients and Its Relationship to Policy Gradient Learning
- Fairbank, M.¹ Alonso, E.²

10
- 84865069763
- Value-gradient learning
- IEEE Press
- M. Fairbank and E. Alonso, "Value-gradient learning," in Proceedings of the IEEE International Joint Conference on Neural Networks 2012 (IJCNN'12). IEEE Press, 2012.
- (2012) Proceedings of the IEEE International Joint Conference on Neural Networks 2012 (IJCNN'12)
- Fairbank, M.¹ Alonso, E.²

11
- 0008813539
- An analysis of temporal-difference learning with function approximation
- J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Transactions on Automatic Control, Tech. Rep., 1996.
- (1996) IEEE Transactions on Automatic Control, Tech. Rep.
- Tsitsiklis, J.N.¹ Van Roy, B.²

12
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- New York, NY, USA: ACM
- R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, "Fast gradient-descent methods for temporal-difference learning with linear function approximation," in Proceedings of the 26th Annual International Conference on Machine Learning, ser. ICML '09. New York, NY, USA: ACM, 2009, pp. 993-1000.
- (2009) Proceedings of the 26th Annual International Conference on Machine Learning, Ser. ICML '09 , pp. 993-1000
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvári, C.⁶ Wiewiora, E.⁷

13
- 79951481923
- Convergent temporal-difference learning with arbitrary smooth function approximation
- MIT Press
- H. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Advances in Neural Information Processing Systems (NIPS'09). MIT Press, 2009.
- (2009) Advances in Neural Information Processing Systems (NIPS'09)
- Maei, H.¹ Szepesvari, C.² Bhatnagar, S.³ Precup, D.⁴ Silver, D.⁵ Sutton, R.⁶

14
- 0003950434
- eprint arXiv:adap-org/9810001
- P. J. Werbos, "Stable adaptive control using new critic designs," eprint arXiv:adap-org/9810001, 1998.
- (1998) Stable Adaptive Control Using New Critic Designs
- Werbos, P.J.¹

15
- 84865080650
- eprint arXiv: 0803.3539
- M. Fairbank, "Reinforcement learning by value gradients," eprint arXiv:0803.3539, 2008.
- (2008) Reinforcement Learning by Value Gradients
- Fairbank, M.¹

16
- 0004049893
- Ph.D. dissertation, Cambridge University
- C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge University, 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

17
- 84865077338
- A comparison of learning speed and ability to cope without exploration between DHP and TD(0)
- IEEE Press
- M. Fairbank and E. Alonso, "A comparison of learning speed and ability to cope without exploration between DHP and TD(0)," in Proceedings of the IEEE International Joint Conference on Neural Networks 2012 (IJCNN'12). IEEE Press, 2012.
- (2012) Proceedings of the IEEE International Joint Conference on Neural Networks 2012 (IJCNN'12)
- Fairbank, M.¹ Alonso, E.²

18
- 84865080654
- eprint arXiv: 1107.4606
- M. Fairbank and E. Alonso, "The divergence of reinforcement learning algorithms with value-iteration and function approximation," eprint arXiv:1107.4606, 2011.
- (2011) The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation
- Fairbank, M.¹ Alonso, E.²

19
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in International Conference on Machine Learning, 1995, pp. 30-37.
- (1995) International Conference on Machine Learning , pp. 30-37
- Baird, L.C.¹

20
- 0029752470
- Feature-based methods for large scale dynamic programming
- J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Machine Learning, vol. 22, no. 1-3, pp. 59-94, 1996.
- (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

21
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems 12, vol. 12, 2000, pp. 1057-1063.
- (2000) Advances in Neural Information Processing Systems 12 , vol.12 , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.