1. F.-Y. Wang, H. Zhang, and D. Liu. Adaptive dynamic programming: an introduction. IEEE Computational Intelligence Magazine, 4(2):39-47, 2009.
2. P.J. Werbos. Neural networks, system identification, and control in the chemical process industries. In White and Sofge, editors, Handbook of Intelligent Control, pp. 283-356. Van Nostrand Reinhold, New York, 1992.
3. R.E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
4. P.J. Werbos. Approximating dynamic programming for real-time control and neural modeling. In White and Sofge, editors, Handbook of Intelligent Control, pp. 493-525. Van Nostrand Reinhold, New York, 1992.
6. R.S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
7. C.J.C.H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.
8. P.J. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550-1560, 1990.
9. P.J. Werbos. Stable adaptive control using new critic designs. eprint arXiv:adap-org/9810001, 1998.
10. J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, 1996.
11. M. Fairbank and E. Alonso. The divergence of reinforcement learning algorithms with value-iteration and function approximation. eprint arXiv:1107.4606, 2011.
12. S. Ferrari and R.F. Stengel. Model-based adaptive critic designs. In J. Si et al., editors, Handbook of Learning and Approximate Dynamic Programming, pp. 65-96. Wiley-IEEE Press, New York, 2004.
13. M. Fairbank and E. Alonso. The local optimality of reinforcement learning by value gradients and its relationship to policy gradient learning. eprint arXiv:1101.0428, 2011.
14. M. Fairbank. Reinforcement learning by value gradients. eprint arXiv:0803.3539, 2008.
15. K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12(1):219-245, 2000.
16. A. Heydari and S.N. Balakrishnan. Finite-horizon input-constrained nonlinear optimal control using single network adaptive critics. In American Control Conference (ACC), pp. 3047-3052, 2011.
17. S.E. Fahlman. Faster-learning variations on back-propagation: an empirical study. In Proceedings of the 1988 Connectionist Summer School, pp. 38-51, San Mateo, CA, 1988. Morgan Kaufmann.
18. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.