2013, Pages 142-161

Approximating Optimal Control with Value Gradient Learning

Author keywords

ADP, DHP and "bootstrapping" parameter λ; Approximating optimal control with VGL; Critic learning, DHP, GDHP into VGL(λ); VGL and BPTT algorithms; VGL with a greedy policy, VGL(λ) algorithm
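The keywords name the chapter's central object: a critic trained to match value gradients, with DHP recovered as the λ = 0 case of VGL(λ). For orientation only, a minimal sketch of a VGL(λ)-style backward pass follows. It is not the chapter's code; the function name vgl_lambda_targets, the arguments dfdx/drdx/Gtilde, and the fixed-policy (closed-loop) assumption are all illustrative.

```python
import numpy as np

# Minimal sketch of a VGL(lambda)-style critic target computation along one
# trajectory, assuming a fixed policy already folded into the closed-loop
# dynamics and reward. Illustrative only; all names are hypothetical.

def vgl_lambda_targets(xs, dfdx, drdx, Gtilde, lam=0.5, gamma=1.0):
    """Backward pass producing value-gradient targets G'_t for a critic.

    xs     : states x_0 .. x_T along a trajectory
    dfdx   : dfdx[t], Jacobian of the closed-loop dynamics at x_t
    drdx   : drdx[t], gradient of the (closed-loop) reward at x_t
    Gtilde : callable, current critic estimate of dV/dx
    lam    : "bootstrapping" parameter; lam = 0 gives DHP-style targets
    gamma  : discount factor
    """
    T = len(xs) - 1
    targets = [None] * (T + 1)
    targets[T] = drdx[T]  # value gradient at the final step
    for t in range(T - 1, -1, -1):
        # Blend the recursive target with the critic's own bootstrap estimate,
        # weighted by lambda, then pull it back one step through the dynamics.
        blended = lam * targets[t + 1] + (1.0 - lam) * Gtilde(xs[t + 1])
        targets[t] = drdx[t] + gamma * dfdx[t].T @ blended
    return targets

# The critic weights would then be adjusted to reduce ||Gtilde(x_t) - G'_t||^2
# at each visited state, e.g. by gradient descent on that squared error.
```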

EID: 84886350301     PISSN: None     EISSN: None     Source Type: Book
DOI: 10.1002/9781118453988.ch7     Document Type: Chapter
Times cited: 7

References (19)
  • 2
    • P.J. Werbos. Neural networks, system identification, and control in the chemical process industries. In White and Sofge, editors, Handbook of Intelligent Control, Van Nostrand Reinhold, New York, 1992, pp. 283-356.
  • 3
    • R.E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
  • 4
    • P.J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In White and Sofge, editors, Handbook of Intelligent Control, Van Nostrand Reinhold, New York, 1992, pp. 493-525.
  • 6
    • R.S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
  • 7
    • C.J.C.H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.
  • 8
    • P.J. Werbos. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10):1550-1560, 1990.
  • 9
    • P.J. Werbos. Stable adaptive control using new critic designs. eprint arXiv:adap-org/9810001, 1998.
  • 10
    • J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, 1996.
  • 11
    • M. Fairbank and E. Alonso. The divergence of reinforcement learning algorithms with value-iteration and function approximation. eprint arXiv:1107.4606, 2011.
  • 13
    • M. Fairbank and E. Alonso. The local optimality of reinforcement learning by value gradients and its relationship to policy gradient learning. eprint arXiv:1101.0428, 2011.
  • 14
    • M. Fairbank. Reinforcement learning by value gradients. eprint arXiv:0803.3539, 2008.
  • 15
    • K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12(1):219-245, 2000.
  • 16
    • A. Heydari and S.N. Balakrishnan. Finite-horizon input-constrained nonlinear optimal control using single network adaptive critics. In Proceedings of the 2011 American Control Conference (ACC), pp. 3047-3052, 2011.
  • 17
    • S.E. Fahlman. Faster-learning variations on back-propagation: an empirical study. In Proceedings of the 1988 Connectionist Models Summer School, pp. 38-51, San Mateo, CA, 1988. Morgan Kaufmann.
  • 18
    • C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.