Volume , Issue , 2012, Pages

The divergence of reinforcement learning algorithms with value-iteration and function approximation

Author keywords

Adaptive Dynamic Programming; Divergence; Greedy Policy; Reinforcement Learning; Value Iteration

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; DIVERGENCE; FUNCTION APPROXIMATION; FUNCTION APPROXIMATORS; GREEDY POLICY; VALUE FUNCTIONS; VALUE ITERATION

EID: 84865066281     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IJCNN.2012.6252792     Document Type: Conference Paper
Times cited: 23

References (21)
  • 3
    • 85012688561
    • Princeton NJ USA: Princeton University Press
    • R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
    • (1957) Dynamic Programming
    • Bellman, R.E.1
  • 4
    • 0003636089
    • On-line Q-learning using connectionist systems
    • Cambridge University Engineering Department
    • G. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Tech. Rep. CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
    • (1994) Tech. Rep. CUED/F-INFENG/TR 166
    • Rummery, G.1    Niranjan, M.2
  • 5
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 6
    • 0002031779
    • Approximating dynamic programming for real-time control and neural modeling
    • White and Sofge, Eds., ch. 13
    • P. J. Werbos, "Approximating dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, White and Sofge, Eds., ch. 13, pp. 493-525, 1992.
    • (1992) Handbook of Intelligent Control , pp. 493-525
    • Werbos, P.J.1
  • 19
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in International Conference on Machine Learning, 1995, pp. 30-37.
    • (1995) International Conference on Machine Learning , pp. 30-37
    • Baird, L.C.1
  • 20
    • 0029752470
    • Feature-based methods for large scale dynamic programming
    • J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Machine Learning, vol. 22, no. 1-3, pp. 59-94, 1996.
    • (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.