Volume 15, 2011, Pages 119-127

Dynamic policy programming with function approximation

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATE DYNAMIC PROGRAMMING; ASYMPTOTIC BOUNDS; DYNAMIC POLICY; FUNCTION APPROXIMATION; FUNCTION APPROXIMATION TECHNIQUES; INFINITE HORIZONS; LOSS BOUNDS; MARKOV DECISION PROBLEM; OPTIMAL POLICIES

EID: 84862300689     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Conference Paper
Times cited: 33

References (22)
  • 2 Bartlett, P. L. (2003). An introduction to reinforcement learning theory: Value function methods. Lecture Notes in Artificial Intelligence, 2600:184-202.
  • 4
  • 7 Boyan, J. A. and Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems, pages 369-376.
  • 8 Farias, D. P. and Roy, B. V. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589-608.
  • 9 Kakade, S. (2001). Natural policy gradient. In Advances in Neural Information Processing Systems 14, pages 1531-1538, Vancouver, British Columbia, Canada.
  • 10 Kappen, H. J. (2005). Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11011.
  • 15 Peters, J. and Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71(7-9):1180-1190.
  • 16 Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 9, pages 1038-1044.
  • 18 Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063, Denver, Colorado, USA.
  • 19 Szepesvari, C. (2009). Reinforcement learning algorithms for MDPs - a survey. Technical Report TR09-13, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.