Journal of Machine Learning Research, Volume 13, 2012, Pages 3207-3245

Dynamic policy programming

Author keywords

Approximate dynamic programming; Function approximation; Markov decision processes; Monte Carlo methods; Reinforcement learning

Indexed keywords

Accumulated errors; Approximate dynamic programming; Average errors; Benchmark problems; Dynamic policy; Estimation errors; Function approximation; Gradual changes; Incremental algorithm; Infinite horizons; Markov decision processes; Number of samples; Optimal policies; Policy iteration; Reinforcement learning method; Sampling-based; Supremum; Theoretical result

EID: 84870922246     PISSN: 1532-4435     EISSN: 1533-7928     Source Type: Journal
DOI: None     Document Type: Article
Times cited: 125

References (50)
  5. A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834-846, 1983.
  12. D. P. de Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589-608, 2000.
  18. J. Hoffmann-Jørgensen and G. Pisier. The law of large numbers and the central limit theorem in Banach spaces. The Annals of Probability, 4(4):587-599, 1976.
  19. T. Jaakkola, M. I. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming. Neural Computation, 6(6):1185-1201, 1994.
  22. H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11011, 2005.
  23. M. Kearns and S. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.
  31. R. Munos. Error bounds for approximate value iteration. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-05), Volume 2, pages 1006-1011. AAAI Press, 2005.
  34. J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71(7-9):1180-1190, 2008.
  36. S. Singh, T. Jaakkola, M. L. Littman, and Cs. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000. DOI: 10.1023/A:1007678930559
  37. S. Still and D. Precup. An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 131(3):139-148, 2012.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.