Volume 7524 LNAI, Issue PART 2, 2012, Pages 211-226

Policy iteration based on a learned transition model

Author keywords

[No Author keywords available]

Indexed keywords

BASIS FUNCTIONS; COMMITMENT PROBLEM; HIGH-DIMENSIONAL; INVERTED PENDULUM; LEAST SQUARE; LINEAR APPROXIMATIONS; POLICY ITERATION; REINFORCEMENT LEARNING METHOD; RESOURCE MANAGEMENT; STATE SPACE; SWING-UP; TRANSITION MODEL; VALUE FUNCTIONS;

EID: 84866843932     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-33486-3_14     Document Type: Conference Paper
Times cited: 4

References (19)
  • 1
    • 0001130234
    • A trust region method based on interior point techniques for nonlinear programming
    • Byrd, R.H., Gilbert, J.C., Nocedal, J.: A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming 89, 149-185 (1996)
    • (1996) Mathematical Programming , vol.89 , pp. 149-185
    • Byrd, R.H.1    Gilbert, J.C.2    Nocedal, J.3
  • 2
    • 84861697773
    • Reinforcement Learning with a Bilinear Q Function
    • Sanner, S., Hutter, M. (eds.) EWRL 2011. Springer, Heidelberg
    • Elkan, C.: Reinforcement Learning with a Bilinear Q Function. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS, vol. 7188, pp. 78-88. Springer, Heidelberg (2012)
    • (2012) LNCS , vol.7188 , pp. 78-88
    • Elkan, C.1
  • 6
    • 0001455103
    • Comments on the origin and application of Markov decision processes
    • Howard, R.A.: Comments on the origin and application of Markov decision processes. Management Science 14(7), 503-507 (1968)
    • (1968) Management Science , vol.14 , Issue.7 , pp. 503-507
    • Howard, R.A.1
  • 9
    • 35748957806
    • Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
    • Mahadevan, S., Maggioni, M.: Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 2169-2231 (2007)
    • (2007) Journal of Machine Learning Research , pp. 2169-2231
    • Mahadevan, S.1    Maggioni, M.2
  • 10
    • 56049095326
    • Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
    • Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. Springer, Heidelberg
    • Melo, F.S., Lopes, M.: Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 66-81. Springer, Heidelberg (2008)
    • (2008) LNCS (LNAI) , vol.5212 , pp. 66-81
    • Melo, F.S.1    Lopes, M.2
  • 11
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134(1), 215-238 (2005)
    • (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
    • Menache, I.1    Mannor, S.2    Shimkin, N.3
  • 13
    • 77949361320
    • Merging AI and OR to solve high-dimensional stochastic optimization problems using approximate dynamic programming
    • Powell, W.B.: Merging AI and OR to solve high-dimensional stochastic optimization problems using approximate dynamic programming. INFORMS Journal on Computing 22(1), 2-17 (2010)
    • (2010) INFORMS Journal on Computing , vol.22 , Issue.1 , pp. 2-17
    • Powell, W.B.1
  • 17
    • 77949360515
    • Commentary - Perspectives on stochastic optimization over time
    • Tsitsiklis, J.N.: Commentary - perspectives on stochastic optimization over time. INFORMS Journal on Computing 22(1), 18-19 (2010)
    • (2010) INFORMS Journal on Computing , vol.22 , Issue.1 , pp. 18-19
    • Tsitsiklis, J.N.1
  • 19
    • 0030082891
    • An approach to fuzzy control of nonlinear systems: Stability and design issues
    • Wang, H.O., Tanaka, K., Griffin, M.F.: An approach to fuzzy control of nonlinear systems: stability and design issues. IEEE Transactions on Fuzzy Systems 4(1), 14-23 (1996)
    • (1996) IEEE Transactions on Fuzzy Systems , vol.4 , Issue.1 , pp. 14-23
    • Wang, H.O.1    Tanaka, K.2    Griffin, M.F.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.