Volume 16, 2015, Pages 1629-1676

Approximate modified policy iteration and its application to the game of Tetris

Author keywords

Approximate dynamic programming; Finite sample analysis; Game of tetris; Markov decision processes; Performance bounds; Reinforcement learning

Indexed keywords

ALGORITHMS; APPROXIMATION ALGORITHMS; DYNAMIC PROGRAMMING; MARKOV PROCESSES; PARAMETER ESTIMATION; REINFORCEMENT LEARNING; SAMPLING

EID: 84962317462     PISSN: 1532-4435     EISSN: 1533-7928     Source Type: Journal
DOI: None     Document Type: Article
Times cited: 147

References (48)
  • 4. H. Burgiel. How to lose at Tetris. Mathematical Gazette, 81:194-200, 1997.
  • 5. P. Canbolat and U. Rothblum. (Approximate) iterated successive approximations algorithm for sequential decision processes. Annals of Operations Research, pages 1-12, 2012.
  • 6. C. Chang and C. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1-27:27, May 2011. doi: 10.1145/1961189.1961199. URL http://doi.acm.org/10.1145/1961189.1961199.
  • 8. C. Dimitrakakis and M. Lagoudakis. Rollout sampling approximate policy iteration. Machine Learning Journal, 72(3):157-171, 2008.
  • 13. A. Fern, S. Yoon, and R. Givan. Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research, 25:75-118, 2006.
  • 18. G. J. Gordon. Stable function approximation in dynamic programming. In ICML, pages 261-268, 1995.
  • 20. N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9:159-195, 2001.
  • 29. B. Lesner and B. Scherrer. Tight performance bounds for approximate modified policy iteration with non-stationary policies. CoRR, abs/1304.5610, 2013.
  • 31. R. Munos. Performance bounds in Lp-norm for approximate value iteration. SIAM J. Control and Optimization, 46(2):541-561, 2007.
  • 37. B. Scherrer. Performance bounds for λ-policy iteration and application to the game of Tetris. Journal of Machine Learning Research, 14:1175-1221, 2013.
  • 40. S. Singh and R. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3):227-233, 1994.
  • 42. I. Szita and A. Lorincz. Learning Tetris using the noisy cross-entropy method. Neural Computation, 18(12):2936-2941, 2006.
  • 47. J. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
  • 48. J. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.