Volume 5323 LNAI, 2008, Pages 27-40

Algorithms and bounds for rollout sampling approximate policy iteration

Author keywords

[No Author keywords available]

Indexed keywords

CLASSIFIERS; LEARNING SYSTEMS; REINFORCEMENT; REINFORCEMENT LEARNING;

EID: 58449114139     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-540-89722-4_3     Document Type: Conference Paper
Times cited: 8

References (10)
  • 1
    • Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning Journal 47(2-3), 235-256 (2002)
  • 2
    • Auer, P., Ortner, R., Szepesvari, C.: Improved Rates for the Stochastic Continuum-Armed Bandit Problem. In: Bshouty, N.H., Gentile, C. (eds.) COLT 2007. LNCS, vol. 4539, pp. 454-468. Springer, Heidelberg (2007)
  • 3
    • Bertsekas, D.: Dynamic programming and suboptimal control: From ADP to MFC. Fundamental Issues in Control, European Journal of Control 11(4-5) (2005). From the 2005 CDC, Seville, Spain
  • 4
    • Dimitrakakis, C., Lagoudakis, M.: Rollout sampling approximate policy iteration. Machine Learning 72(3) (September 2008)
  • 5
    • Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research 7, 1079-1105 (2006)
  • 7
    • Fern, A., Yoon, S., Givan, R.: Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research 25, 75-118 (2006)
  • 8
    • Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS, vol. 4212, pp. 282-293. Springer, Heidelberg (2006)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.