Volume 72, Issue 3, 2008, Pages 157-171

Rollout sampling approximate policy iteration

Author keywords

Approximate policy iteration; Bandit problems; Classification; Reinforcement learning; Rollouts; Sample complexity

Indexed keywords

BOOLEAN FUNCTIONS; CLASSIFICATION (OF INFORMATION); EDUCATION; REINFORCEMENT; STANDARDS;

EID: 48349140736     PISSN: 08856125     EISSN: 15730565     Source Type: Journal    
DOI: 10.1007/s10994-008-5069-3     Document Type: Conference Paper
Times cited: 51

References (16)
  • 1
    • 40849145988
    • Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    • DOI: 10.1007/s10994-007-5038-2; Issue 1
    • Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129. 10.1007/s10994-007-5038-2.
    • (2008) Machine Learning, vol. 71, pp. 89-129
    • Antos, A.; Szepesvári, C.; Munos, R.
  • 2
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Issue 2-3
    • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235-256.
    • (2002) Machine Learning, vol. 47, pp. 235-256
    • Auer, P.; Cesa-Bianchi, N.; Fischer, P.
  • 4
    • 33745295134
    • Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
    • ISSN 1533-7928
    • Even-Dar, E., Mannor, S., & Mansour, Y. (2006). Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7, 1079-1105. ISSN 1533-7928.
    • (2006) Journal of Machine Learning Research, vol. 7, pp. 1079-1105
    • Even-Dar, E.; Mannor, S.; Mansour, Y.
  • 6
    • 33744466799
    • Approximate policy iteration with a policy language bias: Solving relational Markov decision processes
    • Fern, A., Yoon, S., & Givan, R. (2006). Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research, 25, 75-118.
    • (2006) Journal of Artificial Intelligence Research, vol. 25, pp. 75-118
    • Fern, A.; Yoon, S.; Givan, R.
  • 12
    • 31844448029
    • Relating reinforcement learning performance to classification performance
    • Bonn, Germany, 2005. ISBN 1-59593-180-5 doi: 10.1145/1102351.1102411
    • Langford, J., & Zadrozny, B. (2005). Relating reinforcement learning performance to classification performance. In Proceedings of the 22nd international conference on machine learning (ICML) (pp. 473-480). Bonn, Germany, 2005. ISBN 1-59593-180-5. doi: 10.1145/1102351.1102411.
    • (2005) Proceedings of the 22nd International Conference on Machine Learning (ICML), pp. 473-480
    • Langford, J.; Zadrozny, B.
  • 14
    • 33646398129
    • Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
    • Riedmiller, M. (2005). Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In 16th European conference on machine learning (pp. 317-328).
    • (2005) 16th European Conference on Machine Learning, pp. 317-328
    • Riedmiller, M.
  • 16
    • 0030082891
    • An approach to fuzzy control of nonlinear systems: Stability and design issues
    • Issue 1
    • Wang, H. O., Tanaka, K., & Griffin, M. F. (1996). An approach to fuzzy control of nonlinear systems: Stability and design issues. IEEE Transactions on Fuzzy Systems, 4(1), 14-23.
    • (1996) IEEE Transactions on Fuzzy Systems, vol. 4, pp. 14-23
    • Wang, H.O.; Tanaka, K.; Griffin, M.F.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.