



Volume 6321 LNAI, Issue PART 1, 2010, Pages 312-327

Adaptive bases for reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ACTOR CRITIC; FUNCTION APPROXIMATION; SQUARE ERRORS; VALUE FUNCTIONS

EID: 78049343060     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-15880-3_26     Document Type: Conference Paper
Times cited: (18)

References (18)
  • 2
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Bradtke, S. J., Barto, A. G.: Linear least-squares algorithms for temporal difference learning. Machine Learning 22, 33-57 (1996)
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 6
    • 0031076413
    • Stochastic approximation with two time scales
    • Borkar, V.: Stochastic approximation with two time scales. Systems & Control Letters 29, 291-294 (1997)
    • (1997) Systems & Control Letters , vol.29 , pp. 291-294
    • Borkar, V.1
  • 7
    • 0033876515
    • The ODE method for convergence of stochastic approximation and reinforcement learning
    • Borkar, V., Meyn, S.: The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Cont. and Optim. 38, 447-469 (2000)
    • (2000) SIAM Journal on Cont. and Optim , vol.38 , pp. 447-469
    • Borkar, V.1    Meyn, S.2
  • 9
    • 58449097347
    • Basis expansion in natural actor critic methods
    • Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. eds., Springer, Heidelberg
    • Girgin, S., Preux, P.: Basis expansion in natural actor critic methods. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds.) EWRL 2008. LNCS (LNAI), vol. 5323, pp. 110-123. Springer, Heidelberg (2008)
    • (2008) EWRL 2008. LNCS (LNAI) , vol.5323 , pp. 110-123
    • Girgin, S.1    Preux, P.2
  • 11
    • 0346913265
    • Convergent multiple-timescales reinforcement learning algorithms in normal form games
    • Leslie, D., Collins, E.: Convergent multiple-timescales reinforcement learning algorithms in normal form games. The Annals of App. Prob. 13, 1231-1251 (2003)
    • (2003) The Annals of App. Prob. , vol.13 , pp. 1231-1251
    • Leslie, D.1    Collins, E.2
  • 12
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2006)
    • (2006) Annals of Operations Research , vol.134 , pp. 215-238
    • Menache, I.1    Mannor, S.2    Shimkin, N.3
  • 13
    • 33750501334
    • Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms
    • Mokkadem, A., Pelletier, M.: Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms. Annals of Applied Prob. 16, 1671
    • Annals of Applied Prob , vol.16 , pp. 1671
    • Mokkadem, A.1    Pelletier, M.2
  • 14
    • 1942482175
    • Optimality of reinforcement learning algorithms with linear function approximation
    • Schoknecht, R.: Optimality of reinforcement learning algorithms with linear function approximation. In: Proceedings of Neural Information Processing and Systems, pp. 1555-1562 (2002)
    • (2002) Proceedings of Neural Information Processing and Systems , pp. 1555-1562
    • Schoknecht, R.1
  • 17
    • 77956513316
    • A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation
    • Sutton, R. S., Szepesvari, C., Maei, H. R.: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1609-1616 (2009b)
    • (2009) Advances in Neural Information Processing Systems , vol.21 , pp. 1609-1616
    • Sutton, R.S.1    Szepesvari, C.2    Maei, H.R.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.