SCOPUS Information Search Platform

Volume 6321 LNAI, Issue PART 1, 2010, Pages 312-327

Adaptive bases for reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ACTOR CRITIC; FUNCTION APPROXIMATION; SQUARE ERRORS; VALUE FUNCTIONS; REINFORCEMENT LEARNING

EID: 78049343060 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-15880-3_26 Document Type: Conference Paper

Times cited : (18)

References (18)

1
- 0029276591
- On the generation of markov decision processes
- Archibald, T., McKinnon, K., Thomas, L.: On the Generation of Markov Decision Processes. Journal of the Operational Research Society 46, 354-361 (1995)
- (1995) Journal of the Operational Research Society , vol.46 , pp. 354-361
- Archibald, T.¹ McKinnon, K.² Thomas, L.³

2
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- Bradtke, S. J., Barto, A. G.: Linear least-squares algorithms for temporal difference learning. Machine Learning 22, 33-57 (1996)
- (1996) Machine Learning , vol.22 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

3
- 0003565783
- 3rd edn. Athena Scientific, Belmont
- Bertsekas, D.: Dynamic programming and optimal control, 3rd edn. Athena Scientific, Belmont (2007)
- (2007) Dynamic Programming and Optimal Control
- Bertsekas, D.¹

4
- 0003487482
- Athena Scientific, Belmont
- Bertsekas, D., Tsitsiklis, J.: Neuro-dynamic programming. Athena Scientific, Belmont (1996)
- (1996) Neuro-dynamic Programming
- Bertsekas, D.¹ Tsitsiklis, J.²

5
- 78049336028
- Technical report Univ. of Alberta
- Bhatnagar, S., Sutton, R., Ghavamzadeh, M., Lee, M.: Natural actor-critic algorithms. Technical report Univ. of Alberta (2007)
- (2007) Natural Actor-critic Algorithms
- Bhatnagar, S.¹ Sutton, R.² Ghavamzadeh, M.³ Lee, M.⁴

6
- 0031076413
- Stochastic approximation with two time scales
- Borkar, V.: Stochastic approximation with two time scales. Systems & Control Letters 29, 291-294 (1997)
- (1997) Systems & Control Letters , vol.29 , pp. 291-294
- Borkar, V.¹

7
- 0033876515
- The ODE method for convergence of stochastic approximation and reinforcement learning
- Borkar, V., Meyn, S.: The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Cont. and Optim. 38, 447-469 (2000)
- (2000) SIAM Journal on Cont. and Optim , vol.38 , pp. 447-469
- Borkar, V.¹ Meyn, S.²

8
- 79551680672
- Cross-entropy optimization of control policies with adaptive basis functions
- Busoniu, L., Ernst, D., De Schutter, B., Babuska, R.: Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (99), 1-14 (2010)
- (2010) IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics , Issue.99 , pp. 1-14
- Busoniu, L.¹ Ernst, D.² De Schutter, B.³ Babuska, R.⁴

10
- 9944258743
- Springer, Heidelberg
- Kushner, H., Yin, G.: Stochastic approximation and recursive algorithms and applications. Springer, Heidelberg (2003)
- (2003) Stochastic Approximation and Recursive Algorithms and Applications
- Kushner, H.¹ Yin, G.²

11
- 0346913265
- Convergent multiple-timescales reinforcement learning algorithms in normal form games
- Leslie, D., Collins, E.: Convergent multiple-timescales reinforcement learning algorithms in normal form games. The Annals of App. Prob. 13, 1231-1251 (2003)
- (2003) The Annals of App. Prob. , vol.13 , pp. 1231-1251
- Leslie, D.¹ Collins, E.²

12
- 17444414191
- Basis function adaptation in temporal difference reinforcement learning
- Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2006)
- (2006) Annals of Operations Research , vol.134 , pp. 215-238
- Menache, I.¹ Mannor, S.² Shimkin, N.³

13
- 33750501334
- Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms
- Mokkadem, A., Pelletier, M.: Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms. Annals of Applied Prob. 16, 1671
- Annals of Applied Prob , vol.16 , pp. 1671
- Mokkadem, A.¹ Pelletier, M.²

14
- 1942482175
- Optimality of reinforcement learning algorithms with linear function approximation
- Schoknecht, R.: Optimality of reinforcement learning algorithms with linear function approximation. In: Proceedings of Neural Information Processing and Systems, pp. 1555-1562 (2002)
- (2002) Proceedings of Neural Information Processing and Systems , pp. 1555-1562
- Schoknecht, R.¹

15
- 0004102479
- MIT Press, Cambridge
- Sutton, R. S., Barto, A. G.: Reinforcement Learning - an Introduction. MIT Press, Cambridge (1998)
- (1998) Reinforcement Learning - An Introduction
- Sutton, R.S.¹ Barto, A.G.²

16
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., Wiewiora, E.: Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th Annual International Conference on Machine Learning (2009)
- (2009) Proceedings of the 26th Annual International Conference on Machine Learning
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvári, C.⁶ Wiewiora, E.⁷

17
- 77956513316
- A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation
- Sutton, R. S., Szepesvari, C., Maei, H. R.: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1609-1616 (2009b)
- (2009) Advances in Neural Information Processing Systems , vol.21 , pp. 1609-1616
- Sutton, R.S.¹ Szepesvari, C.² Maei, H.R.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.