-
1
-
-
0029276591
-
On the generation of markov decision processes
-
Archibald, T., McKinnon, K., Thomas, L.: On the Generation of Markov Decision Processes. Journal of the Operational Research Society 46, 354-361 (1995)
-
(1995)
Journal of the Operational Research Society
, vol.46
, pp. 354-361
-
-
Archibald, T.1
McKinnon, K.2
Thomas, L.3
-
2
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
Bradtke, S. J., Barto, A. G.: Linear least-squares algorithms for temporal difference learning. Machine Learning 22, 33-57 (1996)
-
(1996)
Machine Learning
, vol.22
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
5
-
-
78049336028
-
-
Technical report Univ. of Alberta
-
Bhatnagar, S., Sutton, R., Ghavamzadeh, M., Lee, M.: Natural actor-critic algorithms. Technical report Univ. of Alberta (2007)
-
(2007)
Natural Actor-critic Algorithms
-
-
Bhatnagar, S.1
Sutton, R.2
Ghavamzadeh, M.3
Lee, M.4
-
6
-
-
0031076413
-
Stochastic approximation with two time scales
-
Borkar, V.: Stochastic approximation with two time scales. Systems & Control Letters 29, 291-294 (1997)
-
(1997)
Systems & Control Letters
, vol.29
, pp. 291-294
-
-
Borkar, V.1
-
7
-
-
0033876515
-
The ODE method for convergence of stochastic approximation and reinforcement learning
-
Borkar, V., Meyn, S.: The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization 38, 447-469 (2000)
-
(2000)
SIAM Journal on Control and Optimization
, vol.38
, pp. 447-469
-
-
Borkar, V.1
Meyn, S.2
-
8
-
-
79551680672
-
Cross-entropy optimization of control policies with adaptive basis functions
-
Busoniu, L., Ernst, D., De Schutter, B., Babuska, R.: Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (99), 1-14 (2010)
-
(2010)
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
, Issue.99
, pp. 1-14
-
-
Busoniu, L.1
Ernst, D.2
De Schutter, B.3
Babuska, R.4
-
9
-
-
58449097347
-
Basis expansion in natural actor critic methods
-
Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. eds., Springer, Heidelberg
-
Girgin, S., Preux, P.: Basis expansion in natural actor critic methods. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds.) EWRL 2008. LNCS (LNAI), vol. 5323, pp. 110-123. Springer, Heidelberg (2008)
-
(2008)
EWRL 2008. LNCS (LNAI)
, vol.5323
, pp. 110-123
-
-
Girgin, S.1
Preux, P.2
-
11
-
-
0346913265
-
Convergent multiple-timescales reinforcement learning algorithms in normal form games
-
Leslie, D., Collins, E.: Convergent multiple-timescales reinforcement learning algorithms in normal form games. The Annals of Applied Probability 13, 1231-1251 (2003)
-
(2003)
The Annals of Applied Probability
, vol.13
, pp. 1231-1251
-
-
Leslie, D.1
Collins, E.2
-
12
-
-
17444414191
-
Basis function adaptation in temporal difference reinforcement learning
-
Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2006)
-
(2006)
Annals of Operations Research
, vol.134
, pp. 215-238
-
-
Menache, I.1
Mannor, S.2
Shimkin, N.3
-
13
-
-
33750501334
-
Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms
-
Mokkadem, A., Pelletier, M.: Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms. Annals of Applied Probability 16, 1671
-
Annals of Applied Probability
, vol.16
, pp. 1671
-
-
Mokkadem, A.1
Pelletier, M.2
-
14
-
-
1942482175
-
Optimality of reinforcement learning algorithms with linear function approximation
-
Schoknecht, R.: Optimality of reinforcement learning algorithms with linear function approximation. In: Proceedings of Neural Information Processing and Systems, pp. 1555-1562 (2002)
-
(2002)
Proceedings of Neural Information Processing and Systems
, pp. 1555-1562
-
-
Schoknecht, R.1
-
16
-
-
71149099079
-
Fast gradient-descent methods for temporal-difference learning with linear function approximation
-
Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., Wiewiora, E.: Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th Annual International Conference on Machine Learning (2009)
-
(2009)
Proceedings of the 26th Annual International Conference on Machine Learning
-
-
Sutton, R.S.1
Maei, H.R.2
Precup, D.3
Bhatnagar, S.4
Silver, D.5
Szepesvári, C.6
Wiewiora, E.7
-
17
-
-
77956513316
-
A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation
-
Sutton, R. S., Szepesvári, C., Maei, H. R.: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1609-1616 (2009b)
-
(2009)
Advances in Neural Information Processing Systems
, vol.21
, pp. 1609-1616
-
-
Sutton, R.S.1
Szepesvári, C.2
Maei, H.R.3