1. Azar, M.G., Gómez, V., and Kappen, H.J. Dynamic policy programming. Journal of Machine Learning Research, 13(Nov):3207-3245, 2012.
2. Bertsekas, D.P. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3):310-335, 2011.
3. Dutech, A., Edmunds, T., Kok, J., Lagoudakis, M., Littman, M., Riedmiller, M., Russell, B., Scherrer, B., Sutton, R., Timmer, S., et al. Reinforcement learning benchmarks and bake-offs II. In Workshop at the Advances in Neural Information Processing Systems Conference. Citeseer, 2005.
4. Gabillon, V., Lazaric, A., Ghavamzadeh, M., and Scherrer, B. Classification-based policy iteration with a critic. In Proceedings of ICML, pp. 1049-1056, 2011.
5. Haviv, M. and Van Der Heyden, L. Perturbation bounds for the stationary probabilities of a finite Markov chain. Advances in Applied Probability, 16(4):804-818, 1984. ISSN 0001-8678. URL http://www.jstor.org/stable/1427341.
7. Kakade, S.M. A natural policy gradient. In NIPS, volume 14, pp. 1531-1538, 2001.
9. Kakade, S.M. and Langford, J. Approximately optimal approximate reinforcement learning. In Proceedings of ICML, pp. 267-274, 2002.
10. Koller, D. and Parr, R. Policy iteration for factored MDPs. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 326-334, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1-55860-709-9.
12. Lazaric, A., Ghavamzadeh, M., and Munos, R. Analysis of a classification-based policy iteration algorithm. In Proceedings of ICML, pp. 607-614, 2010.
13. Munos, R. Error bounds for approximate value iteration. In Proceedings of AAAI, volume 20, p. 1006, 2005.
14. Perkins, T.J. and Precup, D. A convergent form of approximate policy iteration. In NIPS, volume 15, pp. 1595-1602, 2002.
15. Peters, J., Vijayakumar, S., and Schaal, S. Natural actor-critic. In Proceedings of ECML, volume 3720, pp. 280-291. Springer, 2005.
16. Sutton, R.S., McAllester, D., Singh, S., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 12, pp. 1057-1063. MIT Press, 2000.
17. Wagner, P. A reinterpretation of the policy oscillation phenomenon in approximate policy iteration. In NIPS, 2011.
18. Ye, Y. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011.