-
2
-
-
35248866146
-
An introduction to reinforcement learning theory: Value function methods
-
Bartlett, P. L. (2003). An introduction to reinforcement learning theory: Value function methods. Lecture Notes in Artificial Intelligence, 2600/2003:184-202.
-
(2003)
Lecture Notes in Artificial Intelligence
, vol.2600
, Issue.2003
, pp. 184-202
-
-
Bartlett, P.L.1
-
4
-
-
0003565783
-
-
Athena Scientific, Belmont, Massachusetts, third edition
-
Bertsekas, D. P. (2007). Dynamic Programming and Optimal Control, volume II. Athena Scientific, Belmont, Massachusetts, third edition.
-
(2007)
Dynamic Programming and Optimal Control
, vol.2
-
-
Bertsekas, D.P.1
-
6
-
-
70349984547
-
Natural actor-critic algorithms
-
Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., and Lee, M. (2009). Natural actor-critic algorithms. Automatica, 45(11):2471-2482.
-
(2009)
Automatica
, vol.45
, Issue.11
, pp. 2471-2482
-
-
Bhatnagar, S.1
Sutton, R.S.2
Ghavamzadeh, M.3
Lee, M.4
-
7
-
-
85153940465
-
Generalization in reinforcement learning: Safely approximating the value function
-
Boyan, J. A. and Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems, pages 369-376.
-
(1995)
Advances in Neural Information Processing Systems
, pp. 369-376
-
-
Boyan, J.A.1
Moore, A.W.2
-
8
-
-
0034342516
-
On the existence of fixed points for approximate value iteration and temporal-difference learning
-
de Farias, D. P. and Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589-608.
-
(2000)
Journal of Optimization Theory and Applications
, vol.105
, Issue.3
, pp. 589-608
-
-
de Farias, D.P.1
Van Roy, B.2
-
9
-
-
33646243319
-
Natural policy gradient
-
Vancouver, British Columbia, Canada
-
Kakade, S. (2001). Natural policy gradient. In Advances in Neural Information Processing Systems 14, pages 1531-1538, Vancouver, British Columbia, Canada.
-
(2001)
Advances in Neural Information Processing Systems
, vol.14
, pp. 1531-1538
-
-
Kakade, S.1
-
10
-
-
29044440299
-
Path integrals and symmetry breaking for optimal control theory
-
Kappen, H. J. (2005). Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11011.
-
(2005)
Journal of Statistical Mechanics: Theory and Experiment
, vol.2005
, Issue.11
, pp. 11011
-
-
Kappen, H.J.1
-
11
-
-
0004272772
-
-
Cambridge University Press, Cambridge, United Kingdom, first edition
-
MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, United Kingdom, first edition.
-
(2003)
Information Theory, Inference, and Learning Algorithms
-
-
MacKay, D.J.C.1
-
14
-
-
77958569725
-
Relative entropy policy search
-
Peters, J., Mülling, K., and Altun, Y. (2010). Relative entropy policy search. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, Georgia, USA.
-
(2010)
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, Georgia, USA
-
-
Peters, J.1
Mülling, K.2
Altun, Y.3
-
15
-
-
40649106649
-
Natural actor-critic
-
Peters, J. and Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71(7-9):1180-1190.
-
(2008)
Neurocomputing
, vol.71
, Issue.7-9
, pp. 1180-1190
-
-
Peters, J.1
Schaal, S.2
-
16
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 9, pages 1038-1044.
-
(1996)
Advances in Neural Information Processing Systems
, vol.9
, pp. 1038-1044
-
-
Sutton, R.S.1
-
18
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
Denver, Colorado, USA
-
Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063, Denver, Colorado, USA.
-
(1999)
Advances in Neural Information Processing Systems
, vol.12
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.2
Singh, S.3
Mansour, Y.4
-
19
-
-
77956531256
-
Reinforcement learning algorithms for MDPs - A survey
-
University of Alberta, Edmonton, Alberta, Canada
-
Szepesvari, C. (2009). Reinforcement learning algorithms for MDPs - a survey. Technical Report TR09-13, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
-
(2009)
Technical Report TR09-13, Department of Computing Science
-
-
Szepesvari, C.1
-
22
-
-
85161971158
-
Stable dual dynamic programming
-
Vancouver, British Columbia, Canada
-
Wang, T., Lizotte, D., Bowling, M., and Schuurmans, D. (2007). Stable dual dynamic programming. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada.
-
(2007)
Proceedings of the 21st Annual Conference on Neural Information Processing Systems
-
-
Wang, T.1
Lizotte, D.2
Bowling, M.3
Schuurmans, D.4