-
4
-
-
0000439891
-
On the convergence of stochastic iterative dynamic programming algorithms
-
T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185-1201, 1994.
-
(1994)
Neural Computation
, vol.6
, pp. 1185-1201
-
-
Jaakkola, T.1
Jordan, M.I.2
Singh, S.P.3
-
5
-
-
0028497630
-
Asynchronous stochastic approximation and Q-learning
-
J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.
-
(1994)
Machine Learning
, vol.16
, pp. 185-202
-
-
Tsitsiklis, J.N.1
-
6
-
-
0001961616
-
A generalized reinforcement-learning model: Convergence and applications
-
L. Saitta, editor, Bari, Italy. Morgan Kaufmann
-
M. L. Littman and C. Szepesvári. A generalized reinforcement-learning model: Convergence and applications. In L. Saitta, editor, Proceedings of the 13th International Conference on Machine Learning (ICML-96), pages 310-318, Bari, Italy, 1996. Morgan Kaufmann.
-
(1996)
Proceedings of the 13th International Conference on Machine Learning (ICML-96)
, pp. 310-318
-
-
Littman, M.L.1
Szepesvári, C.2
-
7
-
-
85156187730
-
Improving elevator performance using reinforcement learning
-
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Cambridge MA. MIT Press
-
R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017-1023, Cambridge MA, 1996. MIT Press.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, pp. 1017-1023
-
-
Crites, R.H.1
Barto, A.G.2
-
8
-
-
0036058423
-
Effective reinforcement learning for mobile robots
-
Washington, DC, USA
-
W. D. Smart and L. P. Kaelbling. Effective reinforcement learning for mobile robots. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA 2002), pages 3404-3410, Washington, DC, USA, 2002.
-
(2002)
Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA 2002)
, pp. 3404-3410
-
-
Smart, W.D.1
Kaelbling, L.P.2
-
9
-
-
49049105169
-
Ensemble algorithms in reinforcement learning
-
M. A. Wiering and H. P. van Hasselt. Ensemble algorithms in reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 38(4):930-936, 2008.
-
(2008)
IEEE Transactions on Systems, Man, and Cybernetics, Part B
, vol.38
, Issue.4
, pp. 930-936
-
-
Wiering, M.A.1
Van Hasselt, H.P.2
-
10
-
-
34250700033
-
PAC model-free reinforcement learning
-
ACM
-
A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman. PAC model-free reinforcement learning. In Proceedings of the 23rd international conference on Machine learning, pages 881-888. ACM, 2006.
-
(2006)
Proceedings of the 23rd International Conference on Machine Learning
, pp. 881-888
-
-
Strehl, A.L.1
Li, L.2
Wiewiora, E.3
Langford, J.4
Littman, M.L.5
-
11
-
-
84899026236
-
Finite-sample convergence rates for Q-learning and indirect algorithms
-
MIT Press
-
M. J. Kearns and S. P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.
-
(1999)
Neural Information Processing Systems
, vol.12
, pp. 996-1002
-
-
Kearns, M.J.1
Singh, S.P.2
-
12
-
-
21844465127
-
Tree-based batch mode reinforcement learning
-
D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6(1):503-556, 2005.
-
(2005)
Journal of Machine Learning Research
, vol.6
, Issue.1
, pp. 503-556
-
-
Ernst, D.1
Geurts, P.2
Wehenkel, L.3
-
13
-
-
84898998140
-
The asymptotic convergence-rate of Q-learning
-
Cambridge, MA, USA. MIT Press
-
C. Szepesvári. The asymptotic convergence-rate of Q-learning. In NIPS '97: Proceedings of the 1997 conference on Advances in neural information processing systems 10, pages 1064-1070, Cambridge, MA, USA, 1998. MIT Press.
-
(1998)
NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems
, vol.10
, pp. 1064-1070
-
-
Szepesvári, C.1
-
15
-
-
31344446857
-
Rational overoptimism (and other biases)
-
September
-
E. Van den Steen. Rational overoptimism (and other biases). American Economic Review, 94(4):1141-1151, September 2004.
-
(2004)
American Economic Review
, vol.94
, Issue.4
, pp. 1141-1151
-
-
Van Den Steen, E.1
-
16
-
-
33644898597
-
The optimizer's curse: Skepticism and postdecision surprise in decision analysis
-
J. E. Smith and R. L. Winkler. The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52(3):311-322, 2006.
-
(2006)
Management Science
, vol.52
, Issue.3
, pp. 311-322
-
-
Smith, J.E.1
Winkler, R.L.2
-
18
-
-
0001520893
-
Anomalies: The winner's curse
-
Winter
-
R. H. Thaler. Anomalies: The winner's curse. Journal of Economic Perspectives, 2(1):191-202, Winter 1988.
-
(1988)
Journal of Economic Perspectives
, vol.2
, Issue.1
, pp. 191-202
-
-
Thaler, R.H.1
-
19
-
-
34250609333
-
Sur les fonctions convexes et les inégalités entre les valeurs moyennes
-
J. L. W. V. Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica, 30(1):175-193, 1906.
-
(1906)
Acta Mathematica
, vol.30
, Issue.1
, pp. 175-193
-
-
Jensen, J.L.W.V.1
-
20
-
-
0033901602
-
Convergence results for single-step on-policy reinforcement-learning algorithms
-
S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
-
(2000)
Machine Learning
, vol.38
, Issue.3
, pp. 287-308
-
-
Singh, S.P.1
Jaakkola, T.2
Littman, M.L.3
Szepesvári, C.4
-
24
-
-
77956890234
-
Monte Carlo sampling methods using Markov chains and their applications
-
W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, pages 97-109, 1970.
-
(1970)
Biometrika
, pp. 97-109
-
-
Hastings, W.K.1