-
1
-
-
0000616723
-
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
-
R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, pages 1054-1078, 1995.
-
(1995)
Advances in Applied Probability
, pp. 1054-1078
-
-
Agrawal, R.1
-
2
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
4
-
-
84879976780
-
The arcade learning environment: An evaluation platform for general agents
-
M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res. (JAIR), 47:253-279, 2013.
-
(2013)
J. Artif. Intell. Res. (JAIR)
, vol.47
, pp. 253-279
-
-
Bellemare, M.G.1
Naddaf, Y.2
Veness, J.3
Bowling, M.4
-
5
-
-
0041965975
-
R-max-A general polynomial time algorithm for near-optimal reinforcement learning
-
R. I. Brafman and M. Tennenholtz. R-max-A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231, 2003.
-
(2003)
The Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
6
-
-
0023846591
-
Neocognitron: A hierarchical neural network capable of visual pattern recognition
-
K. Fukushima. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1(2):119-130, 1988.
-
(1988)
Neural Networks
, vol.1
, Issue.2
, pp. 119-130
-
-
Fukushima, K.1
-
8
-
-
0032203257
-
Gradient-based learning applied to document recognition
-
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
-
(1998)
Proceedings of the IEEE
, vol.86
, Issue.11
, pp. 2278-2324
-
-
LeCun, Y.1
Bottou, L.2
Bengio, Y.3
Haffner, P.4
-
9
-
-
0000123778
-
Self-improving reactive agents based on reinforcement learning, planning and teaching
-
L. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3):293-321, 1992.
-
(1992)
Machine Learning
, vol.8
, Issue.3
, pp. 293-321
-
-
Lin, L.1
-
11
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Rusu, A.A.4
Veness, J.5
Bellemare, M.G.6
Graves, A.7
Riedmiller, M.8
Fidjeland, A.K.9
Ostrovski, G.10
Petersen, S.11
Beattie, C.12
Sadik, A.13
Antonoglou, I.14
King, H.15
Kumaran, D.16
Wierstra, D.17
Legg, S.18
Hassabis, D.19
-
12
-
-
84980007683
-
Massively parallel methods for deep reinforcement learning
-
A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. D. Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, K. Kavukcuoglu, and D. Silver. Massively parallel methods for deep reinforcement learning. In Deep Learning Workshop, ICML, 2015.
-
(2015)
Deep Learning Workshop, ICML
-
-
Nair, A.1
Srinivasan, P.2
Blackwell, S.3
Alcicek, C.4
Fearon, R.5
Maria, A.D.6
Panneershelvam, V.7
Suleyman, M.8
Beattie, C.9
Petersen, S.10
Legg, S.11
Mnih, V.12
Kavukcuoglu, K.13
Silver, D.14
-
13
-
-
33646398129
-
Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
-
Springer
-
M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European Conference on Machine Learning, pages 317-328. Springer, 2005.
-
(2005)
Proceedings of the 16th European Conference on Machine Learning
, pp. 317-328
-
-
Riedmiller, M.1
-
16
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
-
(1988)
Machine Learning
, vol.3
, Issue.1
, pp. 9-44
-
-
Sutton, R.S.1
-
17
-
-
85132026293
-
Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
-
R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pages 216-224, 1990.
-
(1990)
Proceedings of the Seventh International Conference on Machine Learning
, pp. 216-224
-
-
Sutton, R.S.1
-
21
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
-
(1995)
Communications of the ACM
, vol.38
, Issue.3
, pp. 58-68
-
-
Tesauro, G.1
-
22
-
-
0003270924
-
Issues in using function approximation for reinforcement learning
-
In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, editors, Hillsdale, NJ, Lawrence Erlbaum
-
S. Thrun and A. Schwartz. Issues in using function approximation for reinforcement learning. In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, editors, Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ, 1993. Lawrence Erlbaum.
-
(1993)
Proceedings of the 1993 Connectionist Models Summer School
-
-
Thrun, S.1
Schwartz, A.2
-
23
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2