-
1
-
-
84864030941
-
An application of reinforcement learning to aerobatic helicopter flight
-
Abbeel, P., Coates, A., Quigley, M. and Ng, A. Y., An application of reinforcement learning to aerobatic helicopter flight, NIPS 19 (2007).
-
(2007)
NIPS
, vol.19
-
-
Abbeel, P.1
Coates, A.2
Quigley, M.3
Ng, A.Y.4
-
3
-
-
79955460528
-
Reinforcement learning with action discovery
-
Toronto, Canada, May-10th
-
Banerjee, B. and Kraemer, L., Reinforcement learning with action discovery, in Proceedings of the Adaptive and Learning Agents Workshop at AAMAS-10, Toronto, Canada, May-10th (2010), pp. 30-37.
-
(2010)
Proceedings of the Adaptive and Learning Agents Workshop at AAMAS-10
, pp. 30-37
-
-
Banerjee, B.1
Kraemer, L.2
-
4
-
-
35248823118
-
Generalized multiagent learning with performance bound
-
Banerjee, B. and Peng, J., Generalized multiagent learning with performance bound, Autonomous Agents and Multiagent Systems 15(3) (2007) 281-312.
-
(2007)
Autonomous Agents and Multiagent Systems
, vol.15
, Issue.3
, pp. 281-312
-
-
Banerjee, B.1
Peng, J.2
-
6
-
-
0031630561
-
The dynamics of reinforcement learning in cooperative multiagent systems
-
Menlo Park, CA, (AAAI Press/MIT Press)
-
Claus, C. and Boutilier, C., The dynamics of reinforcement learning in cooperative multiagent systems, in Proceedings of the 15th National Conference on Artificial Intelligence, Menlo Park, CA, (AAAI Press/MIT Press, 1998), pp. 746-752.
-
(1998)
Proceedings of the 15th National Conference on Artificial Intelligence
, pp. 746-752
-
-
Claus, C.1
Boutilier, C.2
-
7
-
-
85156187730
-
Improving elevator performance using reinforcement learning
-
Crites, R. H. and Barto, A. G., Improving elevator performance using reinforcement learning, in Advances in Neural Information Processing Systems 8, 8 (1996) 1017-1023.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, Issue.8
, pp. 1017-1023
-
-
Crites, R.H.1
Barto, A.G.2
-
8
-
-
4644369748
-
Nash Q-learning for general-sum stochastic games
-
Hu, J. and Wellman, M. P., Nash Q-learning for general-sum stochastic games, J. Mach. Learn. Res. 4 (2003) 1039-1069.
-
(2003)
J. Mach. Learn. Res.
, vol.4
, pp. 1039-1069
-
-
Hu, J.1
Wellman, M.P.2
-
9
-
-
85161968592
-
Reinforcement learning in continuous action spaces through sequential monte carlo methods
-
Platt, J., Koller, D., Singer, Y. and Roweis, S., (eds.), Cambridge, MA, MIT Press
-
Lazaric, A., Restelli, M. and Bonarini, A., Reinforcement learning in continuous action spaces through sequential monte carlo methods, Platt, J., Koller, D., Singer, Y. and Roweis, S., (eds.), Advances in Neural Information Processing Systems 20, Cambridge, MA, (MIT Press, 2008), pp. 833-840.
-
(2008)
Advances in Neural Information Processing Systems
, vol.20
, pp. 833-840
-
-
Lazaric, A.1
Restelli, M.2
Bonarini, A.3
-
10
-
-
85149834820
-
Markov games as a framework for multi-agent reinforcement learning
-
San Mateo, CA (Morgan Kaufmann)
-
Littman, M. L., Markov games as a framework for multi-agent reinforcement learning, in Proc. of the 11th Int. Conf. on Machine Learning, San Mateo, CA (Morgan Kaufmann, 1994), pp. 157-163.
-
(1994)
Proc. of the 11th Int. Conf. on Machine Learning
, pp. 157-163
-
-
Littman, M.L.1
-
11
-
-
0141596576
-
Policy invariance under reward transformations: Theory and application to reward shaping
-
Morgan Kaufmann
-
Ng, A. Y., Harada, D. and Russell, S., Policy invariance under reward transformations: Theory and application to reward shaping, in Proc. 16th International Conf. on Machine Learning (Morgan Kaufmann, 1999), pp. 278-287.
-
(1999)
Proc. 16th International Conf. on Machine Learning
, pp. 278-287
-
-
Ng, A.Y.1
Harada, D.2
Russell, S.3
-
12
-
-
0013530450
-
Lyapunov-constrained action sets for reinforcement learning
-
Perkins, T. J. and Barto, A. G., Lyapunov-constrained action sets for reinforcement learning, in Proceedings of the ICML (2001), pp. 409-416.
-
(2001)
Proceedings of the ICML
, pp. 409-416
-
-
Perkins, T.J.1
Barto, A.G.2
-
15
-
-
0033170372
-
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
-
DOI 10.1016/S0004-3702(99)00052-1
-
Sutton, R., Precup, D. and Singh, S., Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artif. Intel. 112 (1999) 181-211. (Pubitemid 32079890)
-
(1999)
Artificial Intelligence
, vol.112
, Issue.1
, pp. 181-211
-
-
Sutton, R.S.1
Precup, D.2
Singh, S.3
-
16
-
-
0029276036
-
Temporal difference learning and TD-gammon
-
Tesauro, G., Temporal difference learning and TD-gammon, Commun. ACM 38(3) (1995) 58-68.
-
(1995)
Commun. ACM
, vol.38
, Issue.3
, pp. 58-68
-
-
Tesauro, G.1
-
17
-
-
27344453198
-
Potential-based shaping and Q-value initialization are equivalent
-
Wiewiora, E., Potential based shaping and Q-value initialization are equivalent, J. Artif. Intel. Res. 19 (2003) 205-208. (Pubitemid 41525920)
-
(2003)
Journal of Artificial Intelligence Research
, vol.19
, pp. 205-208
-
-
Wiewiora, E.1
|