1. Agogino, A., & Tumer, K. (2006). QUICR-learning for multi-agent coordination. AAAI 2006.
2. Randløv, J., & Alstrøm, P. (1998). Learning to drive a bicycle using reinforcement learning and shaping. ICML 1998.
4. Bagnell, J., & Ng, A. (2006). On local rewards and scaling distributed reinforcement learning. Neural Information Processing Systems. MIT Press.
6. Chang, Y.-H., Ho, T., & Kaelbling, L. P. (2004). All learning is local: Multi-agent learning in global reward games. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press.
7. Chow, C., & Tsitsiklis, J. (1991). An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36, 898-914.
8. Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13.
9. Guestrin, C., Koller, D., Parr, R., & Venkataraman, S. (2003). Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19.
10. Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (pp. 220-229).
12. Laud, A., & DeJong, G. (2002). Reinforcement learning and shaping: Encouraging intended behaviors. ICML 2002 (pp. 355-362).
13. Laud, A., & DeJong, G. (2003). The influence of reward on the speed of reinforcement learning: An analysis of shaping. ICML 2003.
15. Littman, M., & Boyan, J. (1993). A distributed reinforcement learning scheme for network routing (Technical Report). Carnegie Mellon University, Pittsburgh, PA, USA.
17. Marthi, B., Russell, S., Latham, D., & Guestrin, C. (2005). Concurrent hierarchical reinforcement learning. ICML 2005.
18. Mataric, M. J. (1994). Reward functions for accelerated learning. ICML 1994.
19. Ng, A., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. ICML 1999.
20. Russell, S., & Zimdars, A. (2003). Q-decomposition for reinforcement learning agents. ICML 2003.
21. Schneider, J., Wong, W., Moore, A., & Riedmiller, M. (1999). Distributed value functions. ICML 1999 (pp. 371-378).
22. Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16, 227-233.
23. Steinkraus, K., & Kaelbling, L. (2004). Combining dynamic abstractions in large MDPs (Technical Report). MIT.
24. Stone, P., & Sutton, R. S. (2001). Scaling reinforcement learning toward RoboCup soccer. ICML 2001.
25. Sutton, R. S., Precup, D., & Singh, S. P. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181-211.
26. van Eck, N., & van Wezel, M. (2005). Reinforcement learning and its application to Othello (Technical Report). Erasmus University.
27. Wiewiora, E. (2003). Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research, 19, 205-208.