2. Asmuth, J., Littman, M. L., & Zinkov, R. (2008). Potential-based shaping in model-based reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence.
6. Böhm, N., Kókai, G., & Mandl, S. (2005). An evolutionary approach to Tetris. In Proceedings of the sixth metaheuristics international conference.
8. Brafman, R. I., & Tennenholtz, M. (2002). R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 213-231.
9. Chow, C. S., & Tsitsiklis, J. N. (1991). An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8), 898-914.
11. Dayan, P., & Hinton, G. E. (1993). Feudal reinforcement learning. In Proceedings of advances in neural information processing systems.
12. Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227-303.
13. Epshteyn, A., & DeJong, G. (2006). Qualitative reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 305-312).
14. Ghallab, M., Nau, D., & Traverso, P. (2004). Automated planning: Theory and practice. Elsevier, Morgan Kaufmann Publishers.
15. Grzes, M., & Kudenko, D. (2008). Multigrid reinforcement learning with reward shaping. In Proceedings of the 18th international conference on artificial neural networks, LNCS. Springer-Verlag.
17. Gullapalli, V., & Barto, A. G. (1992). Shaping as a method for accelerating reinforcement learning. In Proceedings of the 1992 IEEE international symposium on intelligent control (pp. 554-559).
18. Kaelbling, L. P. (1993). Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of international conference on machine learning (pp. 167-173).
19. Kearns, M., & Singh, S. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 209-232.
20. Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd international conference on machine learning.
21. Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In Proceedings of the 24th international conference on machine learning (pp. 601-608).
22. Mataric, M. J. (1994). Reward functions for accelerated learning. In Proceedings of the 11th international conference on machine learning (pp. 181-189).
24. Moore, A., Baird, L., & Kaelbling, L. P. (1999). Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs. In Proceedings of the international joint conference on artificial intelligence (pp. 1316-1323).
25. Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th international conference on machine learning (pp. 278-287).
26. Ng, A. Y., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of uncertainty in artificial intelligence (pp. 406-415).
27. Parr, R., & Russell, S. (1997). Reinforcement learning with hierarchies of machines. In Proceedings of advances in neural information processing systems (Vol. 10).
28. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc., New York, NY, USA.
29. Randløv, J., & Alstrom, P. (1998). Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th international conference on machine learning (pp. 463-471).
30. Rayner, D. C., Davison, K., Bulitko, V., Anderson, K., & Lu, J. (2007). Real-time heuristic search with a priority queue. In Proceedings of the 2007 international joint conference on artificial intelligence (pp. 2372-2377).
32. Stone, P., & Veloso, M. (2000). Layered learning. In Proceedings of the 11th European conference on machine learning.
33. Strehl, A. L., Li, L., & Littman, M. L. (2006). PAC reinforcement learning bounds for RTDP and Rand-RTDP. In Proceedings of the AAAI workshop on learning for search.
34. Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the 7th international conference on machine learning (pp. 216-224).
36. Sutton, R. S., Precup, D., & Singh, S. P. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181-211.
37. Taylor, M. E., & Stone, P. (2005). Behavior transfer for value-function-based reinforcement learning. In Proceedings of the 4th international joint conference on autonomous agents and multiagent systems (pp. 53-59).
38. Tesauro, G. J. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215-219.
39. Wiewiora, E. (2003). Potential-based shaping and Q-value initialisation are equivalent. Journal of Artificial Intelligence Research, 19, 205-208.