1. Baddeley, B.: Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 38(4), 950-956 (2008)
3. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13(5), 833-846 (1983)
5. Berenji, H.R., Khedkar, P.: Learning and tuning fuzzy logic controllers through reinforcements. IEEE Transactions on Neural Networks 3(5), 724-740 (1992)
6. Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478-485 (2003)
7. Bertsekas, D.P.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control 34(6), 589-598 (1989)
8. Bertsekas, D.P.: Dynamic programming and suboptimal control: A survey from ADP to MPC. European Journal of Control 11(4-5) (2005); special issue for the CDC-ECC-05 in Seville, Spain
12. Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54, 207-213 (2005)
13. Buşoniu, L., Babuška, R., De Schutter, B.: A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics 38(2), 156-172 (2008)
14. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Consistency of fuzzy model-based reinforcement learning. In: Proceedings 2008 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2008), Hong Kong, pp. 518-524 (2008)
15. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Fuzzy partition optimization for approximate fuzzy Q-iteration. In: Proceedings 17th IFAC World Congress (IFAC 2008), Seoul, Korea, pp. 5629-5634 (2008)
16. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Policy search with cross-entropy optimization of basis functions. In: Proceedings 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, US, pp. 153-160 (2009)
17. Chang, H.S., Fu, M.C., Hu, J., Marcus, S.I.: Simulation-Based Algorithms for Markov Decision Processes. Springer, Heidelberg (2007)
18. Chin, H.H., Jafari, A.A.: Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes. In: Proceedings 30th Southeastern Symposium on System Theory, Morgantown, US, pp. 538-543 (1998)
19. Chow, C.S., Tsitsiklis, J.N.: An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control 36(8), 898-914 (1991)
20. Coulom, R.: Feedforward neural networks in reinforcement learning applied to high-dimensional motor control. In: Cesa-Bianchi, N., Numao, M., Reischuk, R. (eds.) ALT 2002. LNCS (LNAI), vol. 2533, pp. 403-413. Springer, Heidelberg (2002)
21. Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503-556 (2005)
22. Ernst, D., Glavic, M., Capitanescu, F., Wehenkel, L.: Reinforcement learning versus model predictive control: A comparison on a power system problem. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 39(2), 517-529 (2009)
24. Gomez, F.J., Schmidhuber, J., Miikkulainen, R.: Efficient non-linear control through neuroevolution. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 654-662. Springer, Heidelberg (2006)
25. Gonzalez, R.L., Rofman, E.: On deterministic control problems: An approximation procedure for the optimal cost I. The stationary problem. SIAM Journal on Control and Optimization 23(2), 242-266 (1985)
27. Grüne, L.: Error estimation and adaptive discretization for the discrete stochastic Hamilton-Jacobi-Bellman equation. Numerische Mathematik 99, 85-112 (2004)
28. Horiuchi, T., Fujino, A., Katai, O., Sawaragi, T.: Fuzzy interpolation-based Q-learning with continuous states and actions. In: Proceedings 5th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 1996), New Orleans, US, pp. 594-600 (1996)
29. Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185-1201 (1994)
31. Jung, T., Polani, D.: Least squares SVM for least squares TD learning. In: Proceedings 17th European Conference on Artificial Intelligence (ECAI 2006), Riva del Garda, Italy, pp. 499-503 (2006)
32. Jung, T., Polani, D.: Kernelizing LSPE(λ). In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), Honolulu, US, pp. 338-345 (2007)
33. Jung, T., Uthmann, T.: Experiments in value function approximation with sparse support vector regression. In: Proceedings 15th European Conference on Machine Learning (ECML 2004), Pisa, Italy, pp. 180-191 (2004)
34. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 99-134 (1998)
35. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237-285 (1996)
36. Konda, V.: Actor-critic algorithms. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, US (2002)
37. Konda, V.R., Tsitsiklis, J.N.: Actor-critic algorithms. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 1008-1014. MIT Press, Cambridge (2000)
39. Lagoudakis, M., Parr, R., Littman, M.: Least-squares methods in reinforcement learning for control. In: Vlahavas, I.P., Spyropoulos, C.D. (eds.) SETN 2002. LNCS (LNAI), vol. 2308, pp. 249-260. Springer, Heidelberg (2002)
41. Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: Proceedings 20th International Conference on Machine Learning (ICML 2003), Washington, US, pp. 424-431 (2003)
42. Lewis, R.M., Torczon, V.: Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization 9(4), 1082-1099 (1999)
43. Lin, L.J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8(3-4), 293-321 (1992); special issue on reinforcement learning
44. Liu, D., Javaherian, H., Kovalenko, O., Huang, T.: Adaptive critic learning techniques for engine torque and air-fuel ratio control. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 38(4), 988-993 (2008)
47. Mahadevan, S., Maggioni, M.: Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research 8, 2169-2231 (2007)
48. Mannor, S., Rubinstein, R.Y., Gat, Y.: The cross-entropy method for fast policy search. In: Proceedings 20th International Conference on Machine Learning (ICML 2003), Washington, US, pp. 512-519 (2003)
51. Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2005)
52. Millán, J.d.R., Posenato, D., Dedieu, E.: Continuous-action Q-learning. Machine Learning 49(2-3), 247-265 (2002)
53. Munos, R.: Finite-element methods with local triangulation refinement for continuous reinforcement learning problems. In: van Someren, M., Widmer, G. (eds.) ECML 1997. LNCS, vol. 1224, pp. 170-182. Springer, Heidelberg (1997)
55. Munos, R., Moore, A.: Variable-resolution discretization in optimal control. Machine Learning 49(2-3), 291-323 (2002)
56. Nakamura, Y., Mori, T., Sato, M., Ishii, S.: Reinforcement learning for a biped robot based on a CPG-actor-critic method. Neural Networks 20, 723-735 (2007)
57. Nedić, A., Bertsekas, D.P.: Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems 13, 79-110 (2003)
58. Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: Theory and application to reward shaping. In: Proceedings 16th International Conference on Machine Learning (ICML 1999), Bled, Slovenia, pp. 278-287 (1999)
59. Ng, A.Y., Jordan, M.I.: PEGASUS: A policy search method for large MDPs and POMDPs. In: Proceedings 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), Palo Alto, US, pp. 406-415 (2000)
60. Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2-3), 161-178 (2002)
61. Pérez-Uribe, A.: Using a time-delay actor-critic neural architecture with dopamine-like reinforcement signal for learning in autonomous robots. In: Wermter, S., Austin, J., Willshaw, D.J. (eds.) Emergent Neural Computational Architectures Based on Neuroscience. LNCS (LNAI), vol. 2036, pp. 522-533. Springer, Heidelberg (2001)
62. Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7-9), 1180-1190 (2008)
63. Porta, J.M., Vlassis, N., Spaan, M.T., Poupart, P.: Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research 7, 2329-2367 (2006)
65. Ratitch, B., Precup, D.: Sparse distributed memories for on-line value-based reinforcement learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 347-358. Springer, Heidelberg (2004)
66. Reynolds, S.I.: Adaptive resolution model-free reinforcement learning: Decision boundary partitioning. In: Proceedings 17th International Conference on Machine Learning (ICML 2000), Stanford University, US, pp. 783-790 (2000)
67. Riedmiller, M.: Neural fitted Q-iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
68. Riedmiller, M., Peters, J., Schaal, S.: Evaluation of policy gradient methods and variants on the cart-pole benchmark. In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), Honolulu, US, pp. 254-261 (2007)
69. Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Tech. Rep. CUED/F-INFENG/TR166, Engineering Department, Cambridge University, UK (1994)
70. Santos, M.S., Vigo-Aguiar, J.: Analysis of a numerical dynamic programming algorithm applied to economic models. Econometrica 66(2), 409-426 (1998)
71. Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems, vol. 7, pp. 361-368 (1995)
72. Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9-44 (1988)
73. Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML 1990), Austin, US, pp. 216-224 (1990)
75. Sutton, R.S., Barto, A.G., Williams, R.J.: Reinforcement learning is adaptive optimal control. IEEE Control Systems Magazine 12(2), 19-22 (1992)
76. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063. MIT Press, Cambridge (2000)
77. Szepesvári, C., Smart, W.D.: Interpolation-based Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML 2004), Banff, Canada, pp. 791-798 (2004)
78. Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on Optimization 7(1), 1-25 (1997)
79. Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3-4), 251-281 (1997)
80. Tsitsiklis, J.N., Van Roy, B.: Feature-based methods for large scale dynamic programming. Machine Learning 22(1-3), 59-94 (1996)
81. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control 42(5), 674-690 (1997)
82. Uther, W.T.B., Veloso, M.M.: Tree based discretization for continuous state space reinforcement learning. In: Proceedings 15th National Conference on Artificial Intelligence and 10th Innovative Applications of Artificial Intelligence Conference (AAAI 1998/IAAI 1998), Madison, US, pp. 769-774 (1998)
83. Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.: Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477-484 (2009)
85. Wang, X., Tian, X., Cheng, Y.: Value approximation with least squares support vector machine in reinforcement learning system. Journal of Computational and Theoretical Nanoscience 4(7-8), 1290-1294 (2007)
88. Wiering, M.: Convergence and divergence in standard and averaging reinforcement learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 477-488. Springer, Heidelberg (2004)
89. Williams, R.J., Baird, L.C.: Tight performance bounds on greedy policies based on imperfect value functions. In: Proceedings 8th Yale Workshop on Adaptive and Learning Systems, New Haven, US, pp. 108-113 (1994)
90. Xu, X., Hu, D., Lu, X.: Kernel-based least-squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks 18(4), 973-992 (2007)
91. Yu, H., Bertsekas, D.P.: Convergence results for some temporal difference methods based on least-squares. Tech. Rep. LIDS 2697, Massachusetts Institute of Technology, Cambridge, US (2006)