[2] F. Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
[3] G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Comput., vol. 6, no. 2, pp. 215-219, Mar. 1994.
[4] W. Zhang and T. Dietterich, "A reinforcement learning approach to job-shop scheduling," in Proc. 14th Int. Joint Conf. Artif. Intell., San Francisco, CA, 1995, pp. 1114-1120.
[5] R. H. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Mach. Learn., vol. 33, nos. 2-3, pp. 235-262, 1998.
[6] A. Y. Ng, H. J. Kim, M. Jordan, and S. Sastry, "Autonomous helicopter flight via reinforcement learning," in Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004.
[7] R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton Univ. Press, 1957.
[8] R. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press, 1996, pp. 1038-1044.
[9] X. Xu and H. G. He, "Residual-gradient-based neural reinforcement learning for the optimal control of an acrobat," in Proc. IEEE Int. Symp. Intell. Control, Vancouver, Canada, 2002, pp. 758-763.
[10] J. Baxter and P. L. Bartlett, "Infinite-horizon policy-gradient estimation," J. Artif. Intell. Res., vol. 15, no. 1, pp. 319-350, Jul. 2001.
[12] X. Xu, H. G. He, and D. W. Hu, "Efficient reinforcement learning using recursive least-squares methods," J. Artif. Intell. Res., vol. 16, no. 1, pp. 259-292, Jan. 2002.
[13] J. Fu, H. He, and X. Zhou, "Adaptive learning and control for MIMO system based on adaptive dynamic programming," IEEE Trans. Neural Netw., vol. 22, no. 7, pp. 1133-1148, Jul. 2011.
[14] F. Y. Wang, N. Jin, D. R. Liu, and Q. L. Wei, "Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound," IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 24-36, Jan. 2011.
[15] H. G. Zhang, Y. H. Luo, and D. R. Liu, "Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490-1503, Sep. 2009.
[16] M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res., vol. 4, pp. 1107-1149, Dec. 2003.
[17] X. Xu, D. W. Hu, and X. C. Lu, "Kernel-based least squares policy iteration for reinforcement learning," IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 973-992, Jul. 2007.
[18] A. G. Barto and S. Mahadevan, "Recent advances in hierarchical reinforcement learning," Discrete Event Dyn. Syst.: Theory Appl., vol. 13, nos. 1-2, pp. 41-77, Jan.-Apr. 2003.
[20] T. G. Dietterich, "Hierarchical reinforcement learning with the MAXQ value function decomposition," J. Artif. Intell. Res., vol. 13, no. 1, pp. 227-303, Aug. 2000.
[21] R. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artif. Intell., vol. 112, nos. 1-2, pp. 181-211, Aug. 1999.
[22] T. G. Dietterich, "State abstraction in MAXQ hierarchical reinforcement learning," in Proc. Adv. Neural Inf. Process. Syst., 2000, pp. 994-1000.
[23] D. Andre and S. J. Russell, "State abstraction for programmable reinforcement learning agents," in Proc. 18th Nat. Conf. Artif. Intell., 2002, pp. 119-125.
[24] B. Hengst, "Safe state abstraction and reusable continuing subtasks in hierarchical reinforcement learning," in Proc. AI: Advances in Artificial Intelligence (Lecture Notes Comput. Sci.), 2007, pp. 58-67.
[27] J. Boyan, "Technical update: Least-squares temporal difference learning," Mach. Learn., vol. 49, nos. 2-3, pp. 233-246, 2002.
[29] Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2275-2285, Aug. 2004.
[31] C. Dimitrakakis and M. G. Lagoudakis, "Algorithms and bounds for rollout sampling approximate policy iteration," in Proc. 8th Eur. Workshop Recent Adv. Reinforcement Learn. (LNAI 5323), Villeneuve d'Ascq, France, Jun.-Jul. 2008, pp. 27-40.
[32] R. Munos and C. Szepesvári, "Finite-time bounds for fitted value iteration," J. Mach. Learn. Res., vol. 9, pp. 815-857, May 2008.
[33] B. H. Li and J. Si, "Approximate robust policy iteration using multilayer perceptron neural networks for discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices," IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1270-1280, Aug. 2010.
[34] J. Seiffertt and D. C. Wunsch, "Backpropagation and ordered derivatives in the time scales calculus," IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1262-1269, Aug. 2010.
[36] J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Control, vol. 42, no. 5, pp. 674-690, May 1997.
[37] A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Mach. Learn., vol. 71, no. 1, pp. 89-129, 2008.
[39] A. McCallum, "Reinforcement learning with selective perception and hidden state," Ph.D. thesis, Comput. Sci. Dept., Univ. Rochester, Rochester, NY, 1995.
[41] D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, Apr. 2005.
[42] A. Jonsson and A. G. Barto, "Automated state abstraction for options using the U-tree algorithm," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2000.
[43] M. A. Wiering and H. van Hasselt, "Ensemble algorithms in reinforcement learning," IEEE Trans. Syst. Man Cybern. Part B: Cybern., vol. 38, no. 4, pp. 930-936, Aug. 2008.
[44] A. Hans and S. Udluft, "Ensembles of neural networks for robust reinforcement learning," in Proc. 9th Int. Conf. Mach. Learn. Applicat., Washington, DC, 2010, pp. 401-406.
[46] L. H. Li, M. L. Littman, and C. R. Mansley, "Online exploration in least-squares policy iteration," in Proc. 8th Int. Conf. Autonomous Agents Multiagent Syst. (AAMAS), Budapest, Hungary, May 2009, pp. 733-739.