[1] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy-gradient methods for reinforcement learning with function approximation," Adv. Neural Inf. Process. Syst., vol. 12, no. 22, pp. 1057-1063, 2000.
[2] K. S. Hwang and C. S. Lin, "Smoothing trajectory tracking of three-link robot: A self-organizing CMAC approach," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 28, no. 5, pp. 680-692, Oct. 1998.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[4] M. A. Wiering and H. V. Hasselt, "Ensemble algorithms in reinforcement learning," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 930-936, Aug. 2008.
[5] G. G. Lendaris, "Higher level application of ADP: A next phase for the control field?" IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 901-912, Aug. 2008.
[6] D. Dong, C. Chen, H. Li, and T. Tarn, "Quantum reinforcement learning," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 5, pp. 1207-1220, Oct. 2008.
[7] B. Baddeley, "Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 950-956, Aug. 2008.
[8] S. Dries and M. A. Wiering, "Neural-fitted TD-leaf learning for playing Othello with structured neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1701-1713, Nov. 2012.
[9] H.-N. Wu and B. Luo, "Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 12, pp. 1884-1895, Dec. 2012.
[10] S. P. Singh, "Reinforcement learning algorithms for average-payoff Markovian decision processes," in Proc. 12th Amer. Assoc. Artif. Intell., 1994, pp. 700-705.
[11] J. J. Govindhasamy, S. F. McLoone, and G. W. Irwin, "Second-order training of adaptive critics for online process control," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 35, no. 2, pp. 381-385, Apr. 2005.
[12] R. Zajdel, "Epoch-incremental queue-Dyna algorithm," in Proc. Lect. Notes Artif. Intell., pp. 1160-1170, 2008.
[13] R. Sutton, "Dyna, an integrated architecture for learning, planning, and reacting," ACM SIGART Bulletin, vol. 2, no. 4, pp. 160-163, Aug. 1991.
[14] J. N. Tsitsiklis and B. Van Roy, "On average versus discounted reward temporal-difference learning," Mach. Learn., vol. 49, no. 2-3, pp. 179-191, 2002, doi: 10.1023/A:1017980312899.
[15] N. D. Daw and D. S. Touretzky, "Behavioral considerations suggest an average reward TD model of the dopamine system," Neurocomputing, pp. 679-684, 2000.
[16] V. Gullapalli, "A stochastic reinforcement learning algorithm for learning real-valued functions," Neural Netw., vol. 3, no. 6, pp. 671-692, 1990.
[17] H. D. Patino and D. Liu, "Neural network-based model reference adaptive control system," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 30, no. 1, pp. 198-204, Feb. 2000, doi: 10.1109/3477.826961.
[18] J. G. Ziegler and N. B. Nichols, "Optimum settings for automatic controllers," Trans. of the ASME, vol. 64, no. 11, pp. 759-768, 1942.