[1] A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins, "Learning and sequential decision making," in Learning and Computational Neuroscience. Cambridge, MA, USA: MIT Press, 1989, pp. 539-602.
[3] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
[4] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. New York, NY, USA: Wiley, 2012.
[5] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Belmont, MA, USA: Athena Scientific, 1995.
[6] S. Singh, A. Barto, R. Grupen, and C. Connolly, "Robust reinforcement learning in motion planning," in Proc. Adv. Neural Inf. Process. Syst., 1994, pp. 655-662.
[7] R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learn., 1990, pp. 216-224.
[8] R. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
[9] R. Sutton, "Temporal credit assignment in reinforcement learning," Ph.D. dissertation, Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Jan. 1984.
[10] P. Werbos, "Reinforcement learning and approximate dynamic programming (RLADP) - foundations, common misconceptions and challenges ahead," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, NY, USA: Wiley, 2013, pp. 3-30.
[11] P. J. Werbos, "ADP: The key direction for future research in intelligent control and understanding brain intelligence," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 898-900, Aug. 2008.
[12] P. J. Werbos, "Intelligence in the brain: A theory of how it works and how to build it," Neural Netw., vol. 22, no. 3, pp. 200-212, 2009.
[14] P. J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Netw., vol. 3, no. 2, pp. 179-189, 1990.
[15] J. Si and Y.-T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264-276, Mar. 2001.
[16] R. Enns and J. Si, "Helicopter trimming and tracking control using direct neural dynamic programming," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 929-939, Jul. 2003.
[17] L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, "Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1617-1622, Dec. 2009.
[18] D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779-789, Apr. 2013.
[19] D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.
[20] D. V. Prokhorov, "Adaptive critic designs and their applications," Ph.D. dissertation, Dept. Electr. Eng., Texas Tech. Univ., Lubbock, TX, USA, Oct. 1997.
[21] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, "Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming," IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628-634, Jul. 2012.
[22] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, "Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming," Automatica, vol. 48, pp. 1825-1832, Jul. 2012.
[23] D. Liu and D. Wang, "Optimal control of unknown nonlinear discrete-time systems using the iterative globalized dual heuristic programming algorithm," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, NY, USA: Wiley, 2013, pp. 52-74.
[24] J. Fu, H. He, and X. Zhou, "Adaptive learning and control for MIMO system based on adaptive dynamic programming," IEEE Trans. Neural Netw., vol. 22, no. 7, pp. 1133-1148, Jun. 2011.
[25] Z. Ni, H. He, D. V. Prokhorov, and J. Fu, "An online actor-critic learning approach with Levenberg-Marquardt algorithm," in Proc. IJCNN, 2011, pp. 2333-2340.
[26] J. Fu, H. He, and Z. Ni, "Adaptive dynamic programming with balanced weights seeking strategy," in Proc. IEEE Symp. ADPRL, Apr. 2011, pp. 210-217.
[27] H. He, Z. Ni, and J. Fu, "A three-network architecture for on-line learning and optimization based on adaptive dynamic programming," Neurocomputing, vol. 78, no. 1, pp. 3-13, 2012.
[28] Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 913-928, Jun. 2013.
[30] Z. Ni, H. He, D. Zhao, and D. Prokhorov, "Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming," in Proc. IJCNN, Jun. 2012, pp. 1-8.
[31] H. He, Z. Ni, and D. Zhao, "Learning and optimization in hierarchical adaptive critic design," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Piscataway, NJ, USA: IEEE Press, 2013, pp. 78-95.
[32] H. He, Z. Ni, and D. Zhao, "Data-driven learning and control with multiple critic networks," in Proc. 10th WCICA, Jul. 2012, pp. 523-527.
[33] Z. Ni, X. Fang, H. He, D. Zhao, and X. Xu, "Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition," in Proc. IEEE Symp. ADPRL, Apr. 2013.
[34] X. Fang, H. He, Z. Ni, and Y. Tang, "Learning and control in virtual reality for machine intelligence," in Proc. 3rd ICICIP, Jul. 2012, pp. 63-67.
[36] X. Pang and P. J. Werbos, "Neural network design for J function approximation in dynamic programming," Math. Model. Sci. Comput., vol. 5, nos. 2-3, pp. 1-3, 1996.
[37] D. Wunsch, "The cellular simultaneous recurrent network adaptive critic design for the generalized maze problem has a simple closed-form solution," in Proc. IEEE IJCNN, vol. 3, Jul. 2000, pp. 79-82.
[38] R. Ilin, R. Kozma, and P. Werbos, "Cellular SRN trained by extended Kalman filter shows promise for ADP," in Proc. IEEE IJCNN, Jul. 2006, pp. 506-510.
[39] R. Ilin, R. Kozma, and P. Werbos, "Efficient learning in cellular simultaneous recurrent neural networks - The case of maze navigation problem," in Proc. IEEE Int. Symp. ADPRL, Apr. 2007, pp. 324-329.
[40] R. Ilin, R. Kozma, and P. Werbos, "Beyond feedforward models trained by backpropagation: A practical training tool for a more efficient universal approximator," IEEE Trans. Neural Netw., vol. 19, no. 6, pp. 929-937, Jun. 2008.
[41] M. Wiering and H. Van Hasselt, "Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods," in Proc. IEEE Int. Symp. ADPRL, Apr. 2007, pp. 280-287.
[42] P. Werbos and X. Pang, "Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot," in Proc. IEEE Int. Conf. Syst., Man, Cybern., vol. 3, Oct. 1996, pp. 1764-1769.
[44] F. Liu, J. Sun, J. Si, W. Guo, and S. Mei, "A boundedness result for the direct heuristic dynamic programming," Neural Netw., vol. 32, pp. 229-235, Aug. 2012.
[45] P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 2, pp. 425-436, Apr. 2007.
[46] P. Dayan and T. J. Sejnowski, "TD(λ) converges with probability 1," Mach. Learn., vol. 14, no. 3, pp. 295-301, 1994.
[47] P. Dayan, "The convergence of TD(λ) for general λ," Mach. Learn., vol. 8, nos. 3-4, pp. 341-362, 1992.
[48] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat., vol. 22, no. 3, pp. 400-407, 1951.