SCOPUS Information Search Platform

IEEE Transactions on Neural Networks and Learning Systems

Volume 24, Issue 12, 2013, Pages 2038-2050

Goal representation heuristic dynamic programming on maze navigation

Authors (4): Ni, Zhen (a); He, Haibo (a); Wen, Jinyu (b); Xu, Xin (c)

a University of Rhode Island (United States)

b Huazhong University of Science and Technology (China)

c National University of Defense Technology (China)

Author keywords

Adaptive dynamic programming; goal representation heuristic dynamic programming; Markov decision process; maze navigation path planning; reinforcement learning

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; CHARACTERISTICS ANALYSIS; HEURISTIC DYNAMIC PROGRAMMING; LEARNING PERFORMANCE; MARKOV DECISION PROCESSES; REINFORCEMENT SIGNAL; THEORETICAL GUARANTEES; VALUE FUNCTION APPROXIMATION;

BENCHMARKING; LEARNING ALGORITHMS; MARKOV PROCESSES; NAVIGATION; REINFORCEMENT LEARNING;

DYNAMIC PROGRAMMING;

EID: 84887990637 PISSN: 2162237X EISSN: 21622388 Source Type: Journal
DOI: 10.1109/TNNLS.2013.2271454 Document Type: Article

Times cited: 97

References (49)

1
- 0002201501
- Learning and sequential decision making
- A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins, "Learning and sequential decision making," in Learning and Computational Neuroscience. Cambridge, MA, USA: MIT Press, 1989, pp. 539-602.
- (1989) Learning and Computational Neuroscience , pp. 539-602
- Barto, A.G.¹ Sutton, R.S.² Watkins, C.J.C.H.³

2
- 0004102479
- R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.¹ Barto, A.²

3
- 0003787146
- R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
- (1957) Dynamic Programming
- Bellman, R.¹

4
- 0004163205
- F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. New York, NY, USA: Wiley, 2012.
- (2012) Optimal Control
- Lewis, F.L.¹ Vrabie, D.² Syrmos, V.L.³

5
- 0003565783
- D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1, Belmont, MA, USA: Athena Scientific, 1995.
- (1995) Dynamic Programming and Optimal Control , vol.1
- Bertsekas, D.P.¹

6
- 0006488248
- Robust reinforcement learning in motion planning
- S. Singh, A. Barto, R. Grupen, and C. Connolly, "Robust reinforcement learning in motion planning," in Proc. Adv. Neural Inf. Process. Syst., 1994, pp. 655-662.
- (1994) Proc. Adv. Neural Inf. Process. Syst. , pp. 655-662
- Singh, S.¹ Barto, A.² Grupen, R.³ Connolly, C.⁴

7
- 85132026293
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learn., 1990, pp. 216-224.
- (1990) Proc. 7th Int. Conf. Mach. Learn. , pp. 216-224
- Sutton, R.S.¹

8
- 33847202724
- Learning to predict by the methods of temporal differences
- R. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
- (1988) Mach. Learn. , vol.3 , Issue.1 , pp. 9-44
- Sutton, R.¹

9
- 0003617454
- R. Sutton, Temporal credit assignment in reinforcement learning, Ph.D. dissertation, Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Jan. 1984.
- (1984) Temporal Credit Assignment in Reinforcement Learning
- Sutton, R.¹

10
- 84876119970
- Reinforcement learning and approximate dynamic programming (RLADP)-foundations, common misconceptions and challenges ahead
- P. Werbos, "Reinforcement learning and approximate dynamic programming (RLADP) - foundations, common misconceptions and challenges ahead," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, USA: Wiley, 2013, pp. 3-30.
- (2013) Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , pp. 3-30
- Werbos, P.¹

11
- 49049091767
- ADP: The key direction for future research in intelligent control and understanding brain intelligence
- P. J. Werbos, "ADP: The key direction for future research in intelligent control and understanding brain intelligence," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 898-900, Aug. 2008.
- (2008) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.38 , Issue.4 , pp. 898-900
- Werbos, P.J.¹

12
- 67349247013
- Intelligence in the brain: A theory of how it works and how to build it
- P. J. Werbos, "Intelligence in the brain: A theory of how it works and how to build it," Neural Netw., vol. 22, no. 3, pp. 200-212, 2009.
- (2009) Neural Netw. , vol.22 , Issue.3 , pp. 200-212
- Werbos, P.J.¹

13
- 84887994897
- P. J. Werbos, Handbook of Intelligent Control. New York, NY, USA: Van Nostrand, 1992.
- (1992) Handbook of Intelligent Control
- Werbos, P.J.¹

14
- 0025229247
- Consistency of HDP applied to a simple reinforcement learning problem
- P. J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Netw., vol. 3, no. 2, pp. 179-189, 1990.
- (1990) Neural Netw. , vol.3 , Issue.2 , pp. 179-189
- Werbos, P.J.¹

15
- 0035273403
- On-line learning control by association and reinforcement
- DOI 10.1109/72.914523, PII S1045922701014047
- J. Si and Y.-T. Wang, "Online learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264-276, Mar. 2001.
- (2001) IEEE Transactions on Neural Networks , vol.12 , Issue.2 , pp. 264-276
- Si, J.¹ Wang, Y.-T.²

16
- 0043026775
- Helicopter trimming and tracking control using direct neural dynamic programming
- R. Enns and J. Si, "Helicopter trimming and tracking control using direct neural dynamic programming," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 929-939, Jul. 2003.
- (2003) IEEE Trans. Neural Netw. , vol.14 , Issue.4 , pp. 929-939
- Enns, R.¹ Si, J.²

17
- 70349615619
- Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error
- L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, "Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1617-1622, Dec. 2009.
- (2009) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.39 , Issue.6 , pp. 1617-1622
- Yang, L.¹ Si, J.² Tsakalis, K.S.³ Rodriguez, A.A.⁴

18
- 84881555023
- Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems
- D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779-789, Apr. 2013.
- (2013) IEEE Trans. Cybern. , vol.43 , Issue.2 , pp. 779-789
- Liu, D.¹ Wei, Q.²

19
- 0031236002
- Adaptive critic designs
- PII S1045922797052430
- D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.
- (1997) IEEE Transactions on Neural Networks , vol.8 , Issue.5 , pp. 997-1007
- Prokhorov, D.V.¹ Wunsch II, D.C.²

20
- 0003448868
- D. V. Prokhorov, "Adaptive critic designs and their applications," Ph.D. Dissertation, Dept. Electr. Eng., Texas Tech. Univ., Lubbock, TX, USA, Oct. 1997.
- (1997) Adaptive Critic Designs and Their Applications
- Prokhorov, D.V.¹

21
- 84863467146
- Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming
- D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, "Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming," IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628-634, Jul. 2012.
- (2012) IEEE Trans. Autom. Sci. Eng. , vol.9 , Issue.3 , pp. 628-634
- Liu, D.¹ Wang, D.² Zhao, D.³ Wei, Q.⁴ Jin, N.⁵

22
- 84864489666
- Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming
- D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, "Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming," Automatica, vol. 48, pp. 1825-1832, Jul. 2012.
- (2012) Automatica , vol.48 , pp. 1825-1832
- Wang, D.¹ Liu, D.² Wei, Q.³ Zhao, D.⁴ Jin, N.⁵

23
- 84886339128
- Optimal control of unknown nonlinear discrete-time systems using the iterative globalized dual heuristic programming algorithm
- D. Liu and D. Wang, "Optimal control of unknown nonlinear discrete-time systems using the iterative globalized dual heuristic programming algorithm," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, NY, USA: Wiley, 2013, pp. 52-74.
- (2013) Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , pp. 52-74
- Liu, D.¹ Wang, D.²

24
- 79960115021
- Adaptive learning and control for MIMO system based on adaptive dynamic programming
- J. Fu, H. He, and X. Zhou, "Adaptive learning and control for MIMO system based on adaptive dynamic programming," IEEE Trans. Neural Netw., vol. 22, no. 7, pp. 1133-1148, Jun. 2011.
- (2011) IEEE Trans. Neural Netw. , vol.22 , Issue.7 , pp. 1133-1148
- Fu, J.¹ He, H.² Zhou, X.³

25
- 80054754525
- An online actor-critic learning approach with Levenberg-Marquardt algorithm
- Z. Ni, H. He, D. V. Prokhorov, and J. Fu, "An online actor-critic learning approach with Levenberg-Marquardt algorithm," in Proc. IJCNN, 2011, pp. 2333-2340.
- (2011) Proc. IJCNN , pp. 2333-2340
- Ni, Z.¹ He, H.² Prokhorov, D.V.³ Fu, J.⁴

26
- 80052225230
- Adaptive dynamic programming with balanced weights seeking strategy
- J. Fu, H. He, and Z. Ni, "Adaptive dynamic programming with balanced weights seeking strategy," in Proc. IEEE Symp. ADPRL, Apr. 2011, pp. 210-217.
- (2011) Proc. IEEE Symp. ADPRL , pp. 210-217
- Fu, J.¹ He, H.² Ni, Z.³

27
- 82655173881
- A three-network architecture for on-line learning and optimization based on adaptive dynamic programming
- H. He, Z. Ni, and J. Fu, "A three-network architecture for on-line learning and optimization based on adaptive dynamic programming," Neurocomputing, vol. 78, no. 1, pp. 3-13, 2012.
- (2012) Neurocomputing , vol.78 , Issue.1 , pp. 3-13
- He, H.¹ Ni, Z.² Fu, J.³

28
- 84876149222
- Adaptive learning in tracking control based on the dual critic network design
- Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 913-928, Jun. 2013.
- (2013) IEEE Trans. Neural Netw. Learn. Syst. , vol.24 , Issue.6 , pp. 913-928
- Ni, Z.¹ He, H.² Wen, J.³

29
- 84891585216
- H. He, Self-Adaptive Systems for Machine Intelligence. New York, NY, USA: Wiley, 2011.
- (2011) Self-Adaptive Systems for Machine Intelligence
- He, H.¹

30
- 84865079504
- Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming
- Z. Ni, H. He, D. Zhao, and D. Prokhorov, "Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming," in Proc. IJCNN, Jun. 2012, pp. 1-8.
- (2012) Proc. IJCNN, Jun. , pp. 1-8
- Ni, Z.¹ He, H.² Zhao, D.³ Prokhorov, D.⁴

31
- 84876118292
- Learning and optimization in hierarchical adaptive critic design
- H. He, Z. Ni, and D. Zhao, "Learning and optimization in hierarchical adaptive critic design," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Piscataway, NJ, USA: IEEE Press, 2013, pp. 78-95.
- (2013) Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , pp. 78-95
- He, H.¹ Ni, Z.² Zhao, D.³

32
- 84872330793
- Data-driven learning and control with multiple critic networks
- H. He, Z. Ni, and D. Zhao, "Data-driven learning and control with multiple critic networks," in Proc. 10th WCICA, Jul. 2012, pp. 523-527.
- (2012) Proc. 10th WCICA, Jul. , pp. 523-527
- He, H.¹ Ni, Z.² Zhao, D.³

33
- 84891525568
- Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition
- Z. Ni, X. Fang, H. He, D. Zhao, and X. Xu, "Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition," in Proc. IEEE Symp. ADPRL, Apr. 2013.
- (2013) Proc. IEEE Symp. ADPRL
- Ni, Z.¹ Fang, X.² He, H.³ Zhao, D.⁴ Xu, X.⁵

34
- 84872295913
- Learning and control in virtual reality for machine intelligence
- X. Fang, H. He, Z. Ni, and Y. Tang, "Learning and control in virtual reality for machine intelligence," in Proc. 3rd ICICIP, Jul. 2012, pp. 63-67.
- (2012) Proc. 3rd ICICIP , pp. 63-67
- Fang, X.¹ He, H.² Ni, Z.³ Tang, Y.⁴

35
- 84887990728
- Virtual Reality (VR) Platform for Adaptive Learning and Control Based on Adaptive Dynamic Programming [Online]. Available: http://www.youtube.com/watch?v=OeZEDBz6ki0&feature=youtu.be

36
- 0005942467
- Neural network design for J function approximation in dynamic programming
- X. Pang and P. J. Werbos, "Neural network design for J function approximation in dynamic programming," Math. Model. Sci. Comput., vol. 5, nos. 2-3, pp. 1-3, 1996.
- (1996) Math. Model. Sci. Comput. , vol.5 , Issue.2-3 , pp. 1-3
- Pang, X.¹ Werbos, P.J.²

37
- 0033698503
- The cellular simultaneous recurrent network adaptive critic design for the generalized maze problem has a simple closed-form solution
- D. Wunsch, "The cellular simultaneous recurrent network adaptive critic design for the generalized maze problem has a simple closed-form solution," in Proc. IEEE IJCNN, vol. 3, Jul. 2000, pp. 79-82.
- (2000) Proc. IEEE IJCNN , vol.3 , pp. 79-82
- Wunsch, D.¹

38
- 34548237758
- Cellular SRN trained by extended Kalman filter shows promise for ADP
- 1716135, International Joint Conference on Neural Networks 2006, IJCNN '06
- R. Ilin, R. Kozma, and P. Werbos, "Cellular SRN trained by extended Kalman filter shows promise for ADP," in Proc. IEEE IJCNN, Jul. 2006, pp. 506-510.
- (2006) IEEE International Conference on Neural Networks - Conference Proceedings , pp. 506-510
- Ilin, R.¹ Kozma, R.² Werbos, P.J.³

39
- 34548725198
- Efficient learning in cellular simultaneous recurrent neural networks - The case of maze navigation problem
- DOI 10.1109/ADPRL.2007.368206, 4220851, Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
- R. Ilin, R. Kozma, and P. Werbos, "Efficient learning in cellular simultaneous recurrent neural networks - The case of maze navigation problem," in Proc. IEEE Int. Symp. ADPRL, Apr. 2007, pp. 324-329.
- (2007) Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007 , pp. 324-329
- Ilin, R.¹ Kozma, R.² Werbos, P.J.³

40
- 49149131955
- Beyond feedforward models trained by backpropagation: A practical training tool for a more efficient universal approximator
- R. Ilin, R. Kozma, and P. Werbos, "Beyond feedforward models trained by backpropagation: A practical training tool for a more efficient universal approximator," IEEE Trans. Neural Netw., vol. 19, no. 6, pp. 929-937, Jun. 2008.
- (2008) IEEE Trans. Neural Netw. , vol.19 , Issue.6 , pp. 929-937
- Ilin, R.¹ Kozma, R.² Werbos, P.³

41
- 34548771972
- Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods
- DOI 10.1109/ADPRL.2007.368200, 4220845, Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
- M. Wiering and H. Van Hasselt, "Two novel on-policy reinforcement learning algorithms based on TD (λ)-methods," in Proc. IEEE Int. Symp. ADPRL, Apr. 2007, pp. 280-287.
- (2007) Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007 , pp. 280-287
- Wiering, M.A.¹ Van Hasselt, H.²

42
- 0030421566
- Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot
- P. Werbos and X. Pang, "Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot," in Proc. IEEE Int. Conf. Syst., Man, Cybern., vol. 3, Oct. 1996, pp. 1764-1769.
- (1996) Proc. IEEE Int. Conf. Syst., Man, Cybern. , vol.3 , pp. 1764-1769
- Werbos, P.¹ Pang, X.²

43
- 0004255908
- T. M. Mitchell, Machine Learning. New York, NY, USA: McGraw-Hill, 1997.
- (1997) Machine Learning
- Mitchell, T.M.¹

44
- 84862811991
- A boundedness result for the direct heuristic dynamic programming
- F. Liu, J. Sun, J. Si, W. Guo, and S. Mei, "A boundedness result for the direct heuristic dynamic programming," Neural Netw., vol. 32, pp. 229-235, Aug. 2012.
- (2012) Neural Netw. , vol.32 , pp. 229-235
- Liu, F.¹ Sun, J.² Si, J.³ Guo, W.⁴ Mei, S.⁵

45
- 34047138362
- Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints
- DOI 10.1109/TSMCB.2006.883869, Special Issue on Robot Learning by Observation, Demonstration and Imitation
- P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 2, pp. 425-436, Apr. 2007.
- (2007) IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics , vol.37 , Issue.2 , pp. 425-436
- He, P.¹ Jagannathan, S.²

46
- 0028388685
- TD (λ) converges with probability 1
- P. Dayan and T. J. Sejnowski, "TD (λ) converges with probability 1," Mach. Learn., vol. 14, no. 3, pp. 295-301, 1994.
- (1994) Mach. Learn. , vol.14 , Issue.3 , pp. 295-301
- Dayan, P.¹ Sejnowski, T.J.²

47
- 0000430514
- The convergence of TD (λ) for general λ
- P. Dayan, "The convergence of TD (λ) for general λ," Mach. Learn., vol. 8, nos. 3-4, pp. 341-362, 1992.
- (1992) Mach. Learn. , vol.8 , Issue.3-4 , pp. 341-362
- Dayan, P.¹

48
- 0000016172
- A stochastic approximation method
- H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat., vol. 22, no. 3, pp. 400-407, 1951.
- (1951) Ann. Math. Stat. , vol.22 , Issue.3 , pp. 400-407
- Robbins, H.¹ Monro, S.²

49
- 0004066022
- H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. New York, NY, USA: Springer-Verlag, 1997.
- (1997) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.