Volume 24, Issue 12, 2013, Pages 2038-2050

Goal representation heuristic dynamic programming on maze navigation

Author keywords

Adaptive dynamic programming; goal representation heuristic dynamic programming; Markov decision process; maze navigation path planning; reinforcement learning

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; CHARACTERISTICS ANALYSIS; HEURISTIC DYNAMIC PROGRAMMING; LEARNING PERFORMANCE; MARKOV DECISION PROCESSES; REINFORCEMENT SIGNAL; THEORETICAL GUARANTEES; VALUE FUNCTION APPROXIMATION;

EID: 84887990637     PISSN: 2162237X     EISSN: 21622388     Source Type: Journal    
DOI: 10.1109/TNNLS.2013.2271454     Document Type: Article
Times cited : (97)

References (49)
  • 3
    • 0003787146
    • Princeton NJ USA: Princeton Univ. Press
    • R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
    • (1957) Dynamic Programming
    • Bellman, R.1
  • 7
    • 85132026293
    • Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
    • R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learn., 1990, pp. 216-224.
    • (1990) Proc. 7th Int. Conf. Mach. Learn. , pp. 216-224
    • Sutton, R.S.1
  • 8
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
    • (1988) Mach. Learn. , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.1
  • 9
    • 0003617454
    • Ph.D. dissertation, Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Jan.
    • R. Sutton, "Temporal credit assignment in reinforcement learning," Ph.D. dissertation, Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Jan. 1984.
    • (1984) Temporal Credit Assignment in Reinforcement Learning
    • Sutton, R.1
  • 10
    • 84876119970
    • Reinforcement learning and approximate dynamic programming (RLADP)-foundations, common misconceptions and challenges ahead
    • New York, USA: Wiley
    • P. Werbos, "Reinforcement learning and approximate dynamic programming (RLADP) - foundations, common misconceptions and challenges ahead," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, USA: Wiley, 2013, pp. 3-30.
    • (2013) Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , pp. 3-30
    • Werbos, P.1
  • 11
    • 49049091767
    • ADP: The key direction for future research in intelligent control and understanding brain intelligence
    • Aug
    • P. J. Werbos, "ADP: The key direction for future research in intelligent control and understanding brain intelligence," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 898-900, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.38 , Issue.4 , pp. 898-900
    • Werbos, P.J.1
  • 12
    • 67349247013
    • Intelligence in the brain: A theory of how it works and how to build it
    • P. J. Werbos, "Intelligence in the brain: A theory of how it works and how to build it," Neural Netw., vol. 22, no. 3, pp. 200-212, 2009.
    • (2009) Neural Netw. , vol.22 , Issue.3 , pp. 200-212
    • Werbos, P.J.1
  • 14
    • 0025229247
    • Consistency of HDP applied to a simple reinforcement learning problem
    • P. J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Netw., vol. 3, no. 2, pp. 179-189, 1990.
    • (1990) Neural Netw. , vol.3 , Issue.2 , pp. 179-189
    • Werbos, P.J.1
  • 15
    • 0035273403
    • On-line learning control by association and reinforcement
    • DOI 10.1109/72.914523, PII S1045922701014047
    • J. Si and Y.-T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264-276, Mar. 2001. (Pubitemid 32371483)
    • (2001) IEEE Transactions on Neural Networks , vol.12 , Issue.2 , pp. 264-276
    • Si, J.1    Wang, Y.-T.2
  • 16
    • 0043026775
    • Helicopter trimming and tracking control using direct neural dynamic programming
    • Jul
    • R. Enns and J. Si, "Helicopter trimming and tracking control using direct neural dynamic programming," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 929-939, Jul. 2003.
    • (2003) IEEE Trans. Neural Netw. , vol.14 , Issue.4 , pp. 929-939
    • Enns, R.1    Si, J.2
  • 17
    • 70349615619
    • Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error
    • Dec
    • L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, "Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1617-1622, Dec. 2009.
    • (2009) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.39 , Issue.6 , pp. 1617-1622
    • Yang, L.1    Si, J.2    Tsakalis, K.S.3    Rodriguez, A.A.4
  • 18
    • 84881555023
    • Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems
    • Apr
    • D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779-789, Apr. 2013.
    • (2013) IEEE Trans. Cybern. , vol.43 , Issue.2 , pp. 779-789
    • Liu, D.1    Wei, Q.2
  • 20
    • 0003448868
    • Ph.D. Dissertation, Dept. Electr. Eng., Texas Tech. Univ., Lubbock, TX, USA, Oct.
    • D. V. Prokhorov, "Adaptive critic designs and their applications," Ph.D. Dissertation, Dept. Electr. Eng., Texas Tech. Univ., Lubbock, TX, USA, Oct. 1997.
    • (1997) Adaptive Critic Designs and Their Applications
    • Prokhorov, D.V.1
  • 21
    • 84863467146
    • Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming
    • Jul
    • D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, "Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming," IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628-634, Jul. 2012.
    • (2012) IEEE Trans. Autom. Sci. Eng. , vol.9 , Issue.3 , pp. 628-634
    • Liu, D.1    Wang, D.2    Zhao, D.3    Wei, Q.4    Jin, N.5
  • 22
    • 84864489666
    • Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming
    • Jul
    • D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, "Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming," Automatica, vol. 48, pp. 1825-1832, Jul. 2012.
    • (2012) Automatica , vol.48 , pp. 1825-1832
    • Wang, D.1    Liu, D.2    Wei, Q.3    Zhao, D.4    Jin, N.5
  • 23
    • 84886339128
    • Optimal control of unknown nonlinear discrete-time systems using the iterative globalized dual heuristic programming algorithm
    • New York, NY, USA: Wiley
    • D. Liu and D. Wang, "Optimal control of unknown nonlinear discrete-time systems using the iterative globalized dual heuristic programming algorithm," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, NY, USA: Wiley, 2013, pp. 52-74.
    • (2013) Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , pp. 52-74
    • Liu, D.1    Wang, D.2
  • 24
    • 79960115021
    • Adaptive learning and control for MIMO system based on adaptive dynamic programming
    • Jun
    • J. Fu, H. He, and X. Zhou, "Adaptive learning and control for MIMO system based on adaptive dynamic programming," IEEE Trans. Neural Netw., vol. 22, no. 7, pp. 1133-1148, Jun. 2011.
    • (2011) IEEE Trans. Neural Netw. , vol.22 , Issue.7 , pp. 1133-1148
    • Fu, J.1    He, H.2    Zhou, X.3
  • 25
    • 80054754525
    • An online actor-critic learning approach with Levenberg-Marquardt algorithm
    • Z. Ni, H. He, D. V. Prokhorov, and J. Fu, "An online actor-critic learning approach with Levenberg-Marquardt algorithm," in Proc. IJCNN, 2011, pp. 2333-2340.
    • (2011) Proc. IJCNN , pp. 2333-2340
    • Ni, Z.1    He, H.2    Prokhorov, D.V.3    Fu, J.4
  • 26
    • 80052225230
    • Adaptive dynamic programming with balanced weights seeking strategy
    • Apr.
    • J. Fu, H. He, and Z. Ni, "Adaptive dynamic programming with balanced weights seeking strategy," in Proc. IEEE Symp. ADPRL, Apr. 2011, pp. 210-217.
    • (2011) Proc. IEEE Symp. ADPRL , pp. 210-217
    • Fu, J.1    He, H.2    Ni, Z.3
  • 27
    • 82655173881
    • A three-network architecture for on-line learning and optimization based on adaptive dynamic programming
    • H. He, Z. Ni, and J. Fu, "A three-network architecture for on-line learning and optimization based on adaptive dynamic programming," Neurocomputing, vol. 78, no. 1, pp. 3-13, 2012.
    • (2012) Neurocomputing , vol.78 , Issue.1 , pp. 3-13
    • He, H.1    Ni, Z.2    Fu, J.3
  • 28
    • 84876149222
    • Adaptive learning in tracking control based on the dual critic network design
    • Jun
    • Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 913-928, Jun. 2013.
    • (2013) IEEE Trans. Neural Netw. Learn. Syst. , vol.24 , Issue.6 , pp. 913-928
    • Ni, Z.1    He, H.2    Wen, J.3
  • 30
    • 84865079504
    • Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming
    • Z. Ni, H. He, D. Zhao, and D. Prokhorov, "Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming," in Proc. IJCNN, Jun. 2012, pp. 1-8.
    • (2012) Proc. IJCNN , pp. 1-8
    • Ni, Z.1    He, H.2    Zhao, D.3    Prokhorov, D.4
  • 32
    • 84872330793
    • Data-driven learning and control with multiple critic networks
    • H. He, Z. Ni, and D. Zhao, "Data-driven learning and control with multiple critic networks," in Proc. 10th WCICA, Jul. 2012, pp. 523-527.
    • (2012) Proc. 10th WCICA , pp. 523-527
    • He, H.1    Ni, Z.2    Zhao, D.3
  • 33
    • 84891525568
    • Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition
    • Apr.
    • Z. Ni, X. Fang, H. He, D. Zhao, and X. Xu, "Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition," in Proc. IEEE Symp. ADPRL, Apr. 2013.
    • (2013) Proc. IEEE Symp. ADPRL
    • Ni, Z.1    Fang, X.2    He, H.3    Zhao, D.4    Xu, X.5
  • 34
    • 84872295913
    • Learning and control in virtual reality for machine intelligence
    • Jul.
    • X. Fang, H. He, Z. Ni, and Y. Tang, "Learning and control in virtual reality for machine intelligence," in Proc. 3rd ICICIP, Jul. 2012, pp. 63-67.
    • (2012) Proc. 3rd ICICIP , pp. 63-67
    • Fang, X.1    He, H.2    Ni, Z.3    Tang, Y.4
  • 36
    • 0005942467
    • Neural network design for J function approximation in dynamic programming
    • X. Pang and P. J. Werbos, "Neural network design for J function approximation in dynamic programming," Math. Model. Sci. Comput., vol. 5, nos. 2-3, pp. 1-3, 1996.
    • (1996) Math. Model. Sci. Comput. , vol.5 , Issue.2-3 , pp. 1-3
    • Pang, X.1    Werbos, P.J.2
  • 37
    • 0033698503
    • The cellular simultaneous recurrent network adaptive critic design for the generalized maze problem has a simple closed-form solution
    • Jul.
    • D. Wunsch, "The cellular simultaneous recurrent network adaptive critic design for the generalized maze problem has a simple closed-form solution," in Proc. IEEE IJCNN, vol. 3. Jul. 2000, pp. 79-82.
    • (2000) Proc. IEEE IJCNN , vol.3 , pp. 79-82
    • Wunsch, D.1
  • 40
    • 49149131955
    • Beyond feedforward models trained by backpropagation: A practical training tool for a more efficient universal approximator
    • Jun
    • R. Ilin, R. Kozma, and P. Werbos, "Beyond feedforward models trained by backpropagation: A practical training tool for a more efficient universal approximator," IEEE Trans. Neural Netw., vol. 19, no. 6, pp. 929-937, Jun. 2008.
    • (2008) IEEE Trans. Neural Netw. , vol.19 , Issue.6 , pp. 929-937
    • Ilin, R.1    Kozma, R.2    Werbos, P.3
  • 42
    • 0030421566
    • Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot
    • Oct.
    • P. Werbos and X. Pang, "Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot," in Proc. IEEE Int. Conf. Syst., Man, Cybern., vol. 3. Oct. 1996, pp. 1764-1769.
    • (1996) Proc. IEEE Int. Conf. Syst., Man, Cybern. , vol.3 , pp. 1764-1769
    • Werbos, P.1    Pang, X.2
  • 44
    • 84862811991
    • A boundedness result for the direct heuristic dynamic programming
    • Aug.
    • F. Liu, J. Sun, J. Si, W. Guo, and S. Mei, "A boundedness result for the direct heuristic dynamic programming," Neural Netw., vol. 32, pp. 229-235, Aug. 2012.
    • (2012) Neural Netw. , vol.32 , pp. 229-235
    • Liu, F.1    Sun, J.2    Si, J.3    Guo, W.4    Mei, S.5
  • 45
    • 34047138362
    • Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints
    • DOI 10.1109/TSMCB.2006.883869, Special Issue on Robot Learning by Observation, Demonstration and Imitation
    • P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 2, pp. 425-436, Apr. 2007. (Pubitemid 46523230)
    • (2007) IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics , vol.37 , Issue.2 , pp. 425-436
    • He, P.1    Jagannathan, S.2
  • 46
    • 0028388685
    • TD (λ) converges with probability 1
    • P. Dayan and T. J. Sejnowski, "TD (λ) converges with probability 1," Mach. Learn., vol. 14, no. 3, pp. 295-301, 1994.
    • (1994) Mach. Learn. , vol.14 , Issue.3 , pp. 295-301
    • Dayan, P.1    Sejnowski, T.J.2
  • 47
    • 0000430514
    • The convergence of TD (λ) for general λ
    • P. Dayan, "The convergence of TD (λ) for general λ," Mach. Learn., vol. 8, nos. 3-4, pp. 341-362, 1992.
    • (1992) Mach. Learn. , vol.8 , Issue.3-4 , pp. 341-362
    • Dayan, P.1
  • 48
    • 0000016172
    • A stochastic approximation method
    • H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat., vol. 22, no. 3, pp. 400-407, 1951.
    • (1951) Ann. Math. Stat. , vol.22 , Issue.3 , pp. 400-407
    • Robbins, H.1    Monro, S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.