IEEE Transactions on Neural Networks, Volume 22, Issue 12, Part 1, 2011, Pages 1863-1877

Hierarchical approximate policy iteration with binary-tree state space decomposition

Author keywords

Adaptive dynamic programming; approximate policy iteration; binary tree; hierarchical reinforcement learning; Markov decision processes; time optimal control

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; APPROXIMATE POLICY ITERATION; HIERARCHICAL REINFORCEMENT LEARNING; MARKOV DECISION PROCESSES; TIME OPTIMAL CONTROL;

EID: 83855164075     PISSN: 1045-9227     EISSN: None     Source Type: Journal
DOI: 10.1109/TNN.2011.2168422     Document Type: Article
Times cited: 40
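
The record itself contains no method details, but the keyword combination (approximate policy iteration over a binary-tree decomposition of the state space) can be illustrated with a minimal sketch. This is not the algorithm of the indexed article: the class name, the midpoint splitting rule, the dimension cycling by depth, and the use of leaf indices as tabular states are assumptions made purely for illustration.

```python
# Illustrative sketch only -- not the algorithm from the indexed article.
# Under assumed details (midpoint splits, dimension cycling by depth, leaf ids
# used as tabular states), it shows the general idea named by the keywords:
# a binary-tree decomposition of a continuous state space for policy iteration.

import numpy as np


class BinaryTreeNode:
    """Axis-aligned cell that splits one state dimension at its midpoint."""

    def __init__(self, low, high, depth, max_depth):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.left = self.right = None
        if depth < max_depth:
            self.dim = depth % len(self.low)          # cycle through dimensions
            self.mid = 0.5 * (self.low[self.dim] + self.high[self.dim])
            left_high = self.high.copy()
            left_high[self.dim] = self.mid
            right_low = self.low.copy()
            right_low[self.dim] = self.mid
            self.left = BinaryTreeNode(self.low, left_high, depth + 1, max_depth)
            self.right = BinaryTreeNode(right_low, self.high, depth + 1, max_depth)

    def leaf_index(self, x, index=0):
        """Map a continuous state x to the index (0 .. 2**max_depth - 1) of its leaf cell."""
        if self.left is None:
            return index
        if x[self.dim] < self.mid:
            return self.left.leaf_index(x, 2 * index)
        return self.right.leaf_index(x, 2 * index + 1)


if __name__ == "__main__":
    # Partition [-1, 1] x [-1, 1] into 2**4 = 16 leaf cells.
    tree = BinaryTreeNode(low=[-1.0, -1.0], high=[1.0, 1.0], depth=0, max_depth=4)
    print("leaf id:", tree.leaf_index(np.array([0.3, -0.7])))
    # An approximate policy-iteration loop could then estimate Q(leaf, action)
    # from sampled transitions and improve the policy greedily over those values.
```

In such a sketch, refining the tree (increasing the depth, or splitting only cells where the value estimate is poor) trades approximation accuracy against the number of discrete states the policy-iteration step must handle.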

References (46)
  • 2
    • EID: 66449130966
    • F. Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
  • 3
    • EID: 0000985504
    • G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Comput., vol. 6, no. 2, pp. 215-219, Mar. 1994.
  • 4
    • EID: 84918834208
    • W. Zhang and T. Dietterich, "A reinforcement learning approach to job-shop scheduling," in Proc. 14th Int. Joint Conf. Art. Intell., San Francisco, CA, 1995, pp. 1114-1120.
  • 5
    • EID: 0032208335
    • R. H. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Mach. Learn., vol. 33, nos. 2-3, pp. 235-262, 1998.
  • 7
    • EID: 85012688561
    • R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton Univ. Press, 1957.
  • 8
    • EID: 85156221438
    • R. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press, 1996, pp. 1038-1044.
  • 9
    • EID: 0036911781
    • X. Xu and H. G. He, "Residual-gradient-based neural reinforcement learning for the optimal control of an acrobat," in Proc. IEEE Int. Symp. Intell. Control, Vancouver, Canada, 2002, pp. 758-763.
  • 10
    • EID: 0013535965
    • J. Baxter and P. L. Bartlett, "Infinite-horizon policy-gradient estimation," J. Art. Intell. Res., vol. 15, no. 1, pp. 319-350, Jul. 2001.
  • 12
    • EID: 0041345290
    • X. Xu, H. G. He, and D. W. Hu, "Efficient reinforcement learning using recursive least-squares methods," J. Art. Intell. Res., vol. 16, no. 1, pp. 259-292, Jan. 2002.
  • 13
    • EID: 79960115021
    • J. Fu, H. He, and X. Zhou, "Adaptive learning and control for MIMO system based on adaptive dynamic programming," IEEE Trans. Neural Netw., vol. 22, no. 7, pp. 1133-1148, Jul. 2011.
  • 15
    • EID: 70349253929
    • H. G. Zhang, Y. H. Luo, and D. R. Liu, "Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490-1503, Sep. 2009.
  • 16
    • EID: 4644323293
    • M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res., vol. 4, pp. 1107-1149, Dec. 2003.
  • 17
    • EID: 34547098844
    • X. Xu, D. W. Hu, and X. C. Lu, "Kernel-based least squares policy iteration for reinforcement learning," IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 973-992, Jul. 2007, doi: 10.1109/TNN.2007.899161.
  • 18
    • EID: 0141988716
    • A. G. Barto and S. Mahadevan, "Recent advances in hierarchical reinforcement learning," Discrete Event Dynamic Syst.-Theory Applicat., vol. 13, nos. 1-2, pp. 41-77, Jan.-Apr. 2003.
  • 19
    • EID: 36949027865
    • M. Ghavamzadeh and S. Mahadevan, "Hierarchical average reward reinforcement learning," J. Mach. Learn. Res., vol. 8, pp. 2629-2669, Nov. 2007.
  • 20
    • EID: 0002278788
    • T. G. Dietterich, "Hierarchical reinforcement learning with the MAXQ value function decomposition," J. Art. Intell. Res., vol. 13, no. 1, pp. 227-303, Aug. 2000.
  • 21
    • EID: 0033170372
    • R. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Art. Intell., vol. 112, nos. 1-2, pp. 181-211, Aug. 1999, doi: 10.1016/S0004-3702(99)00052-1.
  • 22
    • EID: 0003506152
    • T. G. Dietterich, "State abstraction in MAXQ hierarchical reinforcement learning," in Proc. Adv. Neural Inf. Process. Syst., 2000, pp. 994-1000.
  • 23
    • EID: 0036927201
    • D. Andre and S. J. Russell, "State abstraction for programmable reinforcement learning agents," in Proc. 18th Nat. Conf. Art. Intell., 2002, pp. 119-125.
  • 24
    • EID: 38349050495
    • B. Hengst, "Safe state abstraction and reusable continuing subtasks in hierarchical reinforcement learning," in Proc. AI: Adv. Art. Intell. (Lecture Notes Comput. Sci.), 2007, pp. 58-67.
  • 27
    • EID: 0036832950
    • J. Boyan, "Technical update: Least-squares temporal difference learning," Mach. Learn., vol. 49, nos. 2-3, pp. 233-246, 2002, doi: 10.1023/A:1017936530646.
  • 29
    • EID: 3543096272
    • Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2275-2285, Aug. 2004.
  • 31
    • EID: 58449114139
    • C. Dimitrakakis and M. G. Lagoudakis, "Algorithms and bounds for rollout sampling approximate policy iteration," in Proc. 8th Eur. Workshop Recent Adv. Reinforce. Learn. (LNAI 5323), Villeneuve d'Ascq, France, Jun.-Jul. 2008, pp. 27-40.
  • 32
    • EID: 44649189852
    • R. Munos and C. Szepesvári, "Finite-time bounds for fitted value iteration," J. Mach. Learn. Res., vol. 9, pp. 815-857, May 2008.
  • 33
    • EID: 77955513754
    • B. H. Li and J. Si, "Approximate robust policy iteration using multilayer perceptron neural networks for discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices," IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1270-1280, Aug. 2010.
  • 34
    • EID: 77955509816
    • J. Seiffertt and D. C. Wunsch, "Backpropagation and ordered derivatives in the time scales calculus," IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1262-1269, Aug. 2010.
  • 36
    • EID: 0031143730
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Control, vol. 42, no. 5, pp. 674-690, May 1997.
  • 37
    • EID: 40849145988
    • A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Mach. Learn., vol. 71, no. 1, pp. 89-129, 2008.
  • 39
    • EID: 0003932121
    • A. McCallum, "Reinforcement Learning with Selective Perception and Hidden State," Ph.D. thesis, Comput. Sci. Dept., Univ. Rochester, Rochester, NY, 1995.
  • 41
    • EID: 21844465127
    • D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, Apr. 2005.
  • 42
    • EID: 83855163944
    • A. Jonsson and A. G. Barto, "Automated state abstraction for options using the U-tree algorithm," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2000.
  • 44
    • EID: 79952394156
    • A. Hans and S. Udluft, "Ensembles of neural networks for robust reinforcement learning," in Proc. 9th Int. Conf. Mach. Learn. Applicat., Washington, DC, 2010, pp. 401-406.


* This record was extracted by KISTI through analysis of Elsevier's SCOPUS database.