Volume 26, Issue 4, 2015, Pages 775-787

A data-based online reinforcement learning algorithm satisfying probably approximately correct principle

Author keywords

Kd tree; Probably approximately correct; Reinforcement learning

Indexed keywords

APPROXIMATION ALGORITHMS; CONTINUOUS TIME SYSTEMS; E-LEARNING; LEARNING ALGORITHMS; ONLINE SYSTEMS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING; SOCIAL NETWORKING (ONLINE);

EID: 84939962659     PISSN: 09410643     EISSN: None     Source Type: Journal    
DOI: 10.1007/s00521-014-1738-2     Document Type: Article
Times cited: 15

References (32)
  • 3
    • Tan AH, Ong YS, Tapanuj A (2011) A hybrid agent architecture integrating desire, intention and reinforcement learning. Expert Syst Appl 38(7):8477–8487
  • 4
    • Tang L, Liu Y-J, Tong S (2014) Adaptive neural control using reinforcement learning for a class of robot manipulator. Neural Comput Appl 25(1):135–141
  • 5
    • Wang D, Liu D, Zhao D, Huang Y, Zhang D (2013) A neural-network-based iterative GDHP approach for solving a class of nonlinear optimal control problems with control constraints. Neural Comput Appl 22(2):219–227
  • 6
    • Wei Q, Liu D (2014) Stable iterative adaptive dynamic programming algorithm with approximation errors for discrete-time nonlinear systems. Neural Comput Appl 24(6):1355–1367
  • 7
    • Wang B, Zhao D, Alippi C, Liu D (2014) Dual heuristic dynamic programming for nonlinear discrete-time uncertain systems with state delay. Neurocomputing 134:222–229
  • 8
    • Watkins C (1989) Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge
  • 11
    • Liu D, Wang D, Zhao D, Wei Q, Jin N (2012) Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Trans Autom Sci Eng 9(3):628–634
  • 13
    • Zhao D, Hu Z, Xia Z, Alippi C, Wang D (2014) Full range adaptive cruise control based on supervised adaptive dynamic programming. Neurocomputing 125:57–67
  • 14
    • Zhao D, Wang B, Liu D (2013) A supervised actor-critic approach for adaptive cruise control. Soft Comput 17(11):2089–2099
  • 16
    • Bai X, Zhao D, Yi J (2009) The application of ADHDP(λ) method to coordinated multiple ramps metering. Int J Innov Comput 5(10B):3471–3481
  • 17
    • Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49(2–3):209–232
  • 18
    • Brafman RI, Tennenholtz M (2003) R-max—a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231
  • 19
    • Strehl AL, Littman ML (2005) A theoretical analysis of model-based interval estimation. In: Proceedings of 22nd international conference on machine learning (ICML'05), pp 856–863
  • 20
    • Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. In: Proceedings of 23rd international conference on machine learning (ICML'06), pp 881–888
  • 21
    • Kakade S, Kearns MJ, Langford J (2003) Exploration in metric state spaces. In: Proceedings of 20th international conference on machine learning (ICML'03), pp 306–312
  • 23
    • Bernstein A, Shimkin N (2010) Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains. Mach Learn 81(3):359–397
  • 24
    • Munos R, Moore A (2002) Variable resolution discretization in optimal control. Mach Learn 49(2–3):291–323
  • 25
    • Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503–556
  • 27
    • Li H, Liu D (2012) Optimal control for discrete-time affine nonlinear systems using general value iteration. IET Control Theory Appl 6(18):2725–2736
  • 28
    • Al-Tamimi A, Lewis FL, Abu-Khalaf M (2008) Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans Syst Man Cybern Part B 38(4):943–949
  • 29
    • Liu D, Yang X, Li H (2013) Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics. Neural Comput Appl 23(7–8):1843–1850
  • 30
    • Zuo L, Xu X, Liu C, Huang Z (2013) A hierarchical reinforcement learning approach for optimal path tracking of wheeled mobile robots. Neural Comput Appl 23(7–8):1873–1883
  • 31
    • Schoknecht R, Riedmiller M (2003) Reinforcement learning on explicitly specified time scales. Neural Comput Appl 12(2):61–80


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.