SCOPUS Information Retrieval Platform

Neural Computing and Applications

Volume 26, Issue 4, 2015, Pages 775-787

A data-based online reinforcement learning algorithm satisfying probably approximately correct principle

(2) Zhu, Yuanheng a; Zhao, Dongbin a

a INSTITUTE OF GEOLOGY AND GEOPHYSICS (China)

Author keywords

Kd tree; Probably approximately correct; Reinforcement learning

Indexed keywords

APPROXIMATION ALGORITHMS; CONTINUOUS TIME SYSTEMS; E-LEARNING; LEARNING ALGORITHMS; ONLINE SYSTEMS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING; SOCIAL NETWORKING (ONLINE);

DETERMINISTIC SYSTEMS; KD-TREE; NEAR-OPTIMAL CONTROL; ONLINE DATA; OPTIMAL CONTROL PROBLEM; PROBABLY APPROXIMATELY CORRECT; RUNNING TIME;

TREES (MATHEMATICS);

EID: 84939962659
PISSN: 09410643
EISSN: None
Source Type: Journal
DOI: 10.1007/s00521-014-1738-2
Document Type: Article

Times cited: 15

References (32)

1
- 0004102479
- MIT Press, Cambridge:
- Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
- (1998) Reinforcement learning: an introduction
- Sutton, R.S.¹ Barto, A.G.²

2
- 85046476577
- CRC Press, New York:
- Busoniu L, Babuska R, De Schutter B, Ernst D (2010) Reinforcement learning and dynamic programming using function approximators. CRC Press, New York
- (2010) Reinforcement learning and dynamic programming using function approximators
- Busoniu, L.¹ Babuska, R.² De Schutter, B.³ Ernst, D.⁴

3
- 79952438883
- A hybrid agent architecture integrating desire, intention and reinforcement learning
- Tan AH, Ong YS, Tapanuj A (2011) A hybrid agent architecture integrating desire, intention and reinforcement learning. Expert Syst Appl 38(7):8477–8487
- (2011) Expert Syst Appl , vol.38 , Issue.7 , pp. 8477-8487
- Tan, A.H.¹ Ong, Y.S.² Tapanuj, A.³

4
- 84902475773
- Tang L, Liu Y-J, Tong S (2014) Adaptive neural control using reinforcement learning for a class of robot manipulator. Neural Comput Appl 25(1):135–141

5
- 84872617336
- A neural-network-based iterative GDHP approach for solving a class of nonlinear optimal control problems with control constraints
- Wang D, Liu D, Zhao D, Huang Y, Zhang D (2013) A neural-network-based iterative GDHP approach for solving a class of nonlinear optimal control problems with control constraints. Neural Comput Appl 22(2):219–227
- (2013) Neural Comput Appl , vol.22 , Issue.2 , pp. 219-227
- Wang, D.¹ Liu, D.² Zhao, D.³ Huang, Y.⁴ Zhang, D.⁵

6
- 84898013913
- Stable iterative adaptive dynamic programming algorithm with approximation errors for discrete-time nonlinear systems
- Wei Q, Liu D (2014) Stable iterative adaptive dynamic programming algorithm with approximation errors for discrete-time nonlinear systems. Neural Comput Appl 24(6):1355–1367
- (2014) Neural Comput Appl , vol.24 , Issue.6 , pp. 1355-1367
- Wei, Q.¹ Liu, D.²

7
- 84896543600
- Dual heuristic dynamic programming for nonlinear discrete-time uncertain systems with state delay
- Wang B, Zhao D, Alippi C, Liu D (2014) Dual heuristic dynamic programming for nonlinear discrete-time uncertain systems with state delay. Neurocomputing 134:222–229
- (2014) Neurocomputing , vol.134 , pp. 222-229
- Wang, B.¹ Zhao, D.² Alippi, C.³ Liu, D.⁴

8
- 0004049893
- Learning from delayed rewards
- Cambridge University, Cambridge:
- Watkins C (1989) Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge
- (1989) PhD thesis
- Watkins, C.¹

9
- 0345393286
- Neural Q-learning
- ten Hagen S, Kröse B (2003) Neural Q-learning. Neural Comput Appl 12(2):81–88
- (2003) Neural Comput Appl , vol.12 , Issue.2 , pp. 81-88
- ten Hagen, S.¹ Kröse, B.²

10
- 0003636089
- On-line Q-learning using connectionist systems. Tech. Rep. TR 166
- Cambridge, England:
- Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Tech. Rep. TR 166, Cambridge University Engineering Department, Cambridge, England
- (1994) Cambridge University Engineering Department
- Rummery, G.A.¹ Niranjan, M.²

11
- 84863467146
- Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming
- Liu D, Wang D, Zhao D, Wei Q, Jin N (2012) Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Trans Autom Sci Eng 9(3):628–634
- (2012) IEEE Trans Autom Sci Eng , vol.9 , Issue.3 , pp. 628-634
- Liu, D.¹ Wang, D.² Zhao, D.³ Wei, Q.⁴ Jin, N.⁵

12
- 0002210775
- The role of exploration in learning control
- Florence, Kentucky:
- Thrun SB (1992) The role of exploration in learning control. In: White D, Sofge D (eds) Handbook for intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold, Florence, Kentucky 41022
- (1992) Handbook for intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold
- Thrun, S.B.¹ White, D.² Sofge, D.³

13
- 84888019460
- Full range adaptive cruise control based on supervised adaptive dynamic programming
- Zhao D, Hu Z, Xia Z, Alippi C, Wang D (2014) Full range adaptive cruise control based on supervised adaptive dynamic programming. Neurocomputing 125:57–67
- (2014) Neurocomputing , vol.125 , pp. 57-67
- Zhao, D.¹ Hu, Z.² Xia, Z.³ Alippi, C.⁴ Wang, D.⁵

14
- 84885903360
- A supervised actor-critic approach for adaptive cruise control
- Zhao D, Wang B, Liu D (2013) A supervised actor-critic approach for adaptive cruise control. Soft Comput 17(11):2089–2099
- (2013) Soft Comput , vol.17 , Issue.11 , pp. 2089-2099
- Zhao, D.¹ Wang, B.² Liu, D.³

15
- 82455175244
- DHP for coordinated freeway ramp metering
- Zhao D, Bai X, Wang F, Xu J, Yu W (2011) DHP for coordinated freeway ramp metering. IEEE Trans Intell Transp Syst 12(4):990–999
- (2011) IEEE Trans Intell Transp Syst , vol.12 , Issue.4 , pp. 990-999
- Zhao, D.¹ Bai, X.² Wang, F.³ Xu, J.⁴ Yu, W.⁵

16
- 70350492296
- The application of ADHDP(λ) method to coordinated multiple ramps metering
- Bai X, Zhao D, Yi J (2009) The application of ADHDP(λ) method to coordinated multiple ramps metering. Int J Innov Comput 5(10(B)):3471–3481
- (2009) Int J Innov Comput , vol.5 , Issue.10B , pp. 3471-3481
- Bai, X.¹ Zhao, D.² Yi, J.³

17
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49(2–3):209–232
- (2002) Mach Learn , vol.49 , Issue.2-3 , pp. 209-232
- Kearns, M.¹ Singh, S.²

18
- 0041965975
- R-max—a general polynomial time algorithm for near-optimal reinforcement learning
- Brafman RI, Tennenholtz M (2003) R-max—a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231
- (2003) J Mach Learn Res , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

19
- 31844432138
- A theoretical analysis of model-based interval estimation. In: Proceedings of 22nd international conference on machine learning (ICML’05)
- Strehl AL, Littman ML (2005) A theoretical analysis of model-based interval estimation. In: Proceedings of 22nd international conference on machine learning (ICML’05), pp 856–863
- (2005) pp 856–863
- Strehl, A.L.¹ Littman, M.L.²

20
- 33749255382
- PAC model-free reinforcement learning. In: Proceedings of 23rd international conference on machine learning (ICML’06)
- Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. In: Proceedings of 23rd international conference on machine learning (ICML’06), pp 881–888
- (2006) pp 881–888
- Strehl, A.L.¹ Li, L.² Wiewiora, E.³ Langford, J.⁴ Littman, M.L.⁵

21
- 1942452450
- Exploration in metric state spaces. In: Proceedings of 20th international conference on machine learning (ICML’03)
- Kakade S, Kearns MJ, Langford J (2003) Exploration in metric state spaces. In: Proceedings of 20th international conference on machine learning (ICML’03), pp 306–312
- (2003) pp 306–312
- Kakade, S.¹ Kearns, M.J.² Langford, J.³

22
- 84893414333
- PAC optimal exploration in continuous space Markov decision processes
- Pazis J, Parr R (2013) PAC optimal exploration in continuous space Markov decision processes. In: AAAI conference on artificial intelligence
- (2013) In: AAAI conference on artificial intelligence
- Pazis, J.¹ Parr, R.²

23
- 78649716899
- Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
- Bernstein A, Shimkin N (2010) Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains. Mach Learn 81(3):359–397
- (2010) Mach Learn , vol.81 , Issue.3 , pp. 359-397
- Bernstein, A.¹ Shimkin, N.²

24
- 0036832953
- Variable resolution discretization in optimal control
- Munos R, Moore A (2002) Variable resolution discretization in optimal control. Mach Learn 49(2–3):291–323
- (2002) Mach Learn , vol.49 , Issue.2-3 , pp. 291-323
- Munos, R.¹ Moore, A.²

25
- 21844465127
- Tree-based batch mode reinforcement learning
- Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503–556
- (2005) J Mach Learn Res , vol.6 , pp. 503-556
- Ernst, D.¹ Geurts, P.² Wehenkel, L.³

26
- 0003487647
- Springer, Berlin:
- Preparata FP, Shamos MI (1985) Computational geometry: an introduction. Springer, Berlin
- (1985) Computational geometry: an introduction
- Preparata, F.P.¹ Shamos, M.I.²

27
- 84878421441
- Optimal control for discrete-time affine nonlinear systems using general value iteration
- Li H, Liu D (2012) Optimal control for discrete-time affine nonlinear systems using general value iteration. IET Control Theory Appl 6(18):2725–2736
- (2012) IET Control Theory Appl , vol.6 , Issue.18 , pp. 2725-2736
- Li, H.¹ Liu, D.²

28
- 49049089962
- Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof
- Al-Tamimi A, Lewis FL, Abu-Khalaf M (2008) Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. Trans Syst Man Cyber Part B 38(4):943–949
- (2008) Trans Syst Man Cyber Part B , vol.38 , Issue.4 , pp. 943-949
- Al-Tamimi, A.¹ Lewis, F.L.² Abu-Khalaf, M.³

29
- 84887472008
- Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics
- Liu D, Yang X, Li H (2013) Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics. Neural Comput Appl 23(7–8):1843–1850
- (2013) Neural Comput Appl , vol.23 , Issue.7-8 , pp. 1843-1850
- Liu, D.¹ Yang, X.² Li, H.³

30
- 84887486066
- A hierarchical reinforcement learning approach for optimal path tracking of wheeled mobile robots
- Zuo L, Xu X, Liu C, Huang Z (2013) A hierarchical reinforcement learning approach for optimal path tracking of wheeled mobile robots. Neural Comput Appl 23(7–8):1873–1883
- (2013) Neural Comput Appl , vol.23 , Issue.7-8 , pp. 1873-1883
- Zuo, L.¹ Xu, X.² Liu, C.³ Huang, Z.⁴

31
- 0344961876
- Reinforcement learning on explicitly specified time scales
- Schoknecht R, Riedmiller M (2003) Reinforcement learning on explicitly specified time scales. Neural Comput Appl 12(2):61–80
- (2003) Neural Comput Appl , vol.12 , Issue.2 , pp. 61-80
- Schoknecht, R.¹ Riedmiller, M.²

32
- 78651494226
- Master’s thesis: Technischen Universität (University of Technology) Graz
- Neumann G (2005) The reinforcement learning toolbox: reinforcement learning for optimal control tasks. Master’s thesis, Technischen Universität (University of Technology) Graz
- (2005) The reinforcement learning toolbox: reinforcement learning for optimal control tasks
- Neumann, G.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.