SCOPUS Information Retrieval Platform

Soft Computing

Volume 15, Issue 6, 2011, Pages 1055-1070

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection

(3) Xu, Xin (a); Liu, Chunming (a); Hu, Dewen (a)

(a) NATIONAL UNIVERSITY OF DEFENSE TECHNOLOGY (China)

Author keywords

Approximate policy iteration; Generalization; Learning control; Markov decision processes; Reinforcement learning

Indexed keywords

ACTION SPACES; ADAPTIVE BASIS FUNCTION; COMPUTATIONALLY EFFICIENT; CONTINUOUS SPACES; CONTINUOUS STATE; DECISION PROBLEMS; GENERALIZATION; GENERALIZATION ABILITY; KERNEL MACHINE; LEARNING CONTROL; LEARNING EFFICIENCY; LINEAR FUNCTIONS; MACHINE-LEARNING; MARKOV DECISION PROBLEM; MARKOV DECISION PROCESSES; NEAR-OPTIMAL POLICIES; OPEN PROBLEMS; OPTIMAL ACTIONS; POLICY ITERATION; POLICY SEARCH; SELECTION METHODS; SIMULATION RESULT; SPARSE APPROXIMATIONS; TEMPORAL DIFFERENCE LEARNING; VALUE FUNCTIONS;

ARTIFICIAL INTELLIGENCE; LEARNING ALGORITHMS; MARKOV PROCESSES; OPTIMIZATION;

REINFORCEMENT LEARNING;

EID: 79956192776 PISSN: 14327643 EISSN: 14337479 Source Type: Journal
DOI: 10.1007/s00500-010-0581-3 Document Type: Article

Times cited: (22)

References (30)

1
- 0011812771
- Kernel independent component analysis
- Bach FR, Jordan MI (2002) Kernel independent component analysis. J Mach Learn Res 3: 1-48.
- (2002) J Mach Learn Res , vol.3 , pp. 1-48
- Bach, F.R.¹ Jordan, M.I.²

2
- 0020970738
- Neuronlike adaptive elements that can solve difficult learning control problems
- Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13(5): 835-846.
- (1983) IEEE Trans Syst Man Cybern , vol.13 , Issue.5 , pp. 835-846
- Barto, A.G.¹ Sutton, R.S.² Anderson, C.W.³

3
- 0013535965
- Infinite-horizon policy-gradient estimation
- Baxter J, Bartlett PL (2001) Infinite-horizon policy-gradient estimation. J Artif Intell Res 15: 319-350.
- (2001) J Artif Intell Res , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

4
- 0004211236
- Belmont: Athena Scientific
- Bertsekas DP, Tsitsiklis JN (1996) Neurodynamic programming. Athena Scientific, Belmont.
- (1996) Neurodynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

5
- 0036832950
- Technical update: least-squares temporal difference learning
- Boyan J (2002) Technical update: least-squares temporal difference learning. Mach Learn 49(2-3): 233-246.
- (2002) Mach Learn , vol.49 , Issue.2-3 , pp. 233-246
- Boyan, J.¹

6
- 0032208335
- Elevator group control using multiple reinforcement learning agents
- Crites RH, Barto AG (1998) Elevator group control using multiple reinforcement learning agents. Mach Learn 33(2-3): 235-262.
- (1998) Mach Learn , vol.33 , Issue.2-3 , pp. 235-262
- Crites, R.H.¹ Barto, A.G.²

7
- 0000430514
- The convergence of TD(λ) for general λ
- Dayan P (1992) The convergence of TD(λ) for general λ. Mach Learn 8: 341-362.
- (1992) Mach Learn , vol.8 , pp. 341-362
- Dayan, P.¹

8
- 0028388685
- TD(λ) converges with probability 1
- Dayan P, Sejnowski TJ (1994) TD(λ) converges with probability 1. Mach Learn 14: 295-301.
- (1994) Mach Learn , vol.14 , pp. 295-301
- Dayan, P.¹ Sejnowski, T.J.²

9
- 3543096272
- The kernel recursive least-squares algorithm
- Engel Y, Mannor S, Meir R (2004) The kernel recursive least-squares algorithm. IEEE Trans Signal Process 52(8): 2275-2285.
- (2004) IEEE Trans Signal Process , vol.52 , Issue.8 , pp. 2275-2285
- Engel, Y.¹ Mannor, S.² Meir, R.³

10
- 34548807200
- Reinforcement learning in continuous action spaces
- Hasselt HV, Wiering M (2007) Reinforcement learning in continuous action spaces. In: 2007 IEEE symposium on approximate dynamic programming and reinforcement learning, pp 272-279.
- (2007) IEEE Symposium On Approximate Dynamic Programming and Reinforcement Learning , pp. 272-279
- Hasselt, H.V.¹ Wiering, M.²

11
- 0029679044
- Reinforcement learning: a survey
- Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4: 237-285.
- (1996) J Artif Intell Res , vol.4 , pp. 237-285
- Kaelbling, L.P.¹ Littman, M.L.² Moore, A.W.³

12
- 4644323293
- Least-squares policy iteration
- Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4: 1107-1149.
- (2003) J Mach Learn Res , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

13
- 85015154191
- Reinforcement learning in continuous action spaces through sequential Monte Carlo methods
- MIT Press, Cambridge
- Lazaric A, Restelli M, Bonarini A (2008) Reinforcement learning in continuous action spaces through sequential Monte Carlo methods. In: Advances in neural information processing systems. MIT Press, Cambridge.
- (2008) Advances In Neural Information Processing Systems
- Lazaric, A.¹ Restelli, M.² Bonarini, A.³

14
- 35748957806
- Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes
- Mahadevan S, Maggioni M (2007) Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes. J Mach Learn Res 8: 2169-2231.
- (2007) J Mach Learn Res , vol.8 , pp. 2169-2231
- Mahadevan, S.¹ Maggioni, M.²

15
- 0036832960
- Continuous-action q-learning
- Millan JDR, Posenato D, Dedieu E (2002) Continuous-action q-learning. Mach Learn 49(2/3): 247-265.
- (2002) Mach Learn , vol.49 , Issue.2-3 , pp. 247-265
- Millan, J.D.R.¹ Posenato, D.² Dedieu, E.³

16
- 0031236002
- Adaptive critic designs
- Prokhorov DV, Wunsch DC (1997) Adaptive critic designs. IEEE Trans Neural Netw 8(5): 997-1007.
- (1997) IEEE Trans Neural Netw , vol.8 , Issue.5 , pp. 997-1007
- Prokhorov, D.V.¹ Wunsch, D.C.²

17
- 84899026055
- Gaussian processes in reinforcement learning
- Thrun S, Saul LK, Schölkopf B, MIT Press, Cambridge
- Rasmussen CE, Kuss M (2004) Gaussian processes in reinforcement learning. In: Thrun S, Saul LK, Schölkopf B (eds) Advances in neural information processing systems, vol 16. MIT Press, Cambridge, pp 751-759.
- (2004) Advances In Neural Information Processing Systems , vol.16 , pp. 751-759
- Rasmussen, C.E.¹ Kuss, M.²

18
- 0004094721
- Cambridge: MIT Press
- Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge.
- (2002) Learning with Kernels
- Schölkopf, B.¹ Smola, A.²

19
- 0033901602
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Singh SP, Jaakkola T, Littman ML, Szepesvari C (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Mach Learn 38: 287-308.
- (2000) Mach Learn , vol.38 , pp. 287-308
- Singh, S.P.¹ Jaakkola, T.² Littman, M.L.³ Szepesvari, C.⁴

20
- 33847202724
- Learning to predict by the method of temporal differences
- Sutton R (1988) Learning to predict by the method of temporal differences. Mach Learn 3(1): 9-44.
- (1988) Mach Learn , vol.3 , Issue.1 , pp. 9-44
- Sutton, R.¹

21
- 85156221438
- Generalization in reinforcement learning: Successful examples using sparse coarse coding
- MIT Press, Cambridge
- Sutton R (1996) Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Advances in neural information processing systems, vol 8. MIT Press, Cambridge, pp 1038-1044.
- (1996) Advances In Neural Information Processing Systems , vol.8 , pp. 1038-1044
- Sutton, R.¹

22
- 0004102479
- Cambridge: MIT Press
- Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.¹ Barto, A.²

23
- 0000985504
- TD-Gammon, a self-teaching backgammon program, achieves master-level play
- Tesauro G (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6: 215-219.
- (1994) Neural Comput , vol.6 , pp. 215-219
- Tesauro, G.¹

24
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Mach Learn 16: 185-202.
- (1994) Mach Learn , vol.16 , pp. 185-202
- Tsitsiklis, J.N.¹

25
- 0031143730
- An analysis of temporal difference learning with function approximation
- Tsitsiklis JN, Roy BV (1997) An analysis of temporal difference learning with function approximation. IEEE Trans Autom Control 42(5): 674-690.
- (1997) IEEE Trans Autom Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Roy, B.V.²

26
- 34249833101
- Q-learning
- Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8: 279-292.
- (1992) Mach Learn , vol.8 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

27
- 33646714634
- Evolutionary function approximation for reinforcement learning
- Whiteson S, Stone P (2006) Evolutionary function approximation for reinforcement learning. J Mach Learn Res 7: 877-917.
- (2006) J Mach Learn Res , vol.7 , pp. 877-917
- Whiteson, S.¹ Stone, P.²

28
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8: 229-256.
- (1992) Mach Learn , vol.8 , pp. 229-256
- Williams, R.J.¹

29
- 34547098844
- Kernel-based least-squares policy iteration for reinforcement learning
- Xu X, Hu DW, Lu XC (2007) Kernel-based least-squares policy iteration for reinforcement learning. IEEE Trans Neural Netw 18(4): 973-997.
- (2007) IEEE Trans Neural Netw , vol.18 , Issue.4 , pp. 973-997
- Xu, X.¹ Hu, D.W.² Lu, X.C.³

30
- 84918834208
- A reinforcement learning approach to job-shop scheduling
- Morgan Kaufmann
- Zhang W, Dietterich T (1995) A reinforcement learning approach to job-shop scheduling. In: Proceedings of the fourteenth international joint conference on artificial intelligence. Morgan Kaufmann, pp 1114-1120.
- (1995) Proceedings of the fourteenth international joint conference on artificial intelligence , pp. 1114-1120
- Zhang, W.¹ Dietterich, T.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.