Volume 15, Issue 6, 2011, Pages 1055-1070

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection

Author keywords

Approximate policy iteration; Generalization; Learning control; Markov decision processes; Reinforcement learning

Indexed keywords

ACTION SPACES; ADAPTIVE BASIS FUNCTION; COMPUTATIONALLY EFFICIENT; CONTINUOUS SPACES; CONTINUOUS STATE; DECISION PROBLEMS; GENERALIZATION; GENERALIZATION ABILITY; KERNEL MACHINE; LEARNING CONTROL; LEARNING EFFICIENCY; LINEAR FUNCTIONS; MACHINE-LEARNING; MARKOV DECISION PROBLEM; MARKOV DECISION PROCESSES; NEAR-OPTIMAL POLICIES; OPEN PROBLEMS; OPTIMAL ACTIONS; POLICY ITERATION; POLICY SEARCH; SELECTION METHODS; SIMULATION RESULT; SPARSE APPROXIMATIONS; TEMPORAL DIFFERENCE LEARNING; VALUE FUNCTIONS

EID: 79956192776     PISSN: 1432-7643     EISSN: 1433-7479     Source Type: Journal
DOI: 10.1007/s00500-010-0581-3     Document Type: Article
Times cited: 22

References (30)
  • 1. Bach FR, Jordan MI (2002) Kernel independent component analysis. J Mach Learn Res 3: 1-48.
  • 2. Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13(5): 835-846.
  • 3. Baxter J, Bartlett PL (2001) Infinite-horizon policy-gradient estimation. J Artif Intell Res 15: 319-350.
  • 5. Boyan J (2002) Technical update: least-squares temporal difference learning. Mach Learn 49(2-3): 233-246.
  • 6. Crites RH, Barto AG (1998) Elevator group control using multiple reinforcement learning agents. Mach Learn 33(2-3): 235-262.
  • 7. Dayan P (1992) The convergence of TD(λ) for general λ. Mach Learn 8: 341-362.
  • 8. Dayan P, Sejnowski TJ (1994) TD(λ) converges with probability 1. Mach Learn 14: 295-301.
  • 9. Engel Y, Mannor S, Meir R (2004) The kernel recursive least-squares algorithm. IEEE Trans Signal Process 52(8): 2275-2285.
  • 12. Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4: 1107-1149.
  • 13. Lazaric A, Restelli M, Bonarini A (2008) Reinforcement learning in continuous action spaces through sequential Monte Carlo methods. In: Advances in neural information processing systems. MIT Press, Cambridge.
  • 14. Mahadevan S, Maggioni M (2007) Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes. J Mach Learn Res 8: 2169-2231.
  • 17. Rasmussen CE, Kuss M (2004) Gaussian processes in reinforcement learning. In: Thrun S, Saul LK, Schölkopf B (eds) Advances in neural information processing systems, vol 16. MIT Press, Cambridge, pp 751-759.
  • 19. Singh SP, Jaakkola T, Littman ML, Szepesvari C (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Mach Learn 38: 287-308.
  • 20. Sutton R (1988) Learning to predict by the method of temporal differences. Mach Learn 3(1): 9-44.
  • 21. Sutton R (1996) Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Advances in neural information processing systems, vol 8. MIT Press, Cambridge, pp 1038-1044.
  • 23. Tesauro G (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6: 215-219.
  • 24. Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Mach Learn 16: 185-202.
  • 25. Tsitsiklis JN, Roy BV (1997) An analysis of temporal difference learning with function approximation. IEEE Trans Autom Control 42(5): 674-690.
  • 27. Whiteson S, Stone P (2006) Evolutionary function approximation for reinforcement learning. J Mach Learn Res 7: 877-917.
  • 28. Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8: 229-256.
  • 29. Xu X, Hu DW, Lu XC (2007) Kernel-based least-squares policy iteration for reinforcement learning. IEEE Trans Neural Netw 18(4): 973-997.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.