SCOPUS Information Search Platform

Machine Learning

Volume 49, Issue 2-3, 2002, Pages 161-178

Kernel-based reinforcement learning

(2) Ormoneit, Dirk a; Sen, Śaunak b

a Stanford University (United States)

b The Jackson Laboratory (United States)

Author keywords

Kernel smoothing; Kernel based learning; Lazy learning; Local averaging; Markov decision process; Reinforcement learning

Indexed keywords

APPROXIMATION THEORY; ASYMPTOTIC STABILITY; CONVERGENCE OF NUMERICAL METHODS; LEARNING ALGORITHMS; MARKOV PROCESSES; NEURAL NETWORKS; OPTIMIZATION; PARAMETER ESTIMATION; REGRESSION ANALYSIS; STATE SPACE METHODS;

BELLMANS EQUATION; GAUSSIAN PROCESS; KERNEL BASED LEARNING; LAZY LEARNING; MARKOV DECISION PROCESS; REINFORCEMENT LEARNING; TEMPORAL DIFFERENCE LEARNING;

LEARNING SYSTEMS;

EID: 0036832956 PISSN: 08856125 EISSN: None Source Type: Journal
DOI: 10.1023/A:1017928328829 Document Type: Article

Times cited: 430

References (35)

1
- 0031073475
- Locally weighted regression for control
- (1997) Artificial Intelligence Review , vol.11 , Issue.1-5 , pp. 75-113
- Atkeson, C.G.¹ Moore, A.W.² Schaal, S.³

2
- 0003477315
- Reinforcement learning with high-dimensional, continuous actions
- Technical Report WL-TR-93-1147, Wright Laboratory, Wright-Patterson Air Force Base Ohio
- (1993)
- Baird, L.C.¹ Klopf, A.H.²

3
- 85012688561
- Princeton, NJ: Princeton University Press
- (1957) Dynamic Programming
- Bellman, R.E.¹

4
- 0003565783
- Belmont, MA: Athena Scientific
- (1995) Dynamic Programming and Optimal Control , vol.1-2
- Bertsekas, D.P.¹

5
- 85153940465
- Generalization in reinforcement learning: Safely approximating the value function
- In G. Tesauro, D. Touretzky, & T. Leen (Eds.); Cambridge MA: The MIT Press
- (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 369-376
- Boyan, J.A.¹ Moore, A.W.²

6
- 0000859970
- Reinforcement learning applied to linear quadratic regulation
- In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.); San Mateo, CA: Morgan Kaufmann
- (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 295-302
- Bradtke, S.J.¹

7
- 0040348531
- Estimating portfolio and consumption choice: A conditional Euler equations approach
- (1999) Journal of Finance , vol.54 , Issue.5 , pp. 1609-1645
- Brandt, M.W.¹

8
- 0008864314
- Learning to control a dynamic physical system
- San Mateo, CA: Morgan Kaufmann
- (1987) Sixth National Conference on Artificial Intelligence , pp. 456-460
- Connell, M.E.¹ Utgoff, P.E.²

9
- 0004019773
- Berlin: Springer-Verlag
- (1996) A Probabilistic Theory of Pattern Recognition
- Devroye, L.¹ Györfi, L.² Lugosi, G.³

10
- 0003691369
- London: Chapman & Hall
- (1996) Local Polynomial Modelling and Its Applications
- Fan, J.¹ Gijbels, I.²

11
- 0003989207
- Approximate solutions to Markov decision processes
- Ph.D. Thesis, Computer Science Department Carnegie Mellon University, Pittsburgh, PA
- (1999)
- Gordon, G.¹

12
- 84972525897
- Local regression: Automatic kernel carpentry
- (1993) Statistical Science , vol.8 , Issue.2 , pp. 120-143
- Hastie, T.¹ Loader, C.²

13
- 0001213377
- Central limit theorems for C(S)-valued random variables
- (1975) Journal of Functional Analysis , vol.19 , pp. 216-231
- Jain, N.C.¹ Marcus, M.B.²

14
- 0003754075
- Reinforcement learning and distributed local model synthesis
- Ph.D. Thesis, Linköping University
- (1997)
- Landelius, T.¹

15
- 0003485741
- Valuing American options by simulations: A simple least-squares approach
- Technical Report 25-98, Department of Finance, UCLA
- (1998)
- Longstaff, F.A.¹ Schwartz, E.S.²

16
- 0003327481
- Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice
- Cambridge, MA: The MIT Press
- (2001) Advances in Neural Information Processing Systems , vol.13
- Ormoneit, D.¹ Glynn, P.W.²

17
- 0036804005
- Kernel-based reinforcement learning in average-cost problems
- accepted for publication
- (2002) IEEE Transactions on Automatic Control
- Ormoneit, D.¹ Glynn, P.W.²

18
- 0000954384
- Optimal kernel shapes for local linear regression
- In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.); Cambridge, MA: The MIT Press
- (2000) Advances in Neural Information Processing Systems , vol.12 , pp. 540-546
- Ormoneit, D.¹ Hastie, T.²

19
- 0008813123
- Convergence of reinforcement learning with general function approximators
- (1999) Proceedings IJCAI , pp. 974
- Papavassiliou, V.¹ Russell, S.²

20
- 0008815093
- Efficient learning and planning within the Dyna framework
- San Mateo, CA: Morgan Kaufmann
- (1995) Twelfth International Conference on Machine Learning , pp. 438-446
- Peng, J.¹ Williams, R.J.²

21
- 85102627959
- New York: Wiley
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

22
- 0000016172
- A stochastic approximation method
- (1951) Annals of Mathematical Statistics , vol.22 , pp. 400-407
- Robbins, H.¹ Monro, S.²

23
- 0001509947
- Using randomization to break the curse of dimensionality
- (1997) Econometrica , vol.65 , Issue.3 , pp. 487-516
- Rust, J.¹

24
- 84898972974
- Reinforcement learning for dynamic channel allocation in cellular telephone systems
- In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.); Cambridge, MA: The MIT Press
- (1997) Advances in Neural Information Processing Systems , vol.9 , pp. 974
- Singh, S.¹ Bertsekas, D.²

25
- 0001898381
- Practical reinforcement learning in continuous spaces
- In P. Langley (Ed.); San Francisco, CA: Morgan Kaufmann
- (2000) Proceedings of the Seventeenth International Conference on Machine Learning , pp. 903-910
- Smart, W.D.¹ Kaelbling, L.P.²

26
- 0000439527
- Optimal global rates of convergence for nonparametric regression
- (1982) Annals of Statistics , vol.10 , Issue.4 , pp. 1040-1053
- Stone, C.J.¹

27
- 33847202724
- Learning to predict by the methods of temporal differences
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

28
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.); Cambridge, MA: The MIT Press
- (2000) Advances in Neural Information Processing Systems , vol.12
- Sutton, R.S.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

29
- 0000361359
- Neurogammon wins computer olympiad
- (1989) Neural Computation , vol.1 , Issue.3 , pp. 321-323
- Tesauro, G.¹

30
- 0001546350
- Active exploration in dynamic environments
- In J. E. Moody, S. J. Hanson, & R. P. Lippmann (Eds.); San Mateo, CA: Morgan Kaufmann
- (1992) Advances in Neural Information Processing Systems , vol.4 , pp. 531-538
- Thrun, S.B.¹ Möller, K.²

31
- 0029752470
- Feature-based methods for large-scale dynamic programming
- (1996) Machine Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

32
- 0033351917
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- (1999) IEEE Transactions on Automatic Control , vol.44 , Issue.10 , pp. 1840-1851
- Tsitsiklis, J.N.¹ Van Roy, B.²

33
- 84898938510
- Actor-critic algorithms
- In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.); Cambridge, MA: The MIT Press
- (2000) Advances in Neural Information Processing Systems , vol.12
- Tsitsiklis, J.N.¹ Konda, V.R.²

34
- 34249833101
- Q-learning
- (1992) Machine Learning , vol.8 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

35
- 0025229247
- Consistency of HDP applied to a simple reinforcement learning problem
- (1990) Neural Networks , vol.3 , pp. 179-189
- Werbos, P.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.