SCOPUS Information Retrieval Platform

Volume 47, Issue 10, 2002, Pages 1624-1636

Kernel-based reinforcement learning in average-cost problems

a Marshall Wace Asset Management (United Kingdom)

Author keywords

Average cost problem; Dynamic programming; Kernel smoothing; Local averaging; Markov decision process (MDP); Perturbation theory; Policy iteration; Reinforcement learning; Temporal difference learning

Indexed keywords

APPROXIMATION THEORY; DECISION THEORY; DYNAMIC PROGRAMMING; ITERATIVE METHODS; LEARNING SYSTEMS; MARKOV PROCESSES; REINFORCEMENT LEARNING; OPTIMAL CONTROL SYSTEMS

EID: 0036804005 PISSN: 00189286 EISSN: None Source Type: Journal
DOI: 10.1109/TAC.2002.803530 Document Type: Article

Times cited : (58)

References (30)

1
- 33847202724
- Learning to predict by the methods of temporal differences
- R.S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
- (1988) Mach. Learn. , vol.3 , pp. 9-44
- Sutton, R.S.¹

2
- 0003565783
- Belmont, MA: Athena Scientific
- D.P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995.
- (1995) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

3
- 0029752470
- Feature-based methods for large-scale dynamic programming
- J.N. Tsitsiklis and B. Van Roy, "Feature-based methods for large-scale dynamic programming," Mach. Learn., vol. 22, pp. 59-94, 1996.
- (1996) Mach. Learn. , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

4
- 0036832956
- Kernel-based reinforcement learning
- D. Ormoneit and Ś. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, pp. 161-178, 2002.
- (2002) Mach. Learn. , vol.49 , pp. 161-178
- Ormoneit, D.¹ Sen, Ś.²

5
- 0027557742
- Discrete-time controlled Markov processes with average cost criterion: A survey
- A. Arapostathis, V.S. Borkar, E. Fernández-Gaucherand, M.K. Ghosh, and S.I. Marcus, "Discrete-time controlled Markov processes with average cost criterion: A survey," SIAM J. Control Optim., vol. 31, no. 2, pp. 282-344, 1993.
- (1993) SIAM J. Control Optim. , vol.31 , Issue.2 , pp. 282-344
- Arapostathis, A.¹ Borkar, V.S.² Fernández-Gaucherand, E.³ Ghosh, M.K.⁴ Marcus, S.I.⁵

6
- 0031344030
- The policy iteration algorithm for average reward Markov decision processes with general state space
- Dec.
- S.P. Meyn, "The policy iteration algorithm for average reward Markov decision processes with general state space," IEEE Trans. Automat. Contr., vol. 42, pp. 1663-1680, Dec. 1997.
- (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1663-1680
- Meyn, S.P.¹

7
- 0001509947
- Using randomization to break the curse of dimensionality
- J. Rust, "Using randomization to break the curse of dimensionality," Econometrica, vol. 65, no. 3, pp. 487-516, 1997.
- (1997) Econometrica , vol.65 , Issue.3 , pp. 487-516
- Rust, J.¹

8
- 0003989207
- Ph.D. dissertation, Comput. Sci. Dept., Carnegie Mellon Univ., Pittsburgh, PA
- G. Gordon, "Approximate solutions to Markov decision processes," Ph.D. dissertation, Comput. Sci. Dept., Carnegie Mellon Univ., Pittsburgh, PA, 1999.
- (1999) Approximate solutions to Markov decision processes
- Gordon, G.¹

9
- 0001719501
- Stable fitted reinforcement learning
- D. Touretzky, M. Mozer, and M. Hasselmo, Eds. Cambridge, MA: MIT Press
- G.J. Gordon, "Stable fitted reinforcement learning," in Advances in Neural Information Processing Systems, D. Touretzky, M. Mozer, and M. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, vol. 8.
- (1996) Advances in Neural Information Processing Systems , vol.8
- Gordon, G.J.¹

10
- 0003981735
- Ph.D. dissertation, Div. Appl. Sci., Harvard Univ., Cambridge, MA
- W.L. Baker, "Learning via stochastic approximation in function space," Ph.D. dissertation, Div. Appl. Sci., Harvard Univ., Cambridge, MA, 1997.
- (1997) Learning via stochastic approximation in function space
- Baker, W.L.¹

11
- 0034550848
- A learning algorithm for discrete-time stochastic control
- V.S. Borkar, "A learning algorithm for discrete-time stochastic control," Probab. Eng. Inform. Sci., vol. 14, pp. 243-258, 2000.
- (2000) Probab. Eng. Inform. Sci. , vol.14 , pp. 243-258
- Borkar, V.S.¹

12
- 0036832953
- Variable resolution discretization in optimal control
- R. Munos and A. Moore, "Variable resolution discretization in optimal control," Mach. Learn., vol. 49, pp. 291-324, 2002.
- (2002) Mach. Learn. , vol.49 , pp. 291-324
- Munos, R.¹ Moore, A.²

13
- 0042758707
- Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, Preprint
- J.N. Tsitsiklis and V.R. Konda, "Actor-critic algorithms," Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, 2001, Preprint.
- (2001) Actor-critic algorithms
- Tsitsiklis, J.N.¹ Konda, V.R.²

14
- 0003637131
- New York: Springer-Verlag
- S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability. New York: Springer-Verlag, 1993.
- (1993) Markov Chains and Stochastic Stability
- Meyn, S.P.¹ Tweedie, R.L.²

15
- 0000570382
- On estimating regression
- E.A. Nadaraya, "On estimating regression," Theor. Probab. Appl., vol. 9, pp. 141-142, 1964.
- (1964) Theor. Probab. Appl. , vol.9 , pp. 141-142
- Nadaraya, E.A.¹

16
- 0001762424
- Smooth regression analysis
- G.S. Watson, "Smooth regression analysis," Sankhyā Series A, vol. 26, pp. 359-372, 1964.
- (1964) Sankhyā Series A , vol.26 , pp. 359-372
- Watson, G.S.¹

17
- 0000439527
- Optimal global rates of convergence for nonparametric regression
- C.J. Stone, "Optimal global rates of convergence for nonparametric regression," Ann. Stat., vol. 10, no. 4, pp. 1040-1053, 1982.
- (1982) Ann. Stat. , vol.10 , Issue.4 , pp. 1040-1053
- Stone, C.J.¹

18
- 0004019773
- New York: Springer-Verlag
- L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition. New York: Springer-Verlag, 1996.
- (1996) A Probabilistic Theory of Pattern Recognition
- Devroye, L.¹ Györfi, L.² Lugosi, G.³

19
- 0003834629
- Philadelphia, PA: SIAM
- H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods. Philadelphia, PA: SIAM, 1992.
- (1992) Random Number Generation and Quasi-Monte Carlo Methods
- Niederreiter, H.¹

20
- 0003802343
- Belmont, CA: Wadsworth
- L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1983.
- (1983) Classification and Regression Trees
- Breiman, L.¹ Friedman, J.H.² Olshen, R.A.³ Stone, C.J.⁴

21
- 0000388992
- Consistent nonparametric regression
- C.J. Stone, "Consistent nonparametric regression," Ann. Stat., vol. 5, no. 4, pp. 595-645, 1977.
- (1977) Ann. Stat. , vol.5 , Issue.4 , pp. 595-645
- Stone, C.J.¹

22
- 0003327481
- Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice
- Cambridge, MA: MIT Press
- D. Ormoneit and P.W. Glynn, "Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice," in Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001.
- (2001) Advances in Neural Information Processing Systems , vol.13
- Ormoneit, D.¹ Glynn, P.W.²

23
- 0003691369
- London, U.K.: Chapman & Hall
- J. Fan and I. Gijbels, Local Polynomial Modeling and Its Applications. London, U.K.: Chapman & Hall, 1996.
- (1996) Local Polynomial Modeling and Its Applications
- Fan, J.¹ Gijbels, I.²

24
- 4243774602
- Tech. Rep., Dept. Comput. Sci., Stanford Univ., Stanford, CA
- D. Ormoneit and P.W. Glynn, "Kernel-based reinforcement learning in average-cost problems," Tech. Rep., Dept. Comput. Sci., Stanford Univ., Stanford, CA, 2001.
- (2001) Kernel-based reinforcement learning in average-cost problems
- Ormoneit, D.¹ Glynn, P.W.²

25
- 85102627959
- New York: Wiley
- M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

26
- 0031258478
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Oct.
- X.-R. Cao, "Perturbation realization, potentials, and sensitivity analysis of Markov processes," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Oct. 1997.
- (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1382-1393
- Cao, X.-R.¹

27
- 0037079674
- Hoeffding's inequality for uniformly ergodic Markov chains
- P.W. Glynn and D. Ormoneit, "Hoeffding's inequality for uniformly ergodic Markov chains," Stat. Probab. Lett., vol. 56, pp. 143-146, 2002.
- (2002) Stat. Probab. Lett. , vol.56 , pp. 143-146
- Glynn, P.W.¹ Ormoneit, D.²

28
- 0003450542
- New York: Springer-Verlag
- V.N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
- (1995) The Nature of Statistical Learning Theory
- Vapnik, V.N.¹

29
- 0000954384
- Optimal kernel shapes for local linear regression
- S.A. Solla, T.K. Leen, and K-R. Müller, Eds. Cambridge, MA: MIT Press
- D. Ormoneit and T. Hastie, "Optimal kernel shapes for local linear regression," in Advances in Neural Information Processing Systems 12, S.A. Solla, T.K. Leen, and K-R. Müller, Eds. Cambridge, MA: MIT Press, 2000, pp. 540-546.
- (2000) Advances in Neural Information Processing Systems , vol.12 , pp. 540-546
- Ormoneit, D.¹ Hastie, T.²

30
- 0017949599
- The uniform convergence of nearest neighbor regression function estimators and their application in optimization
- Feb.
- L. Devroye, "The uniform convergence of nearest neighbor regression function estimators and their application in optimization," IEEE Trans. Inform. Theory, vol. IT-24, pp. 142-151, Feb. 1978.
- (1978) IEEE Trans. Inform. Theory , vol.IT-24 , pp. 142-151
- Devroye, L.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.