Volume 47, Issue 10, 2002, Pages 1624-1636

Kernel-based reinforcement learning in average-cost problems

Author keywords

Average cost problem; Dynamic programming; Kernel smoothing; Local averaging; Markov decision process (MDP); Perturbation theory; Policy iteration; Reinforcement learning; Temporal difference learning

Indexed keywords

APPROXIMATION THEORY; DECISION THEORY; DYNAMIC PROGRAMMING; ITERATIVE METHODS; LEARNING SYSTEMS; MARKOV PROCESSES

EID: 0036804005     PISSN: 00189286     EISSN: None     Source Type: Journal    
DOI: 10.1109/TAC.2002.803530     Document Type: Article
Times cited : (58)

References (30)
  • 1
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R.S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
    • (1988) Mach. Learn. , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 3
    • 0029752470
    • Feature-based methods for large-scale dynamic programming
    • J.N. Tsitsiklis and B. Van Roy, "Feature-based methods for large-scale dynamic programming," Mach. Learn., vol. 22, pp. 59-94, 1996.
    • (1996) Mach. Learn. , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 4
    • 0036832956
    • Kernel-based reinforcement learning
    • D. Ormoneit and Ś. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, pp. 161-178, 2002.
    • (2002) Mach. Learn. , vol.49 , pp. 161-178
    • Ormoneit, D.1    Sen, Ś.2
  • 6
    • 0031344030
    • The policy iteration algorithm for average reward Markov decision processes with general state space
    • Oct.
    • S.P. Meyn, "The policy iteration algorithm for average reward Markov decision processes with general state space," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Oct. 1997.
    • (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1382-1393
    • Meyn, S.P.1
  • 7
    • 0001509947
    • Using randomization to break the curse of dimensionality
    • J. Rust, "Using randomization to break the curse of dimensionality," Econometrica, vol. 65, no. 3, pp. 487-516, 1997.
    • (1997) Econometrica , vol.65 , Issue.3 , pp. 487-516
    • Rust, J.1
  • 8
    • 0003989207
    • Ph.D. dissertation, Comput. Sci. Dept., Carnegie Mellon Univ., Pittsburgh, PA
    • G. Gordon, "Approximate solutions to Markov decision processes," Ph.D. dissertation, Comput. Sci. Dept., Carnegie Mellon Univ., Pittsburgh, PA, 1999.
    • (1999) Approximate solutions to Markov decision processes
    • Gordon, G.1
  • 9
    • 0001719501
    • Stable fitted reinforcement learning
    • D. Touretzky, M. Mozer, and M. Hasselmo, Eds. Cambridge, MA: MIT Press
    • G.J. Gordon, "Stable fitted reinforcement learning," in Advances in Neural Information Processing Systems, D. Touretzky, M. Mozer, and M. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, vol. 8.
    • (1996) Advances in Neural Information Processing Systems , vol.8
    • Gordon, G.J.1
  • 11
    • 0034550848
    • A learning algorithm for discrete-time stochastic control
    • V.S. Borkar, "A learning algorithm for discrete-time stochastic control," Probab. Eng. Inform. Sci., vol. 14, pp. 243-258, 2000.
    • (2000) Probab. Eng. Inform. Sci. , vol.14 , pp. 243-258
    • Borkar, V.S.1
  • 12
    • 0036832953
    • Variable resolution discretization in optimal control
    • R. Munos and A. Moore, "Variable resolution discretization in optimal control," Mach. Learn., vol. 49, pp. 291-324, 2002.
    • (2002) Mach. Learn. , vol.49 , pp. 291-324
    • Munos, R.1    Moore, A.2
  • 13
    • 0042758707
    • Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, Preprint
    • J.N. Tsitsiklis and V.R. Konda, "Actor-critic algorithms," Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, 2001, Preprint.
    • (2001) Actor-critic algorithms
    • Tsitsiklis, J.N.1    Konda, V.R.2
  • 15
    • 0000570382
    • On estimating regression
    • E.A. Nadaraya, "On estimating regression," Theor. Probab. Appl., vol. 9, pp. 141-142, 1964.
    • (1964) Theor. Probab. Appl. , vol.9 , pp. 141-142
    • Nadaraya, E.A.1
  • 16
    • 0001762424
    • Smooth regression analysis
    • G.S. Watson, "Smooth regression analysis," Sankhyā Series A, vol. 26, pp. 359-372, 1964.
    • (1964) Sankhyā Series A , vol.26 , pp. 359-372
    • Watson, G.S.1
  • 17
    • 0000439527
    • Optimal global rates of convergence for nonparametric regression
    • C.J. Stone, "Optimal global rates of convergence for nonparametric regression," Ann. Stat., vol. 10, no. 4, pp. 1040-1053, 1982.
    • (1982) Ann. Stat. , vol.10 , Issue.4 , pp. 1040-1053
    • Stone, C.J.1
  • 21
    • 0000388992
    • Consistent nonparametric regression
    • C.J. Stone, "Consistent nonparametric regression," Ann. Stat., vol. 5, no. 4, pp. 595-645, 1977.
    • (1977) Ann. Stat. , vol.5 , Issue.4 , pp. 595-645
    • Stone, C.J.1
  • 22
    • 0003327481
    • Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice
    • Cambridge, MA: MIT Press
    • D. Ormoneit and P.W. Glynn, "Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice," in Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001.
    • (2001) Advances in Neural Information Processing Systems , vol.13
    • Ormoneit, D.1    Glynn, P.W.2
  • 26
    • 0031258478
    • Perturbation realization, potentials, and sensitivity analysis of Markov processes
    • Oct.
    • X.-R. Cao, "Perturbation realization, potentials, and sensitivity analysis of Markov processes," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Oct. 1997.
    • (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1382-1393
    • Cao, X.-R.1
  • 27
    • 0037079674
    • Hoeffding's inequality for uniformly ergodic Markov chains
    • P.W. Glynn and D. Ormoneit, "Hoeffding's inequality for uniformly ergodic Markov chains," Stat. Probab. Lett., vol. 56, pp. 143-146, 2002.
    • (2002) Stat. Probab. Lett. , vol.56 , pp. 143-146
    • Glynn, P.W.1    Ormoneit, D.2
  • 29
    • 0000954384
    • Optimal kernel shapes for local linear regression
    • S.A. Solla, T.K. Leen, and K-R. Müller, Eds. Cambridge, MA: MIT Press
    • D. Ormoneit and T. Hastie, "Optimal kernel shapes for local linear regression," in Advances in Neural Information Processing Systems 12, S.A. Solla, T.K. Leen, and K-R. Müller, Eds. Cambridge, MA: MIT Press, 2000, pp. 540-546.
    • (2000) Advances in Neural Information Processing Systems , vol.12 , pp. 540-546
    • Ormoneit, D.1    Hastie, T.2
  • 30
    • 0017949599
    • The uniform convergence of nearest neighbor regression function estimators and their application in optimization
    • Feb.
    • L. Devroye, "The uniform convergence of nearest neighbor regression function estimators and their application in optimization," IEEE Trans. Inform. Theory, vol. IT-24, pp. 142-151, Feb. 1978.
    • (1978) IEEE Trans. Inform. Theory , vol.IT-24 , pp. 142-151
    • Devroye, L.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.