-
1
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
R.S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
-
(1988)
Mach. Learn.
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
3
-
-
0029752470
-
Feature-based methods for large-scale dynamic programming
-
J.N. Tsitsiklis and B. Van Roy, "Feature-based methods for large-scale dynamic programming," Mach. Learn., vol. 22, pp. 59-94, 1996.
-
(1996)
Mach. Learn.
, vol.22
, pp. 59-94
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
4
-
-
0036832956
-
Kernel-based reinforcement learning
-
D. Ormoneit and Ś. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, pp. 161-178, 2002.
-
(2002)
Mach. Learn.
, vol.49
, pp. 161-178
-
-
Ormoneit, D.1
Sen, Ś.2
-
5
-
-
0027557742
-
Discrete-time controlled Markov processes with average cost criterion: A survey
-
A. Arapostathis, V.S. Borkar, E. Fernández-Gaucherand, M.K. Ghosh, and S.I. Marcus, "Discrete-time controlled Markov processes with average cost criterion: A survey," SIAM J. Control Optim., vol. 31, no. 2, pp. 282-344, 1993.
-
(1993)
SIAM J. Control Optim.
, vol.31
, Issue.2
, pp. 282-344
-
-
Arapostathis, A.1
Borkar, V.S.2
Fernández-Gaucherand, E.3
Ghosh, M.K.4
Marcus, S.I.5
-
6
-
-
0031344030
-
The policy iteration algorithm for average reward Markov decision processes with general state space
-
Oct.
-
S.P. Meyn, "The policy iteration algorithm for average reward Markov decision processes with general state space," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Oct. 1997.
-
(1997)
IEEE Trans. Automat. Contr.
, vol.42
, pp. 1382-1393
-
-
Meyn, S.P.1
-
7
-
-
0001509947
-
Using randomization to break the curse of dimensionality
-
J. Rust, "Using randomization to break the curse of dimensionality," Econometrica, vol. 65, no. 3, pp. 487-516, 1997.
-
(1997)
Econometrica
, vol.65
, Issue.3
, pp. 487-516
-
-
Rust, J.1
-
8
-
-
0003989207
-
-
Ph.D. dissertation, Comput. Sci. Dept., Carnegie Mellon Univ., Pittsburgh, PA
-
G. Gordon, "Approximate solutions to Markov decision processes," Ph.D. dissertation, Comput. Sci. Dept., Carnegie Mellon Univ., Pittsburgh, PA, 1999.
-
(1999)
Approximate solutions to Markov decision processes
-
-
Gordon, G.1
-
9
-
-
0001719501
-
Stable fitted reinforcement learning
-
D. Touretzky, M. Mozer, and M. Hasselmo, Eds. Cambridge, MA: MIT Press
-
G.J. Gordon, "Stable fitted reinforcement learning," in Advances in Neural Information Processing Systems, D. Touretzky, M. Mozer, and M. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, vol. 8.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
-
-
Gordon, G.J.1
-
10
-
-
0003981735
-
-
Ph.D. dissertation, Div. Appl. Sci., Harvard Univ., Cambridge, MA
-
W.L. Baker, "Learning via stochastic approximation in function space," Ph.D. dissertation, Div. Appl. Sci., Harvard Univ., Cambridge, MA, 1997.
-
(1997)
Learning via stochastic approximation in function space
-
-
Baker, W.L.1
-
11
-
-
0034550848
-
A learning algorithm for discrete-time stochastic control
-
V.S. Borkar, "A learning algorithm for discrete-time stochastic control," Probab. Eng. Inform. Sci., vol. 14, pp. 243-258, 2000.
-
(2000)
Probab. Eng. Inform. Sci.
, vol.14
, pp. 243-258
-
-
Borkar, V.S.1
-
12
-
-
0036832953
-
Variable resolution discretization in optimal control
-
R. Munos and A. Moore, "Variable resolution discretization in optimal control," Mach. Learn., vol. 49, pp. 291-324, 2002.
-
(2002)
Mach. Learn.
, vol.49
, pp. 291-324
-
-
Munos, R.1
Moore, A.2
-
13
-
-
0042758707
-
-
Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, Preprint
-
J.N. Tsitsiklis and V.R. Konda, "Actor-critic algorithms," Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, 2001, Preprint.
-
(2001)
Actor-critic algorithms
-
-
Tsitsiklis, J.N.1
Konda, V.R.2
-
15
-
-
0000570382
-
On estimating regression
-
E.A. Nadaraya, "On estimating regression," Theor. Probab. Appl., vol. 9, pp. 141-142, 1964.
-
(1964)
Theor. Probab. Appl.
, vol.9
, pp. 141-142
-
-
Nadaraya, E.A.1
-
16
-
-
0001762424
-
Smooth regression analysis
-
G.S. Watson, "Smooth regression analysis," Sankhyā Series A, vol. 26, pp. 359-372, 1964.
-
(1964)
Sankhyā Series A
, vol.26
, pp. 359-372
-
-
Watson, G.S.1
-
17
-
-
0000439527
-
Optimal global rates of convergence for nonparametric regression
-
C.J. Stone, "Optimal global rates of convergence for nonparametric regression," Ann. Stat., vol. 10, no. 4, pp. 1040-1053, 1982.
-
(1982)
Ann. Stat.
, vol.10
, Issue.4
, pp. 1040-1053
-
-
Stone, C.J.1
-
20
-
-
0003802343
-
-
Belmont, CA: Wadsworth
-
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1983.
-
(1983)
Classification and Regression Trees
-
-
Breiman, L.1
Friedman, J.H.2
Olshen, R.A.3
Stone, C.J.4
-
21
-
-
0000388992
-
Consistent nonparametric regression
-
C.J. Stone, "Consistent nonparametric regression," Ann. Stat., vol. 5, no. 4, pp. 595-645, 1977.
-
(1977)
Ann. Stat.
, vol.5
, Issue.4
, pp. 595-645
-
-
Stone, C.J.1
-
22
-
-
0003327481
-
Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice
-
Cambridge, MA: MIT Press
-
D. Ormoneit and P.W. Glynn, "Kernel-based reinforcement learning in average-cost problems: An application to optimal portfolio choice," in Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001.
-
(2001)
Advances in Neural Information Processing Systems
, vol.13
-
-
Ormoneit, D.1
Glynn, P.W.2
-
24
-
-
4243774602
-
-
Tech. Rep., Dept. Comput. Sci., Stanford Univ., Stanford, CA
-
D. Ormoneit and P.W. Glynn, "Kernel-based reinforcement learning in average-cost problems," Tech. Rep., Dept. Comput. Sci., Stanford Univ., Stanford, CA, 2001.
-
(2001)
Kernel-based reinforcement learning in average-cost problems
-
-
Ormoneit, D.1
Glynn, P.W.2
-
26
-
-
0031258478
-
Perturbation realization, potentials, and sensitivity analysis of Markov processes
-
Oct.
-
X.-R. Cao, "Perturbation realization, potentials, and sensitivity analysis of Markov processes," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Oct. 1997.
-
(1997)
IEEE Trans. Automat. Contr.
, vol.42
, pp. 1382-1393
-
-
Cao, X.-R.1
-
27
-
-
0037079674
-
Hoeffding's inequality for uniformly ergodic Markov chains
-
P.W. Glynn and D. Ormoneit, "Hoeffding's inequality for uniformly ergodic Markov chains," Stat. Probab. Lett., vol. 56, pp. 143-146, 2002.
-
(2002)
Stat. Probab. Lett.
, vol.56
, pp. 143-146
-
-
Glynn, P.W.1
Ormoneit, D.2
-
29
-
-
0000954384
-
Optimal kernel shapes for local linear regression
-
S.A. Solla, T.K. Leen, and K-R. Müller, Eds. Cambridge, MA: MIT Press
-
D. Ormoneit and T. Hastie, "Optimal kernel shapes for local linear regression," in Advances in Neural Information Processing Systems 12, S.A. Solla, T.K. Leen, and K-R. Müller, Eds. Cambridge, MA: MIT Press, 2000, pp. 540-546.
-
(2000)
Advances in Neural Information Processing Systems
, vol.12
, pp. 540-546
-
-
Ormoneit, D.1
Hastie, T.2
-
30
-
-
0017949599
-
The uniform convergence of nearest neighbor regression function estimators and their application in optimization
-
Feb.
-
L. Devroye, "The uniform convergence of nearest neighbor regression function estimators and their application in optimization," IEEE Trans. Inform. Theory, vol. IT-24, pp. 142-151, Feb. 1978.
-
(1978)
IEEE Trans. Inform. Theory
, vol.IT-24
, pp. 142-151
-
-
Devroye, L.1