-
3
-
-
79951481923
-
Convergent temporal-difference learning with arbitrary smooth function approximation
-
H. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. Sutton. Convergent temporal-difference learning with arbitrary smooth function approximation. In NIPS, pages 1204-1212, 2009.
-
(2009)
NIPS
, pp. 1204-1212
-
-
Maei, H.1
Szepesvari, C.2
Bhatnagar, S.3
Precup, D.4
Silver, D.5
Sutton, R.6
-
7
-
-
33646384929
-
Geometric variance reduction in Markov chains: Application to value function and gradient estimation
-
Rémi Munos. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation. Journal of Machine Learning Research, 7:413-427, 2006.
-
(2006)
Journal of Machine Learning Research
, vol.7
, pp. 413-427
-
-
Munos, R.1
-
8
-
-
1942482175
-
Optimality of reinforcement learning algorithms with linear function approximation
-
Ralf Schoknecht. Optimality of reinforcement learning algorithms with linear function approximation. In NIPS, pages 1555-1562, 2002.
-
(2002)
NIPS
, pp. 1555-1562
-
-
Schoknecht, R.1
-
9
-
-
85151728371
-
Residual algorithms: Reinforcement learning with function approximation
-
Leemon Baird. Residual algorithms: Reinforcement learning with function approximation. In ICML, 1995.
-
(1995)
ICML
-
-
Baird, L.1
-
10
-
-
0030691430
-
A comparison of direct and model-based reinforcement learning
-
Christopher G. Atkeson and Juan C. Santamaria. A Comparison of Direct and Model-Based Reinforcement Learning. In ICRA, pages 3557-3564, 1997.
-
(1997)
ICRA
, pp. 3557-3564
-
-
Atkeson, C.G.1
Santamaria, J.C.2
-
11
-
-
0029746072
-
Three connectionist implementations of dynamic programming for optimal control: A preliminary comparative analysis
-
H. Bersini and V. Gorrini. Three connectionist implementations of dynamic programming for optimal control: A preliminary comparative analysis. In NICROSP, 1996.
-
(1996)
NICROSP
-
-
Bersini, H.1
Gorrini, V.2
-
13
-
-
0001762424
-
Smooth regression analysis
-
G. Watson. Smooth regression analysis. Sankhya, Series A, 26:359-372, 1964.
-
(1964)
Sankhya Series A
, vol.26
, pp. 359-372
-
-
Watson, G.1
-
14
-
-
0038595396
-
Least-squares temporal difference learning
-
San Francisco, CA, USA, Morgan Kaufmann Publishers Inc
-
Justin A. Boyan. Least-squares temporal difference learning. In ICML, pages 49-56, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
-
(1999)
ICML
, pp. 49-56
-
-
Boyan, J.A.1
-
15
-
-
71149100225
-
Kernelized value function approximation for reinforcement learning
-
New York, NY, USA, ACM
-
Gavin Taylor and Ronald Parr. Kernelized value function approximation for reinforcement learning. In ICML, pages 1017-1024, New York, NY, USA, 2009. ACM.
-
(2009)
ICML
, pp. 1017-1024
-
-
Taylor, G.1
Parr, R.2
-
17
-
-
0001529784
-
Remarks on some nonparametric estimates of a density function
-
September
-
Murray Rosenblatt. Remarks on Some Nonparametric Estimates of a Density Function. The Annals of Mathematical Statistics, 27(3):832-837, September 1956.
-
(1956)
The Annals of Mathematical Statistics
, vol.27
, Issue.3
, pp. 832-837
-
-
Rosenblatt, M.1
-
18
-
-
0001473437
-
On estimation of a probability density function and mode
-
Emanuel Parzen. On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33(3):1065-1076, 1962.
-
(1962)
The Annals of Mathematical Statistics
, vol.33
, Issue.3
, pp. 1065-1076
-
-
Parzen, E.1
-
20
-
-
1942516880
-
Error bounds for approximate policy iteration
-
Rémi Munos. Error bounds for approximate policy iteration. In ICML, pages 560-567, 2003.
-
(2003)
ICML
, pp. 560-567
-
-
Munos, R.1
-
22
-
-
85051703735
-
Consistency of the kernel density estimator: A survey
-
Dominik Wied and Rafael Weissbach. Consistency of the kernel density estimator: a survey. Statistical Papers, pages 1-21, 2010.
-
(2010)
Statistical Papers
, pp. 1-21
-
-
Wied, D.1
Weissbach, R.2
-
23
-
-
31844451013
-
Reinforcement learning with Gaussian processes
-
New York, NY, USA, ACM
-
Yaakov Engel, Shie Mannor, and Ron Meir. Reinforcement learning with Gaussian processes. In ICML, pages 201-208, New York, NY, USA, 2005. ACM.
-
(2005)
ICML
, pp. 201-208
-
-
Engel, Y.1
Mannor, S.2
Meir, R.3
-
24
-
-
33750328566
-
Kernel least-squares temporal difference learning
-
Xin Xu, Tao Xie, Dewen Hu, and Xicheng Lu. Kernel least-squares temporal difference learning. International Journal of Information Technology, 11:54-63, 1997.
-
(1997)
International Journal of Information Technology
, vol.11
, pp. 54-63
-
-
Xu, X.1
Xie, T.2
Hu, D.3
Lu, X.4
-
25
-
-
71149121683
-
Regularization and feature selection in least-squares temporal difference learning
-
ACM
-
J. Zico Kolter and Andrew Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In ICML, pages 521-528. ACM, 2009.
-
(2009)
ICML
, pp. 521-528
-
-
Kolter, J.Z.1
Ng, A.Y.2
-
26
-
-
84874668709
-
Model-based function approximation for reinforcement learning
-
May
-
Nicholas K. Jong and Peter Stone. Model-based function approximation for reinforcement learning. In AAMAS, May 2007.
-
(2007)
AAMAS
-
-
Jong, N.K.1
Stone, P.2
-
27
-
-
0036832956
-
Kernel-Based reinforcement learning
-
November
-
Dirk Ormoneit and Saunak Sen. Kernel-Based reinforcement learning. Machine Learning, 49(2):161-178, November 2002.
-
(2002)
Machine Learning
, vol.49
, Issue.2
, pp. 161-178
-
-
Ormoneit, D.1
Sen, S.2