-
1
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
-
(1988)
Machine Learning
, vol.3
, Issue.1
, pp. 9-44
-
-
Sutton, R.S.1
-
2
-
-
0038595396
-
Least-squares temporal difference learning
-
Morgan Kaufmann, San Francisco, CA
-
Justin A. Boyan. Least-squares temporal difference learning. In Proc. Intl. Conf. Machine Learning, pages 49-56. Morgan Kaufmann, San Francisco, CA, 1999.
-
(1999)
Proc. Intl. Conf. Machine Learning
, pp. 49-56
-
-
Boyan, J.A.1
-
3
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
Steven J. Bradtke and Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
-
(1996)
Machine Learning
, vol.22
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
4
-
-
4644323293
-
Least-squares policy iteration
-
Michail G. Lagoudakis and Ronald Parr. Least-squares policy iteration. J. Mach. Learn. Res., 4:1107-1149, 2003.
-
(2003)
J. Mach. Learn. Res.
, vol.4
, pp. 1107-1149
-
-
Lagoudakis, M.G.1
Parr, R.2
-
5
-
-
56449092660
-
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
-
New York, NY, USA. ACM
-
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, and Michael L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In ICML '08: Proceedings of the 25th international conference on Machine learning, pages 752-759, New York, NY, USA, 2008. ACM.
-
(2008)
ICML '08: Proceedings of the 25th International Conference on Machine Learning
, pp. 752-759
-
-
Parr, R.1
Li, L.2
Taylor, G.3
Painter-Wakefield, C.4
Littman, M.L.5
-
6
-
-
14344256568
-
Learning low dimensional predictive representations
-
Matthew Rosencrantz, Geoffrey J. Gordon, and Sebastian Thrun. Learning low dimensional predictive representations. In Proc. ICML, 2004.
-
(2004)
Proc. ICML
-
-
Rosencrantz, M.1
Gordon, G.J.2
Thrun, S.3
-
8
-
-
85156266716
-
Value-directed compression of POMDPs
-
Pascal Poupart and Craig Boutilier. Value-directed compression of POMDPs. In NIPS, pages 1547-1554, 2002.
-
(2002)
NIPS
, pp. 1547-1554
-
-
Poupart, P.1
Boutilier, C.2
-
9
-
-
71149121683
-
Regularization and feature selection in least-squares temporal difference learning
-
New York, NY, USA. ACM
-
J. Zico Kolter and Andrew Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 521-528, New York, NY, USA, 2009. ACM.
-
(2009)
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
, pp. 521-528
-
-
Kolter, J.Z.1
Ng, A.Y.2
-
17
-
-
0034198996
-
Observable operator models for discrete stochastic time series
-
Herbert Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12:1371-1398, 2000.
-
(2000)
Neural Computation
, vol.12
, pp. 1371-1398
-
-
Jaeger, H.1
-
18
-
-
31844457132
-
Predictive state representations: A new theory for modeling dynamical systems
-
Satinder Singh, Michael James, and Matthew Rudary. Predictive state representations: A new theory for modeling dynamical systems. In Proc. UAI, 2004.
-
(2004)
Proc. UAI
-
-
Singh, S.1
James, M.2
Rudary, M.3
-
21
-
-
84898066687
-
A spectral algorithm for learning hidden Markov models
-
Daniel Hsu, Sham Kakade, and Tong Zhang. A spectral algorithm for learning hidden Markov models. In COLT, 2009.
-
(2009)
COLT
-
-
Hsu, D.1
Kakade, S.2
Zhang, T.3
-
22
-
-
84860648072
-
Improving approximate value iteration using memories and predictive state representations
-
Michael R. James, Ton Wessling, and Nikos A. Vlassis. Improving approximate value iteration using memories and predictive state representations. In AAAI, 2006.
-
(2006)
AAAI
-
-
James, M.R.1
Wessling, T.2
Vlassis, N.A.3
-
24
-
-
0033351917
-
Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
-
John N. Tsitsiklis and Benjamin Van Roy. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Transactions on Automatic Control, 44:1840-1851, 1999.
-
(1999)
IEEE Transactions on Automatic Control
, vol.44
, pp. 1840-1851
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
25
-
-
33646435300
-
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
-
David Choi and Benjamin Van Roy. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16(2):207-239, 2006.
-
(2006)
Discrete Event Dynamic Systems
, vol.16
, Issue.2
, pp. 207-239
-
-
Choi, D.1
Van Roy, B.2