[7] D. Choi and B. Van Roy, "A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning," Discrete Event Dynamic Systems, vol. 16, pp. 207-239, 2006.
[9] G. Tesauro, "Temporal Difference Learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, March 1995.
[10] F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 664-671.
[11] A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
[13] Y. Engel, S. Mannor, and R. Meir, "Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning," in Proceedings of the International Conference on Machine Learning (ICML 03), 2003, pp. 154-161.
[15] Y. Engel, S. Mannor, and R. Meir, "The Kernel Recursive Least Squares Algorithm," IEEE Transactions on Signal Processing, vol. 52, pp. 2275-2285, 2004. [Online]. Available: http://www.cs.ualberta.ca/yaki/
[17] Z. Chen, "Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond," Adaptive Systems Lab, McMaster University, Tech. Rep., 2003.
[19] D. Precup, R. S. Sutton, and S. P. Singh, "Eligibility Traces for Off-Policy Policy Evaluation," in Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 759-766.
[20] M. Geist, O. Pietquin, and G. Fricout, "Kalman Temporal Differences: the deterministic case," in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA, April 2009.
[25] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME, Journal of Basic Engineering, vol. 82, Series D, pp. 35-45, 1960.
[27] M. Geist and O. Pietquin, "Managing Uncertainty within Value Function Approximation in Reinforcement Learning," in Active Learning and Experimental Design Workshop (co-located with AISTATS 2010), ser. Journal of Machine Learning Research - Workshop and Conference Proceedings, Sardinia, Italy, 2010.
[28] S. J. Bradtke and A. G. Barto, "Linear Least-Squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996.
[29] T. Söderström and P. Stoica, "Instrumental variable methods for system identification," Circuits, Systems, and Signal Processing, vol. 21, no. 1, pp. 1-9, 2002. DOI 10.1007/BF01211647.
[31] J. A. Boyan, "Technical Update: Least-Squares Temporal Difference Learning," Machine Learning, vol. 49, no. 2-3, pp. 233-246, 2002. DOI 10.1023/A:1017936530646.
[32] H. Yu, "Convergence of Least Squares Temporal Difference Methods Under General Conditions," in International Conference on Machine Learning (ICML 2010), 2010, pp. 1207-1214.
[34] R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, "Fast gradient-descent methods for temporal-difference learning with linear function approximation," in ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA: ACM, 2009, pp. 993-1000.
[35] S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, "Incremental natural actor-critic algorithms," in Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada, 2007.
[36] H. Maei, C. Szepesvári, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds., 2009, pp. 1204-1212.
[37] H. R. Maei and R. S. Sutton, "GQ(λ): a general gradient algorithm for temporal-differences prediction learning with eligibility traces," in Third Conference on Artificial General Intelligence, 2010.
[38] H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton, "Toward Off-Policy Learning Control with Function Approximation," in 27th International Conference on Machine Learning (ICML 2010), 2010.
[39] A. Samuel, "Some studies in machine learning using the game of checkers," IBM Journal of Research and Development, pp. 210-229, 1959.
[41] R. Munos, "Performance Bounds in Lp norm for Approximate Value Iteration," SIAM Journal on Control and Optimization, 2007.
[42] D. Ormoneit and S. Sen, "Kernel-Based Reinforcement Learning," Machine Learning, vol. 49, no. 2-3, pp. 161-178, 2002. DOI 10.1023/A:1017928328829.
[43] M. Riedmiller, "Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method," in European Conference on Machine Learning (ECML), 2005.
[44] D. Ernst, P. Geurts, and L. Wehenkel, "Tree-Based Batch Mode Reinforcement Learning," Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.
[46] D. P. Bertsekas, V. Borkar, and A. Nedic, "Improved Temporal Difference Methods with Linear Function Approximation," in Learning and Approximate Dynamic Programming. IEEE Press, 2004, pp. 231-235.
[47] D. P. Bertsekas and H. Yu, "Projected Equation Methods for Approximate Solution of Large Linear Systems," Journal of Computational and Applied Mathematics, vol. 227, pp. 27-50, 2009.
[49] H. Yu and D. P. Bertsekas, "Q-Learning Algorithms for Optimal Stopping Based on Least Squares," in Proceedings of European Control Conference, Kos, Greece, 2007.
[50] X. Xu, D. Hu, and X. Lu, "Kernel-Based Least Squares Policy Iteration for Reinforcement Learning," IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, July 2007. DOI 10.1109/TNN.2007.899161.
[52] A. Geramifard, M. Bowling, and R. S. Sutton, "Incremental Least-Squares Temporal Difference Learning," in Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 2006, pp. 356-361.
[53] A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, "Regularized policy iteration," in 22nd Annual Conference on Neural Information Processing Systems (NIPS 21), Vancouver, Canada, 2008.