-
1
-
-
40849145988
-
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
-
Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129.
-
(2008)
Machine Learning
, vol.71
, Issue.1
, pp. 89-129
-
-
Antos, A.1
Szepesvári, C.2
Munos, R.3
-
4
-
-
85162049326
-
Incremental natural actor-critic algorithms
-
Vancouver, Canada
-
Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., & Lee, M. (2008). Incremental Natural Actor-Critic Algorithms. In Proceedings of the Twenty-First Annual Conference on Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada.
-
(2008)
Proceedings of the Twenty-First Annual Conference on Advances in Neural Information Processing Systems (NIPS)
-
-
Bhatnagar, S.1
Sutton, R.S.2
Ghavamzadeh, M.3
Lee, M.4
-
6
-
-
0036832950
-
Technical update: Least-squares temporal difference learning
-
Boyan, J. A. (1999). Technical Update: Least-Squares Temporal Difference Learning. Machine Learning, 49(2-3), 233-246.
-
(1999)
Machine Learning
, vol.49
, Issue.2-3
, pp. 233-246
-
-
Boyan, J.A.1
-
7
-
-
0001771345
-
Linear least-squares algorithms for temporal difference learning
-
Bradtke, S. J., & Barto, A. G. (1996). Linear Least-Squares Algorithms for Temporal Difference Learning. Machine Learning, 22(1-3), 33-57.
-
(1996)
Machine Learning
, vol.22
, Issue.1-3
, pp. 33-57
-
-
Bradtke, S.J.1
Barto, A.G.2
-
9
-
-
33646435300
-
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
-
Choi, D., & Van Roy, B. (2006). A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems, 16, 207-239.
-
(2006)
Discrete Event Dynamic Systems
, vol.16
, pp. 207-239
-
-
Choi, D.1
Van Roy, B.2
-
10
-
-
0031619316
-
-
Dearden, R., Friedman, N., & Russell, S. J. (1998). Bayesian Q-learning. In AAAI/IAAI, pp. 761-768.
-
(1998)
Bayesian Q-learning AAAI/IAAI
, pp. 761-768
-
-
Dearden, R.1
Friedman, N.2
Russell, S.J.3
-
12
-
-
1942421151
-
Bayes meets Bellman: The Gaussian process approach to temporal difference learning
-
Engel, Y., Mannor, S., & Meir, R. (2003). Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In Proceedings of the International Conference on Machine Learning (ICML 2003), pp. 154-161.
-
(2003)
Proceedings of the International Conference on Machine Learning (ICML 2003)
, pp. 154-161
-
-
Engel, Y.1
Mannor, S.2
Meir, R.3
-
16
-
-
79956274048
-
Revisiting natural actor-critics with value function approximation
-
Torra, V., Narukawa, Y., & Daumas, M. (Eds.) Lecture Notes in Artificial Intelligence (LNAI), Perpinya (France). Springer Verlag-Heidelberg Berlin
-
Geist, M., & Pietquin, O. (2010c). Revisiting natural actor-critics with value function approximation. In Torra, V., Narukawa, Y., & Daumas, M. (Eds.), Proceedings of 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010), Vol. 6408 of Lecture Notes in Artificial Intelligence (LNAI), pp. 207-218, Perpinya (France). Springer Verlag-Heidelberg Berlin.
-
(2010)
Proceedings of 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010)
, vol.6408
, pp. 207-218
-
-
Geist, M.1
Pietquin, O.2
-
19
-
-
58449117448
-
Bayesian reward filtering
-
et al. S. G. (Ed.) Lecture Notes in Artificial Intelligence Springer Verlag, Lille (France)
-
Geist, M., Pietquin, O., & Fricout, G. (2008). Bayesian Reward Filtering. In et al., S. G. (Ed.), Proceedings of the European Workshop on Reinforcement Learning (EWRL 2008), Vol. 5323 of Lecture Notes in Artificial Intelligence, pp. 96-109. Springer Verlag, Lille (France).
-
(2008)
Proceedings of the European Workshop on Reinforcement Learning (EWRL 2008)
, vol.5323
, pp. 96-109
-
-
Geist, M.1
Pietquin, O.2
Fricout, G.3
-
20
-
-
67650458797
-
Kalman temporal differences: The deterministic case
-
Nashville, TN, USA
-
Geist, M., Pietquin, O., & Fricout, G. (2009a). Kalman Temporal Differences: the deterministic case. In Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA.
-
(2009)
Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009)
-
-
Geist, M.1
Pietquin, O.2
Fricout, G.3
-
22
-
-
33750737011
-
Incremental least-squares temporal difference learning
-
Geramifard, A., Bowling, M., & Sutton, R. S. (2006). Incremental Least-Squares Temporal Difference Learning. In Proceedings of the 21st Conference, American Association for Artificial Intelligence, pp. 356-361.
-
(2006)
Proceedings of the 21st Conference, American Association for Artificial Intelligence
, pp. 356-361
-
-
Geramifard, A.1
Bowling, M.2
Sutton, R.S.3
-
23
-
-
84966204836
-
Methods for modifying matrix factorization
-
Gill, P. E., Golub, G. H., Murray, W., & Saunders, M. A. (1974). Methods for Modifying Matrix Factorization. Mathematics of Computation, 28(126), 505-535.
-
(1974)
Mathematics of Computation
, vol.28
, Issue.126
, pp. 505-535
-
-
Gill, P.E.1
Golub, G.H.2
Murray, W.3
Saunders, M.A.4
-
24
-
-
0003473120
-
-
Dover Publications, Inc., New York, NY, USA
-
Goodwin, G. C., & Sin, K. S. (2009). Adaptive Filtering Prediction and Control. Dover Publications, Inc., New York, NY, USA.
-
(2009)
Adaptive Filtering Prediction and Control
-
-
Goodwin, G.C.1
Sin, K.S.2
-
25
-
-
20544433674
-
Consistent normalized least mean square filtering with noisy data matrix
-
Jo, S., & Kim, S. W. (2005). Consistent Normalized Least Mean Square Filtering with Noisy Data Matrix. IEEE Transactions on Signal Processing, 53(6), 2112-2123.
-
(2005)
IEEE Transactions on Signal Processing
, vol.53
, Issue.6
, pp. 2112-2123
-
-
Jo, S.1
Kim, S.W.2
-
26
-
-
21244437999
-
Unscented filtering and nonlinear estimation
-
Julier, S. J., & Uhlmann, J. K. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401-422.
-
(2004)
Proceedings of the IEEE
, vol.92
, Issue.3
, pp. 401-422
-
-
Julier, S.J.1
Uhlmann, J.K.2
-
27
-
-
33646243319
-
A natural policy gradient
-
[Neural Information Processing Systems (NIPS 2001)], Vancouver, British Columbia, Canada
-
Kakade, S. (2001). A natural policy gradient. In Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems (NIPS 2001)], pp. 1531-1538, Vancouver, British Columbia, Canada.
-
(2001)
Advances in Neural Information Processing Systems
, vol.14
, pp. 1531-1538
-
-
Kakade, S.1
-
28
-
-
85024429815
-
A new approach to linear filtering and prediction problems
-
Series D
-
Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME-Journal of Basic Engineering, 82(Series D), 35-45.
-
(1960)
Transactions of the ASME-Journal of Basic Engineering
, vol.82
, pp. 35-45
-
-
Kalman, R.E.1
-
30
-
-
4043069840
-
On actor-critic algorithms
-
Konda, V. R., & Tsitsiklis, J. N. (2003). On actor-critic algorithms. SIAM J. Control Optim., 42(4), 1143-1166.
-
(2003)
SIAM J. Control Optim.
, vol.42
, Issue.4
, pp. 1143-1166
-
-
Konda, V.R.1
Tsitsiklis, J.N.2
-
32
-
-
33646413135
-
Natural actor-critic
-
et al. J. G. (Ed.) Lecture Notes in Artificial Intelligence. Springer Verlag
-
Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural Actor-Critic. In et al., J. G. (Ed.), Proceedings of the European Conference on Machine Learning (ECML 2005), Lecture Notes in Artificial Intelligence. Springer Verlag.
-
(2005)
Proceedings of the European Conference on Machine Learning (ECML 2005)
-
-
Peters, J.1
Vijayakumar, S.2
Schaal, S.3
-
34
-
-
0242393653
-
Eligibility traces for off-policy policy evaluation
-
San Francisco, CA, USA. Morgan Kaufmann Publishers Inc
-
Precup, D., Sutton, R. S., & Singh, S. P. (2000). Eligibility Traces for Off-Policy Policy Evaluation. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML00), pp. 759-766, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
-
(2000)
Proceedings of the Seventeenth International Conference on Machine Learning (ICML00)
, pp. 759-766
-
-
Precup, D.1
Sutton, R.S.2
Singh, S.P.3
-
36
-
-
38349020220
-
Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification
-
de Sá, J. M., Alexandre, L. A., Duch, W., & Mandic, D. P. (Eds.) Lecture Notes in Computer Science Springer
-
Schneegaß, D., Udluft, S., & Martinetz, T. (2007). Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification. In de Sá, J. M., Alexandre, L. A., Duch, W., & Mandic, D. P. (Eds.), ICANN, Vol. 4668 of Lecture Notes in Computer Science, pp. 109-118. Springer.
-
(2007)
ICANN
, vol.4668
, pp. 109-118
-
-
Schneegaß, D.1
Udluft, S.2
Martinetz, T.3
-
37
-
-
84899025152
-
Optimality of reinforcement learning algorithms with linear function approximation
-
Sigaud, O., & Buffet, O. (Eds.)(2010). Markov Decision Processes and Artificial Intelligence. Wiley-ISTE
-
Schoknecht, R. (2002). Optimality of Reinforcement Learning Algorithms with Linear Function Approximation. In Proceedings of the Conference on Neural Information Processing Systems (NIPS 15).
Sigaud, O., & Buffet, O. (Eds.). (2010). Markov Decision Processes and Artificial Intelligence. Wiley-ISTE.
-
(2002)
Proceedings of the Conference on Neural Information Processing Systems (NIPS 15)
-
-
Schoknecht, R.1
-
39
-
-
33749255382
-
PAC model-free reinforcement learning
-
Pittsburgh, PA, USA
-
Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006). PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), pp. 881-888, Pittsburgh, PA, USA.
-
(2006)
Proceedings of the 23rd International Conference on Machine Learning (ICML 2006)
, pp. 881-888
-
-
Strehl, A.L.1
Li, L.2
Wiewiora, E.3
Langford, J.4
Littman, M.L.5
-
41
-
-
34547991608
-
On the role of tracking in stationary environments
-
New York, NY, USA. ACM
-
Sutton, R. S., Koop, A., & Silver, D. (2007). On the role of tracking in stationary environments. In ICML '07: Proceedings of the 24th international conference on Machine learning, pp. 871-878, New York, NY, USA. ACM.
-
(2007)
ICML '07: Proceedings of the 24th International Conference on Machine Learning
, pp. 871-878
-
-
Sutton, R.S.1
Koop, A.2
Silver, D.3
-
42
-
-
0036236260
-
Instrumental variable methods for system identification
-
Söderström, T., & Stoica, P. (2002). Instrumental variable methods for system identification. Circuits, Systems, and Signal Processing, 21, 1-9.
-
(2002)
Circuits, Systems, and Signal Processing
, vol.21
, pp. 1-9
-
-
Söderström, T.1
Stoica, P.2
-
43
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
44
-
-
8344287766
-
-
Ph.D. thesis, OGI School of Science & Engineering, Oregon Health & Science University, Portland, OR, USA
-
van der Merwe, R. (2004). Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models. Ph.D. thesis, OGI School of Science & Engineering, Oregon Health & Science University, Portland, OR, USA.
-
(2004)
Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models
-
-
Van Der Merwe, R.1