SCOPUS Information Retrieval Platform

Journal of Artificial Intelligence Research

Volume 39, 2010, Pages 483-532

Kalman temporal differences

(2) Geist, Matthieu (a); Pietquin, Olivier (a)

(a) UMI Georgia Tech CNRS 2958 (France)

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES; REINFORCEMENT LEARNING;

APPROXIMATION SCHEME; FUNCTION APPROXIMATION; MARKOV DECISION PROCESSES; NON-STATIONARITIES; NONLINEAR APPROXIMATION; STOCHASTIC TRANSITIONS; TEMPORAL DIFFERENCES; UNCERTAINTY MANAGEMENT;

STOCHASTIC SYSTEMS;

EID: 78651465938 PISSN: None EISSN: 10769757 Source Type: Journal
DOI: 10.1613/jair.3077 Document Type: Article

Times cited : (85)

References (45)

1
- 40849145988
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129.
- (2008) Machine Learning , vol.71 , Issue.1 , pp. 89-129
- Antos, A.¹ Szepesvári, C.² Munos, R.³

2
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Baird, L. C. (1995). Residual Algorithms: Reinforcement Learning with Function Approximation. In Proceedings of the International Conference on Machine Learning (ICML 95), pp. 30-37.
- (1995) Proceedings of the International Conference on Machine Learning (ICML 95) , pp. 30-37
- Baird, L.C.¹

3
- 0003487482
- Athena Scientific
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

4
- 85162049326
- Incremental natural actor-critic algorithms
- Vancouver, Canada
- Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., & Lee, M. (2008). Incremental Natural Actor-Critic Algorithms. In Proceedings of the Twenty-First Annual Conference on Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada.
- (2008) Proceedings of the Twenty-First Annual Conference on Advances in Neural Information Processing Systems (NIPS)
- Bhatnagar, S.¹ Sutton, R.S.² Ghavamzadeh, M.³ Lee, M.⁴

5
- 0003487601
- Oxford University Press, New York, USA
- Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA.
- (1995) Neural Networks for Pattern Recognition
- Bishop, C.M.¹

6
- 0036832950
- Technical update: Least-squares temporal difference learning
- Boyan, J. A. (1999). Technical Update: Least-Squares Temporal Difference Learning. Machine Learning, 49(2-3), 233-246.
- (1999) Machine Learning , vol.49 , Issue.2-3 , pp. 233-246
- Boyan, J.A.¹

7
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- Bradtke, S. J., & Barto, A. G. (1996). Linear Least-Squares Algorithms for Temporal Difference Learning. Machine Learning, 22(1-3), 33-57.
- (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

8
- 3543103331
- Tech. rep., Adaptive Systems Lab, McMaster University
- Chen, Z. (2003). Bayesian Filtering : From Kalman Filters to Particle Filters, and Beyond. Tech. rep., Adaptive Systems Lab, McMaster University.
- (2003) Bayesian Filtering : From Kalman Filters to Particle Filters, and beyond
- Chen, Z.¹

9
- 33646435300
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Choi, D., & Van Roy, B. (2006). A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems, 16, 207-239.
- (2006) Discrete Event Dynamic Systems , vol.16 , pp. 207-239
- Choi, D.¹ Van Roy, B.²

10
- 0031619316
- Dearden, R., Friedman, N., & Russell, S. J. (1998). Bayesian Q-learning. In AAAI/IAAI, pp. 761-768.
- (1998) Bayesian Q-learning AAAI/IAAI , pp. 761-768
- Dearden, R.¹ Friedman, N.² Russell, S.J.³

11
- 31844456714
- Ph.D. thesis, Hebrew University
- Engel, Y. (2005). Algorithms and Representations for Reinforcement Learning. Ph.D. thesis, Hebrew University.
- (2005) Algorithms and Representations for Reinforcement Learning.
- Engel, Y.¹

12
- 1942421151
- Bayes meets Bellman: The Gaussian process approach to temporal difference learning
- Engel, Y., Mannor, S., & Meir, R. (2003). Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In Proceedings of the International Conference on Machine Learning (ICML 2003), pp. 154-161.
- (2003) Proceedings of the International Conference on Machine Learning (ICML 2003) , pp. 154-161
- Engel, Y.¹ Mannor, S.² Meir, R.³

13
- 31844451013
- Reinforcement learning with Gaussian processes
- Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement Learning with Gaussian Processes. In Proceedings of International Conference on Machine Learning (ICML-05).
- (2005) Proceedings of International Conference on Machine Learning (ICML-05)
- Engel, Y.¹ Mannor, S.² Meir, R.³

14
- 79951485912
- Eligibility traces through colored noises
- Moscow (Russia). IEEE
- Geist, M., & Pietquin, O. (2010a). Eligibility Traces through Colored Noises. In Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010), Moscow (Russia). IEEE.
- (2010) Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010)
- Geist, M.¹ Pietquin, O.²

15
- 79951474303
- Managing uncertainty within value function approximation in reinforcement learning
- Sardinia, Italy
- Geist, M., & Pietquin, O. (2010b). Managing Uncertainty within Value Function Approximation in Reinforcement Learning. In Active Learning and Experimental Design workshop (collocated with AISTATS 2010), Sardinia, Italy.
- (2010) Active Learning and Experimental Design Workshop (Collocated with AISTATS 2010)
- Geist, M.¹ Pietquin, O.²

16
- 79956274048
- Revisiting natural actor-critics with value function approximation
- Torra, V., Narukawa, Y., & Daumas, M. (Eds.) Lecture Notes in Artificial Intelligence (LNAI). Perpinya (France). Springer Verlag-Heidelberg Berlin
- Geist, M., & Pietquin, O. (2010c). Revisiting natural actor-critics with value function approximation. In Torra, V., Narukawa, Y., & Daumas, M. (Eds.), Proceedings of 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010), Vol. 6408 of Lecture Notes in Artificial Intelligence (LNAI), pp. 207-218, Perpinya (France). Springer Verlag-Heidelberg Berlin.
- (2010) Proceedings of 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010) , vol.6408 , pp. 207-218
- Geist, M.¹ Pietquin, O.²

17
- 79951499926
- Statistically linearized least-squares temporal differences
- Moscow (Russia). IEEE
- Geist, M., & Pietquin, O. (2010d). Statistically Linearized Least-Squares Temporal Differences. In Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010), Moscow (Russia). IEEE.
- (2010) Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010)
- Geist, M.¹ Pietquin, O.²

18
- 78449267579
- Statistically linearized recursive least squares
- Kittilä (Finland)
- Geist, M., & Pietquin, O. (2010e). Statistically Linearized Recursive Least Squares. In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010), Kittilä (Finland).
- (2010) Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010)
- Geist, M.¹ Pietquin, O.²

19
- 58449117448
- Bayesian reward filtering
- et al. S. G. (Ed.) Lecture Notes in Artificial Intelligence Springer Verlag, Lille (France)
- Geist, M., Pietquin, O., & Fricout, G. (2008). Bayesian Reward Filtering. In et al., S. G. (Ed.), Proceedings of the European Workshop on Reinforcement Learning (EWRL 2008), Vol. 5323 of Lecture Notes in Artificial Intelligence, pp. 96-109. Springer Verlag, Lille (France).
- (2008) Proceedings of the European Workshop on Reinforcement Learning (EWRL 2008) , vol.5323 , pp. 96-109
- Geist, M.¹ Pietquin, O.² Fricout, G.³

20
- 67650458797
- Kalman temporal differences: The deterministic case
- Nashville, TN, USA
- Geist, M., Pietquin, O., & Fricout, G. (2009a). Kalman Temporal Differences: the deterministic case. In Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA.
- (2009) Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009)
- Geist, M.¹ Pietquin, O.² Fricout, G.³

21
- 76649127744
- Tracking in reinforcement learning
- Bangkok (Thailand). Springer
- Geist, M., Pietquin, O., & Fricout, G. (2009b). Tracking in Reinforcement Learning. In Proceedings of the 16th International Conference on Neural Information Processing (ICONIP 2009), Bangkok (Thailand). Springer.
- (2009) Proceedings of the 16th International Conference on Neural Information Processing (ICONIP 2009)
- Geist, M.¹ Pietquin, O.² Fricout, G.³

22
- 33750737011
- Incremental least-squares temporal difference learning
- Geramifard, A., Bowling, M., & Sutton, R. S. (2006). Incremental Least-Squares Temporal Difference Learning. In Proceedings of the 21st Conference, American Association for Artificial Intelligence, pp. 356-361.
- (2006) Proceedings of the 21st Conference, American Association for Artificial Intelligence , pp. 356-361
- Geramifard, A.¹ Bowling, M.² Sutton, R.S.³

23
- 84966204836
- Methods for modifying matrix factorization
- Gill, P. E., Golub, G. H., Murray, W., & Saunders, M. A. (1974). Methods for Modifying Matrix Factorization. Mathematics of Computation, 28(126), 505-535.
- (1974) Mathematics of Computation , vol.28 , Issue.126 , pp. 505-535
- Gill, P.E.¹ Golub, G.H.² Murray, W.³ Saunders, M.A.⁴

24
- 0003473120
- Dover Publications, Inc., New York, NY, USA
- Goodwin, G. C., & Sin, K. S. (2009). Adaptive Filtering Prediction and Control. Dover Publications, Inc., New York, NY, USA.
- (2009) Adaptive Filtering Prediction and Control
- Goodwin, G.C.¹ Sin, K.S.²

25
- 20544433674
- Consistent normalized least mean square filtering with noisy data matrix
- Jo, S., & Kim, S. W. (2005). Consistent Normalized Least Mean Square Filtering with Noisy Data Matrix. IEEE Transactions on Signal Processing, 53(6), 2112-2123.
- (2005) IEEE Transactions on Signal Processing , vol.53 , Issue.6 , pp. 2112-2123
- Jo, S.¹ Kim, S.W.²

26
- 21244437999
- Unscented filtering and nonlinear estimation
- Julier, S. J., & Uhlmann, J. K. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401-422.
- (2004) Proceedings of the IEEE , vol.92 , Issue.3 , pp. 401-422
- Julier, S.J.¹ Uhlmann, J.K.²

27
- 33646243319
- A natural policy gradient
- Neural Information Processing Systems (NIPS 2001), Vancouver, British Columbia, Canada
- Kakade, S. (2001). A natural policy gradient. In Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems (NIPS 2001)], pp. 1531-1538, Vancouver, British Columbia, Canada.
- (2001) Advances in Neural Information Processing Systems , vol.14 , pp. 1531-1538
- Kakade, S.¹

28
- 85024429815
- A new approach to linear filtering and prediction problems
- Series D
- Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME-Journal of Basic Engineering, 82(Series D), 35-45.
- (1960) Transactions of the ASME-Journal of Basic Engineering , vol.82 , pp. 35-45
- Kalman, R.E.¹

29
- 71149121683
- Regularization and feature selection in least-squares temporal difference learning
- Montreal, Canada
- Kolter, J. Z., & Ng, A. Y. (2009). Regularization and Feature Selection in Least-Squares Temporal Difference Learning. In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), Montreal, Canada.
- (2009) Proceedings of the 26th International Conference on Machine Learning (ICML 2009)
- Kolter, J.Z.¹ Ng, A.Y.²

30
- 4043069840
- On actor-critic algorithms
- Konda, V. R., & Tsitsiklis, J. N. (2003). On actor-critic algorithms. SIAM J. Control Optim., 42(4), 1143-1166.
- (2003) SIAM J. Control Optim. , vol.42 , Issue.4 , pp. 1143-1166
- Konda, V.R.¹ Tsitsiklis, J.N.²

31
- 4644323293
- Least-Squares policy iteration
- Lagoudakis, M. G., & Parr, R. (2003). Least-Squares Policy Iteration. Journal of Machine Learning Research, 4, 1107-1149.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

32
- 33646413135
- Natural actor-critic
- et al. J. G. (Ed.) Lecture Notes in Artificial Intelligence. Springer Verlag
- Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural Actor-Critic. In et al., J. G. (Ed.), Proceedings of the European Conference on Machine Learning (ECML 2005), Lecture Notes in Artificial Intelligence. Springer Verlag.
- (2005) Proceedings of the European Conference on Machine Learning (ECML 2005)
- Peters, J.¹ Vijayakumar, S.² Schaal, S.³

33
- 34547974097
- Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation
- Phua, C. W., & Fitch, R. (2007). Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation. In Proceedings of the International Conference on Machine Learning (ICML 07).
- (2007) Proceedings of the International Conference on Machine Learning (ICML 07)
- Phua, C.W.¹ Fitch, R.²

34
- 0242393653
- Eligibility traces for off-policy policy evaluation
- San Francisco, CA, USA. Morgan Kaufmann Publishers Inc
- Precup, D., Sutton, R. S., & Singh, S. P. (2000). Eligibility Traces for Off-Policy Policy Evaluation. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML00), pp. 759-766, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
- (2000) Proceedings of the Seventeenth International Conference on Machine Learning (ICML00) , pp. 759-766
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

35
- 85102627959
- Wiley-Interscience
- Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

36
- 38349020220
- Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification
- de Sá, J. M., Alexandre, L. A., Duch, W., & Mandic, D. P. (Eds.) Lecture Notes in Computer Science Springer
- Schneegaß, D., Udluft, S., & Martinetz, T. (2007). Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification. In de Sá, J. M., Alexandre, L. A., Duch, W., & Mandic, D. P. (Eds.), ICANN, Vol. 4668 of Lecture Notes in Computer Science, pp. 109-118. Springer.
- (2007) ICANN , vol.4668 , pp. 109-118
- Schneegaß, D.¹ Udluft, S.² Martinetz, T.³

37
- 84899025152
- Optimality of reinforcement learning algorithms with linear function approximation
- Sigaud, O., & Buffet, O. (Eds.)(2010). Markov Decision Processes and Artificial Intelligence. Wiley-ISTE
- Schoknecht, R. (2002). Optimality of Reinforcement Learning Algorithms with Linear Function Approximation. In Proceedings of the Conference on Neural Information Processing Systems (NIPS 15). Sigaud, O., & Buffet, O. (Eds.). (2010). Markov Decision Processes and Artificial Intelligence. Wiley-ISTE.
- (2002) Proceedings of the Conference on Neural Information Processing Systems (NIPS 15)
- Schoknecht, R.¹

38
- 84889830739
- Wiley & Sons
- Simon, D. (2006). Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches (1st edition). Wiley & Sons.
- (2006) Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches (1st Edition)
- Simon, D.¹

39
- 33749255382
- PAC model-free reinforcement learning
- Pittsburgh, PA, USA
- Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006). PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), pp. 881-888, Pittsburgh, PA, USA.
- (2006) Proceedings of the 23rd International Conference on Machine Learning (ICML 2006) , pp. 881-888
- Strehl, A.L.¹ Li, L.² Wiewiora, E.³ Langford, J.⁴ Littman, M.L.⁵

40
- 0004102479
- (3rd edition) The MIT Press
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction (3rd edition). The MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

41
- 34547991608
- On the role of tracking in stationary environments
- New York, NY, USA. ACM
- Sutton, R. S., Koop, A., & Silver, D. (2007). On the role of tracking in stationary environments. In ICML '07: Proceedings of the 24th international conference on Machine learning, pp. 871-878, New York, NY, USA. ACM.
- (2007) ICML '07: Proceedings of the 24th International Conference on Machine Learning , pp. 871-878
- Sutton, R.S.¹ Koop, A.² Silver, D.³

42
- 0036236260
- Instrumental variable methods for system identification
- Söderström, T., & Stoica, P. (2002). Instrumental variable methods for system identification. Circuits, Systems, and Signal Processing, 21, 1-9.
- (2002) Circuits, Systems, and Signal Processing , vol.21 , pp. 1-9
- Söderström, T.¹ Stoica, P.²

43
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

44
- 8344287766
- Ph.D. thesis, OGI School of Science & Engineering, Oregon Health & Science University, Portland, OR, USA
- van der Merwe, R. (2004). Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models. Ph.D. thesis, OGI School of Science & Engineering, Oregon Health & Science University, Portland, OR, USA.
- (2004) Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models
- Van Der Merwe, R.¹

45
- 84927748655
- Q-Learning algorithms for optimal stopping based on least squares
- Yu, H., & Bertsekas, D. P. (2007). Q-Learning Algorithms for Optimal Stopping Based on Least Squares. In Proceedings of European Control Conference, Kos, Greece.
- (2007) Proceedings of European Control Conference, Kos, Greece
- Yu, H.¹ Bertsekas, D.P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.