Journal of Artificial Intelligence Research, Volume 39, 2010, Pages 483-532

Kalman temporal differences

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES; REINFORCEMENT LEARNING

EID: 78651465938     PISSN: None     EISSN: 10769757     Source Type: Journal    
DOI: 10.1613/jair.3077     Document Type: Article
Times cited : (85)

References (45)
  • 1
    • Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129.
  • 6
    • Boyan, J. A. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49(2-3), 233-246.
  • 7
    • Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3), 33-57.
  • 9
    • Choi, D., & Van Roy, B. (2006). A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16, 207-239.
  • 16
    • Geist, M., & Pietquin, O. (2010). Revisiting natural actor-critics with value function approximation. In Torra, V., Narukawa, Y., & Daumas, M. (Eds.), Proceedings of the 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010), Vol. 6408 of Lecture Notes in Artificial Intelligence (LNAI), pp. 207-218, Perpinyà, France. Springer-Verlag, Berlin Heidelberg.
  • 19
    • Geist, M., Pietquin, O., & Fricout, G. (2008). Bayesian reward filtering. In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2008), Vol. 5323 of Lecture Notes in Artificial Intelligence, pp. 96-109. Springer-Verlag, Lille, France.
  • 25
    • Jo, S., & Kim, S. W. (2005). Consistent normalized least mean square filtering with noisy data matrix. IEEE Transactions on Signal Processing, 53(6), 2112-2123.
  • 26
    • Julier, S. J., & Uhlmann, J. K. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401-422.
  • 27
    • Kakade, S. (2001). A natural policy gradient. In Advances in Neural Information Processing Systems 14 (NIPS 2001), pp. 1531-1538, Vancouver, British Columbia, Canada.
  • 28
    • Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering, 82(Series D), 35-45.
  • 36
    • Schneegaß, D., Udluft, S., & Martinetz, T. (2007). Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification. In de Sá, J. M., Alexandre, L. A., Duch, W., & Mandic, D. P. (Eds.), ICANN 2007, Vol. 4668 of Lecture Notes in Computer Science, pp. 109-118. Springer.
  • 37
    • Schoknecht, R. (2002). Optimality of reinforcement learning algorithms with linear function approximation. In Proceedings of the Conference on Neural Information Processing Systems (NIPS 15).
    • Sigaud, O., & Buffet, O. (Eds.). (2010). Markov Decision Processes and Artificial Intelligence. Wiley-ISTE.
  • 43
    • Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.