Volume 4, Issue January, 2014, Pages 3014-3022

Weighted importance sampling for off-policy learning with linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION ALGORITHMS; INFORMATION SCIENCE; LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

EID: 84937883130     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 166

References (23)
  • 1
    • 0000025104
    • On the choice of alternative measures in importance sampling with Markov chains
    • Andradóttir, S., Heyman, D. P., Ott, T. J. (1995). On the choice of alternative measures in importance sampling with Markov chains. Operations Research, 43(3): 509-519.
    • (1995) Operations Research , vol.43 , Issue.3 , pp. 509-519
    • Andradóttir, S.1    Heyman, D.P.2    Ott, T.J.3
  • 2
    • 61849106433
    • Projected equation methods for approximate solution of large linear systems
    • Bertsekas, D. P., Yu, H. (2009). Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227(1): 27-50.
    • (2009) Journal of Computational and Applied Mathematics , vol.227 , Issue.1 , pp. 27-50
    • Bertsekas, D.P.1    Yu, H.2
  • 5
    • 84899800132
    • Policy evaluation with temporal differences: A survey and comparison
    • Dann, C., Neumann, G., Peters, J. (2014). Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research, 15: 809-883.
    • (2014) Journal of Machine Learning Research , vol.15 , pp. 809-883
    • Dann, C.1    Neumann, G.2    Peters, J.3
  • 6
    • 84897081792
    • Off-policy learning with eligibility traces: A survey
    • Geist, M., Scherrer, B. (2014). Off-policy learning with eligibility traces: A survey. Journal of Machine Learning Research, 15: 289-333.
    • (2014) Journal of Machine Learning Research , vol.15 , pp. 289-333
    • Geist, M.1    Scherrer, B.2
  • 7
    • 70549113878
    • Adaptive importance sampling for value function approximation in off-policy reinforcement learning
    • Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J. (2009). Adaptive importance sampling for value function approximation in off-policy reinforcement learning. Neural Networks, 22(10): 1399-1410.
    • (2009) Neural Networks , vol.22 , Issue.10 , pp. 1399-1410
    • Hachiya, H.1    Akiyama, T.2    Sugiyama, M.3    Peters, J.4
  • 8
    • 84855251060
    • Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition
    • Hachiya, H., Sugiyama, M., Ueda, N. (2012). Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition. Neurocomputing, 80: 93-101.
    • (2012) Neurocomputing , vol.80 , pp. 93-101
    • Hachiya, H.1    Sugiyama, M.2    Ueda, N.3
  • 9
    • 0141812007
    • Ph.D. Dissertation, Statistics Department, Stanford University
    • Hesterberg, T. C. (1988). Advances in importance sampling. Ph.D. Dissertation, Statistics Department, Stanford University.
    • (1988) Advances in Importance Sampling
    • Hesterberg, T.C.1
  • 13
    • 77954101982
    • GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
    • Atlantis Press
    • Maei, H. R., Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, pp. 91-96. Atlantis Press.
    • (2010) Proceedings of the Third Conference on Artificial General Intelligence , pp. 91-96
    • Maei, H.R.1    Sutton, R.S.2
  • 20
    • 0037527188
    • Improving predictive inference under covariate shift by weighting the log-likelihood function
    • Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2): 227-244.
    • (2000) Journal of Statistical Planning and Inference , vol.90 , Issue.2 , pp. 227-244
    • Shimodaira, H.1
  • 23
    • 77956517288
    • Convergence of least squares temporal difference methods under general conditions
    • Yu, H. (2010). Convergence of least squares temporal difference methods under general conditions. In Proceedings of the 27th International Conference on Machine Learning, pp. 1207-1214.
    • (2010) Proceedings of the 27th International Conference on Machine Learning , pp. 1207-1214
    • Yu, H.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.