SCOPUS Information Retrieval Platform

Advances in Neural Information Processing Systems

Volume 4, Issue January, 2014, Pages 3014-3022

Weighted importance sampling for off-policy learning with linear function approximation

(3) Mahmood, A. Rupam ᵃ; Van Hasselt, Hado ᵃ; Sutton, Richard S. ᵃ

ᵃ University of Alberta (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION ALGORITHMS; INFORMATION SCIENCE; LEARNING ALGORITHMS; REINFORCEMENT LEARNING; FUNCTION APPROXIMATION; LINEAR FUNCTIONS; POLICY LEARNING; POLICY MODEL; RELIABLE CONVERGENCE; TRAINING SAMPLE; IMPORTANCE SAMPLING

EID: 84937883130
PISSN: 10495258
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 166

References (23)

1
- 0000025104
- On the choice of alternative measures in importance sampling with Markov chains
- Andradóttir, S., Heyman, D. P., Ott, T. J. (1995). On the choice of alternative measures in importance sampling with Markov chains. Operations Research, 43(3): 509-519.
- (1995) Operations Research , vol.43 , Issue.3 , pp. 509-519
- Andradóttir, S.¹ Heyman, D.P.² Ott, T.J.³

2
- 61849106433
- Projected equation methods for approximate solution of large linear systems
- Bertsekas, D. P., Yu, H. (2009). Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227(1): 27-50.
- (2009) Journal of Computational and Applied Mathematics , vol.227 , Issue.1 , pp. 27-50
- Bertsekas, D.P.¹ Yu, H.²

3
- 0038595396
- Least-squares temporal difference learning
- Boyan, J. A. (1999). Least-squares temporal difference learning. In Proceedings of the 17th International Conference, pp. 49-56.
- (1999) Proceedings of the 17th International Conference , pp. 49-56
- Boyan, J.A.¹

4
- 0032221050
- Post-processing accept-reject samples: Recycling and rescaling
- Casella, G., Robert, C. P. (1998). Post-processing accept-reject samples: recycling and rescaling. Journal of Computational and Graphical Statistics, 7(2): 139-157.
- (1998) Journal of Computational and Graphical Statistics , vol.7 , Issue.2 , pp. 139-157
- Casella, G.¹ Robert, C.P.²

5
- 84899800132
- Policy evaluation with temporal differences: A survey and comparison
- Dann, C., Neumann, G., Peters, J. (2014). Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research, 15: 809-883.
- (2014) Journal of Machine Learning Research , vol.15 , pp. 809-883
- Dann, C.¹ Neumann, G.² Peters, J.³

6
- 84897081792
- Off-policy learning with eligibility traces: A survey
- Geist, M., Scherrer, B. (2014). Off-policy learning with eligibility traces: A survey. Journal of Machine Learning Research, 15: 289-333.
- (2014) Journal of Machine Learning Research , vol.15 , pp. 289-333
- Geist, M.¹ Scherrer, B.²

7
- 70549113878
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J. (2009). Adaptive importance sampling for value function approximation in off-policy reinforcement learning. Neural Networks, 22(10): 1399-1410.
- (2009) Neural Networks , vol.22 , Issue.10 , pp. 1399-1410
- Hachiya, H.¹ Akiyama, T.² Sugiyama, M.³ Peters, J.⁴

8
- 84855251060
- Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition
- Hachiya, H., Sugiyama, M., Ueda, N. (2012). Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition. Neurocomputing, 80: 93-101.
- (2012) Neurocomputing , vol.80 , pp. 93-101
- Hachiya, H.¹ Sugiyama, M.² Ueda, N.³

9
- 0141812007
- Ph.D. Dissertation, Statistics Department, Stanford University
- Hesterberg, T. C. (1988). Advances in importance sampling. Ph.D. Dissertation, Statistics Department, Stanford University.
- (1988) Advances in Importance Sampling
- Hesterberg, T.C.¹

10
- 0001432119
- Methods of reducing sample size in Monte Carlo computations
- Kahn, H., Marshall, A. W. (1953). Methods of reducing sample size in Monte Carlo computations. Journal of the Operations Research Society of America, 1(5): 263-278.
- (1953) Journal of the Operations Research Society of America , vol.1 , Issue.5 , pp. 263-278
- Kahn, H.¹ Marshall, A.W.²

11
- 70649111792
- MIT Press, 2009
- Koller, D., Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
- (2009) Probabilistic Graphical Models: Principles and Techniques
- Koller, D.¹ Friedman, N.²

12
- 0004182828
- Berlin, Springer-Verlag
- Liu, J. S. (2001). Monte Carlo strategies in scientific computing. Berlin, Springer-Verlag.
- (2001) Monte Carlo Strategies in Scientific Computing
- Liu, J.S.¹

13
- 77954101982
- GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
- Atlantis Press
- Maei, H. R., Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, pp. 91-96. Atlantis Press.
- (2010) Proceedings of the Third Conference on Artificial General Intelligence , pp. 91-96
- Maei, H.R.¹ Sutton, R.S.²

14
- 84864655352
- PhD thesis, University of Alberta
- Maei, H. R. (2011). Gradient temporal-difference learning algorithms. PhD thesis, University of Alberta.
- (2011) Gradient Temporal-difference Learning Algorithms
- Maei, H.R.¹

15
- 0242393653
- Eligibility traces for off-policy policy evaluation
- Morgan Kaufmann
- Precup, D., Sutton, R. S., Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the 17th International Conference on Machine Learning, pp. 759-766. Morgan Kaufmann.
- (2000) Proceedings of the 17th International Conference on Machine Learning , pp. 759-766
- Precup, D.¹ Sutton, R.S.² Singh, S.³

16
- 4644328593
- Off-policy temporal-difference learning with function approximation
- Precup, D., Sutton, R. S., Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of the 18th International Conference on Machine Learning.
- (2001) Proceedings of the 18th International Conference on Machine Learning
- Precup, D.¹ Sutton, R.S.² Dasgupta, S.³

17
- 0003919677
- New York, Springer-Verlag
- Robert, C. P., Casella, G. (2004). Monte Carlo Statistical Methods. New York, Springer-Verlag.
- (2004) Monte Carlo Statistical Methods
- Robert, C.P.¹ Casella, G.²

18
- 0004080531
- New York, Wiley
- Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method, New York, Wiley.
- (1981) Simulation and the Monte Carlo Method
- Rubinstein, R.Y.¹

19
- 0005942760
- PhD thesis, Massachusetts Institute of Technology
- Shelton, C. R. (2001). Importance Sampling for Reinforcement Learning with Multiple Objectives. PhD thesis, Massachusetts Institute of Technology.
- (2001) Importance Sampling for Reinforcement Learning with Multiple Objectives
- Shelton, C.R.¹

20
- 0037527188
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2): 227-244.
- (2000) Journal of Statistical Planning and Inference , vol.90 , Issue.2 , pp. 227-244
- Shimodaira, H.¹

21
- 0004102479
- MIT Press
- Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

22
- 84919913727
- A new Q(λ) with interim forward view and Monte Carlo equivalence
- Beijing, China
- Sutton, R. S., Mahmood, A. R., Precup, D., van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China.
- (2014) Proceedings of the 31st International Conference on Machine Learning
- Sutton, R.S.¹ Mahmood, A.R.² Precup, D.³ Van Hasselt, H.⁴

23
- 77956517288
- Convergence of least squares temporal difference methods under general conditions
- Yu, H. (2010). Convergence of least squares temporal difference methods under general conditions. In Proceedings of the 27th International Conference on Machine Learning, pp. 1207-1214.
- (2010) Proceedings of the 27th International Conference on Machine Learning , pp. 1207-1214
- Yu, H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.