SCOPUS Information Retrieval Platform

Volume 22, Issue 10, 2009, Pages 1399-1410

Adaptive importance sampling for value function approximation in off-policy reinforcement learning

Authors (4): Hachiya, Hirotaka (a); Akiyama, Takayuki (a); Sugiyama, Masashi (a); Peters, Jan (b)

b MAX PLANCK INSTITUTE FOR BIOLOGICAL CYBERNETICS (Germany)

Author keywords

Adaptive importance sampling; Efficient sample reuse; Importance weighted cross validation; Off policy reinforcement learning; Policy iteration; Value function approximation

Indexed keywords

ADAPTIVE IMPORTANCE SAMPLING; BIAS AND VARIANCE; CROSS VALIDATION; DATA SAMPLE; EFFICIENT SAMPLE REUSE; IMPORTANCE SAMPLING; VALUE FUNCTION APPROXIMATION; VALUE FUNCTIONS;

EDUCATION; REINFORCEMENT LEARNING;

REINFORCEMENT;

ALGORITHM; ARTICLE; LEARNING; MATHEMATICAL ANALYSIS; POLICY; PRIORITY JOURNAL; PROBABILITY; REINFORCEMENT; SAMPLING; SIMULATION; VALIDATION PROCESS;

ALGORITHMS; ARTIFICIAL INTELLIGENCE; DATA INTERPRETATION, STATISTICAL; LEARNING; MARKOV CHAINS; MODELS, NEUROLOGICAL; MODELS, STATISTICAL; NEURAL NETWORKS (COMPUTER); PUBLIC POLICY; REINFORCEMENT (PSYCHOLOGY); REPRODUCIBILITY OF RESULTS;

EID: 70549113878 PISSN: 08936080 EISSN: None Source Type: Journal
DOI: 10.1016/j.neunet.2009.01.002 Document Type: Article

Times cited: 45

References (19)

1
- EID: 0003487482
- Bertsekas, D.P., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific, NH, USA.

2
- EID: 84945307039
- Bugeja, M. (2003). Non-linear swing-up and stabilizing control of an inverted pendulum system. In Proceedings of IEEE Region 8 EUROCON (pp. 437-441).

3
- EID: 0003489634
- Fishman, G.S. (1996). Monte Carlo: Concepts, algorithms, and applications. Springer-Verlag, Berlin.

4
- EID: 0003684449
- Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. Springer-Verlag, New York.

5
- EID: 84898930479
- Kakade, S. (2002). A natural policy gradient. Neural Information Processing Systems, 14, 1531-1538.

6
- EID: 4644323293
- Lagoudakis, M.G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4 (Dec), 1107-1149.

7
- EID: 18544382314
- Peshkin, L., & Shelton, C.R. (2002). Learning from scarce experience. In Proceedings of the International Conference on Machine Learning (pp. 498-505).

8
- EID: 34547964788
- Peters, J., & Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control. In Proceedings of the International Conference on Machine Learning.

9
- EID: 44949241322
- Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21, 682-697.

10
- EID: 4644328593
- Precup, D., Sutton, R.S., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of the International Conference on Machine Learning (pp. 417-424).

11
- EID: 0242393653
- Precup, D., Sutton, R.S., & Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the International Conference on Machine Learning (pp. 759-766).

12
- EID: 0003436776
- Rao, C.R. (1973). Linear statistical inference and its applications. Wiley, New York.

13
- EID: 84899025152
- Schoknecht, R. (2003). Optimality of reinforcement learning algorithms with linear function approximation. Neural Information Processing Systems, 15, 1555-1562.

14
- EID: 18544374225
- Shelton, C.R. (2001). Policy improvement for POMDPs using normalized importance sampling. In Proceedings of Uncertainty in Artificial Intelligence (pp. 496-503).

15
- EID: 0037527188
- Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90, 227-244.

16
- EID: 1842733198
- Sugiyama, M., Kawanabe, M., & Müller, K.-R. (2004). Trading variance reduction with unbiasedness: The regularized subspace information criterion for robust model selection in kernel regression. Neural Computation, 16, 1077-1104.

17
- EID: 34249047899
- Sugiyama, M., Krauledat, M., & Müller, K.-R. (2007). Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8 (May), 985-1005.

18
- EID: 0004102479
- Sutton, R.S., & Barto, A.G. (1998). Reinforcement learning: An introduction. The MIT Press, MA, USA.

19
- EID: 0030082891
- Wang, H.O., Tanaka, K., & Griffin, M.F. (1996). An approach to fuzzy control of nonlinear systems: Stability and design issues. IEEE Transactions on Fuzzy Systems, pp. 14-23.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.