SCOPUS Information Search Platform

Proceedings of the National Conference on Artificial Intelligence

Volume 3, 2008, Pages 1351-1356

Adaptive importance sampling with automatic model selection in value function approximation

Authors (4): Hachiya, Hirotaka (a); Akiyama, Takayuki (a); Sugiyama, Masashi (a); Peters, Jan (b)

(a) Tokyo Institute of Technology (Japan)

(b) Max Planck Institute for Biological Cybernetics (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

BIONICS; COMMERCE; REINFORCEMENT;

ADAPTIVE IMPORTANCE SAMPLINGS; AUTOMATIC MODEL SELECTIONS; BIAS AND VARIANCES; CROSS VALIDATIONS; DATA SAMPLES; ESSENTIAL PROBLEMS; IMPORTANCE SAMPLINGS; VALUE FUNCTIONS;

ARTIFICIAL INTELLIGENCE;

EID: 57749096203 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 16

References (13)

1. Bagnell, J. A.; Kakade, S.; Ng, A. Y.; and Schneider, J. 2003. Policy search by dynamic programming. In NIPS 16. (Scopus ID: 84898962948)

2. Fishman, G. S. 1996. Monte Carlo: Concepts, Algorithms, and Applications. Berlin: Springer-Verlag. (Scopus ID: 0003489634)

3. Kakade, S. 2002. A natural policy gradient. In NIPS 14. (Scopus ID: 84898930479)

4. Lagoudakis, M. G., and Parr, R. 2003. Least-squares policy iteration. JMLR 4:1107-1149. (Scopus ID: 4644323293)

5. Peshkin, L., and Shelton, C. 2002. Learning from scarce experience. In Proc. of ICML. (Scopus ID: 18544382314)

6. Precup, D.; Sutton, R. S.; and Dasgupta, S. 2001. Off-policy temporal-difference learning with function approximation. In Proc. of ICML. (Scopus ID: 4644328593)

7. Precup, D.; Sutton, R. S.; and Singh, S. 2000. Eligibility traces for off-policy policy evaluation. In Proc. of ICML. (Scopus ID: 0242393653)

8. Schoknecht, R. 2003. Optimality of reinforcement learning algorithms with linear function approximation. In NIPS 15. (Scopus ID: 84899025152)

9. Shelton, C. R. 2001. Policy improvement for POMDPs using normalized importance sampling. In Proc. of UAI. (Scopus ID: 18544374225)

10. Shimodaira, H. 2000. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference 90(2):227-244. (Scopus ID: 0037527188)

11. Sugiyama, M.; Krauledat, M.; and Müller, K.-R. 2007. Covariate shift adaptation by importance weighted cross validation. JMLR 8:985-1005. (Scopus ID: 34249047899)

12. Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. The MIT Press. (Scopus ID: 0004102479)

13. Watkins, C. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, King's College, University of Oxford. (Scopus ID: 0004049893)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.