5. Boyan, J., & Moore, A. (1996). Learning evaluation functions for large acyclic domains. Proceedings ICML.
6. Brafman, R., & Tennenholtz, M. (2001). R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Proceedings IJCAI.
9. Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning. Proceedings ICML.
12. Kaelbling, L. P. (1994). Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15, 279-298.
13. Kearns, M., Mansour, Y., & Ng, A. (2001). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. JMLR, 1324-1331.
14. Kearns, M., & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proceedings ICML.
15. Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. JAIR, 14, 83-103.
17. Mundhenk, M., Goldsmith, J., Lusena, C., & Allender, E. (2000). Complexity of finite-horizon Markov decision processes. JACM, 47, 681-720.
19. Ng, A., & Jordan, M. (2000). Pegasus: A policy search method for large MDPs and POMDPs. Proceedings UAI.
20. Péret, L., & Garcia, F. (2004). On-line search for solving Markov decision processes via heuristic sampling. Proceedings ECAI.
21. Salganicoff, M., & Ungar, L. (1995). Active exploration and learning in real-valued spaces using multi-armed bandit allocation indices. Proceedings ICML.
22. Strens, M. (2000). A Bayesian framework for reinforcement learning. Proceedings ICML.
23. Strens, M., & Moore, A. (2002). Policy search using paired comparisons. JMLR, 3, 921-950.
25. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 285-294.
26. Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, Cambridge.
28. Williams, C. (1999). Prediction with Gaussian processes. In Learning in graphical models. MIT Press.
29. Wyatt, J. (2001). Exploration control in reinforcement learning using optimistic model selection. Proceedings ICML.