Volume , Issue , 2005, Pages 961-968

Bayesian sparse sampling for on-line reward optimization

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATION THEORY; DECISION MAKING; INFORMATION THEORY; SAMPLING

EID: 31844436266     PISSN: None     EISSN: None     DOI: None
Source Type: Conference Proceeding     Document Type: Conference Paper
Times cited: 86

References (29)
  • 5. Boyan, J., & Moore, A. (1996). Learning evaluation functions for large acyclic domains. Proceedings ICML.
  • 6. Brafman, R., & Tennenholtz, M. (2001). R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Proceedings IJCAI.
  • 9. Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning. Proceedings ICML.
  • 12. Kaelbling, L. P. (1994). Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15, 279-298.
  • 13. Kearns, M., Mansour, Y., & Ng, A. (2001). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. JMLR, 1324-1331.
  • 14. Kearns, M., & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proceedings ICML.
  • 15. Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. JAIR, 14, 83-103.
  • 17. Mundhenk, M., Goldsmith, J., Lusena, C., & Allender, E. (2000). Complexity of finite-horizon Markov decision processes. JACM, 47, 681-720.
  • 19. Ng, A., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. Proceedings UAI.
  • 20. Péret, L., & Garcia, F. (2004). On-line search for solving Markov decision processes via heuristic sampling. Proceedings ECAI.
  • 21. Salganicoff, M., & Ungar, L. (1995). Active exploration and learning in real-valued spaces using multi-armed bandit allocation indices. Proceedings ICML.
  • 22. Strens, M. (2000). A Bayesian framework for reinforcement learning. Proceedings ICML.
  • 23. Strens, M., & Moore, A. (2002). Policy search using paired comparisons. JMLR, 3, 921-950.
  • 25. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 285-294.
  • 26. Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, Cambridge.
  • 29. Wyatt, J. (2001). Exploration control in reinforcement learning using optimistic model selection. Proceedings ICML.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.