Volume 12, 2012, Pages 359-386

Bayesian reinforcement learning

Author keywords: Covariance

EID: 85042936847     PISSN: 1867-4534     EISSN: 1867-4542     Source Type: Book Series
DOI: 10.1007/978-3-642-27645-3_11     Document Type: Chapter
Times cited: 63

References (83)
  • 1. Aharony, N., Zehavi, T., Engel, Y.: Learning wireless network association control with Gaussian process temporal difference methods. In: Proceedings of OPNETWORK (2005)
  • 7. Bellman, R.: A problem in sequential design of experiments. Sankhya 16, 221–229 (1956)
  • 12. Brafman, R., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3, 213–231 (2002)
  • 13. Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
  • 18. Cozzolino, J.M.: Optimal sequential decision making under uncertainty. Master's thesis, Massachusetts Institute of Technology (1964)
  • 20. Dearden, R., Friedman, N., Andre, D.: Model based Bayesian exploration. In: UAI, pp. 150–159 (1999)
  • 22. Delage, E., Mannor, S.: Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research 58(1), 203–213 (2010)
  • 23. Dimitrakakis, C.: Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning. In: ICAART (1), pp. 259–264 (2010)
  • 25. Doshi-Velez, F., Wingate, D., Roy, N., Tenenbaum, J.: Nonparametric Bayesian policy priors for reinforcement learning. In: NIPS (2010)
  • 27. Duff, M.: Design for an optimal probe. In: ICML, pp. 131–138 (2003)
  • 29. Engel, Y., Mannor, S., Meir, R.: Sparse Online Greedy Support Vector Regression. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 84–96. Springer, Heidelberg (2002)
  • 33. Fard, M.M., Pineau, J.: PAC-Bayesian model selection for reinforcement learning. In: Lafferty, J., Williams, C.K.I., Shawe-Taylor, J., Zemel, R., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 1624–1632 (2010)
  • 37. Greensmith, E., Bartlett, P., Baxter, J.: Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research 5, 1471–1530 (2004)
  • 42. Kearns, M., Mansour, Y., Ng, A.: A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In: Proc. IJCAI (1999)
  • 46. Lazaric, A., Restelli, M., Bonarini, A.: Transfer of samples in batch reinforcement learning. In: Proceedings of ICML, vol. 25, pp. 544–551 (2008)
  • 47. Mannor, S., Simester, D., Sun, P., Tsitsiklis, J.N.: Bias and variance approximation in value function estimates. Management Science 53(2), 308–322 (2007)
  • 50. Mehta, N., Natarajan, S., Tadepalli, P., Fern, A.: Transfer in variable-reward hierarchical reinforcement learning. Machine Learning 73(3), 289–312 (2008)
  • 51. Meuleau, N., Bourgine, P.: Exploration of multi-state environments: local measures and backpropagation of uncertainty. Machine Learning 35, 117–154 (1999)
  • 52. Nilim, A., El Ghaoui, L.: Robust control of Markov decision processes with uncertain transition matrices. Operations Research 53(5), 780–798 (2005)
  • 53. O’Hagan, A.: Monte Carlo is fundamentally unsound. The Statistician 36, 247–249 (1987)
  • 56. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4), 682–697 (2008)
  • 58. Peters, J., Vijayakumar, S., Schaal, S.: Natural Actor-Critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280–291. Springer, Heidelberg (2005)
  • 69. Silver, E.A.: Markov decision processes with uncertain transition probabilities or rewards. Tech. Rep. No. 1, Research in the Control of Complex Systems, Operations Research Center, Massachusetts Institute of Technology (1963)
  • 71. Strehl, A.L., Li, L., Littman, M.L.: Incremental model-based learners with formal learning-time guarantees. In: UAI (2006)
  • 72. Strens, M.: A Bayesian framework for reinforcement learning. In: ICML (2000)
  • 74. Sutton, R.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)
  • 76. Taylor, M., Stone, P., Liu, Y.: Transfer learning via inter-task mappings for temporal difference learning. JMLR 8, 2125–2167 (2007)
  • 77. Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)
  • 78. Veness, J., Ng, K.S., Hutter, M., Silver, D.: Reinforcement learning via AIXI approximation. In: AAAI (2010)
  • 79. Wang, T., Lizotte, D., Bowling, M., Schuurmans, D.: Bayesian sparse sampling for on-line reward optimization. In: ICML (2005)
  • 82. Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
  • 83. Wilson, A., Fern, A., Ray, S., Tadepalli, P.: Multi-task reinforcement learning: A hierarchical Bayesian approach. In: Proceedings of ICML, vol. 24, pp. 1015–1022 (2007)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.