Volume 12, 2012, Pages 359-386

Bayesian reinforcement learning

Author keywords: Covariance

EID: 85042936847     PISSN: 1867-4534     EISSN: 1867-4542     Source Type: Book Series
DOI: 10.1007/978-3-642-27645-3_11     Document Type: Chapter
Times cited: 63

References (83)
  • 1. Aharony, N., Zehavi, T., Engel, Y.: Learning wireless network association control with Gaussian process temporal difference methods. In: Proceedings of OPNETWORK (2005)
  • 7. Bellman, R.: A problem in sequential design of experiments. Sankhya 16, 221–229 (1956)
  • 12. Brafman, R., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3, 213–231 (2002)
  • 13. Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
  • 18. Cozzolino, J.M.: Optimal sequential decision making under uncertainty. Master's thesis, Massachusetts Institute of Technology (1964)
  • 20. Dearden, R., Friedman, N., Andre, D.: Model based Bayesian exploration. In: UAI, pp. 150–159 (1999)
  • 22. Delage, E., Mannor, S.: Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research 58(1), 203–213 (2010)
  • 23. Dimitrakakis, C.: Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning. In: ICAART (1), pp. 259–264 (2010)
  • 25. Doshi-Velez, F., Wingate, D., Roy, N., Tenenbaum, J.: Nonparametric Bayesian policy priors for reinforcement learning. In: NIPS (2010)
  • 27. Duff, M.: Design for an optimal probe. In: ICML, pp. 131–138 (2003)
  • 29. Engel, Y., Mannor, S., Meir, R.: Sparse Online Greedy Support Vector Regression. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 84–96. Springer, Heidelberg (2002)
  • 33. Fard, M.M., Pineau, J.: PAC-Bayesian model selection for reinforcement learning. In: Lafferty, J., Williams, C.K.I., Shawe-Taylor, J., Zemel, R., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 1624–1632 (2010)
  • 37. Greensmith, E., Bartlett, P., Baxter, J.: Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research 5, 1471–1530 (2004)
  • 42. Kearns, M., Mansour, Y., Ng, A.: A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In: Proc. IJCAI (1999)
  • 46. Lazaric, A., Restelli, M., Bonarini, A.: Transfer of samples in batch reinforcement learning. In: Proceedings of ICML, vol. 25, pp. 544–551 (2008)
  • 47. Mannor, S., Simester, D., Sun, P., Tsitsiklis, J.N.: Bias and variance approximation in value function estimates. Management Science 53(2), 308–322 (2007)
  • 50. Mehta, N., Natarajan, S., Tadepalli, P., Fern, A.: Transfer in variable-reward hierarchical reinforcement learning. Machine Learning 73(3), 289–312 (2008)
  • 51. Meuleau, N., Bourgine, P.: Exploration of multi-state environments: local measures and backpropagation of uncertainty. Machine Learning 35, 117–154 (1999)
  • 52. Nilim, A., El Ghaoui, L.: Robust control of Markov decision processes with uncertain transition matrices. Operations Research 53(5), 780–798 (2005)
  • 53. O’Hagan, A.: Monte Carlo is fundamentally unsound. The Statistician 36, 247–249 (1987)
  • 56. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4), 682–697 (2008)
  • 58. Peters, J., Vijayakumar, S., Schaal, S.: Natural Actor-Critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280–291. Springer, Heidelberg (2005)
  • 69. Silver, E.A.: Markov decision processes with uncertain transition probabilities or rewards. Tech. Rep. No. 1, Research in the Control of Complex Systems, Operations Research Center, Massachusetts Institute of Technology (1963)
  • 71. Strehl, A.L., Li, L., Littman, M.L.: Incremental model-based learners with formal learning-time guarantees. In: UAI (2006)
  • 72. Strens, M.: A Bayesian framework for reinforcement learning. In: ICML (2000)
  • 74. Sutton, R.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)
  • 76. Taylor, M., Stone, P., Liu, Y.: Transfer learning via inter-task mappings for temporal difference learning. JMLR 8, 2125–2167 (2007)
  • 77. Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)
  • 78. Veness, J., Ng, K.S., Hutter, M., Silver, D.: Reinforcement learning via AIXI approximation. In: AAAI (2010)
  • 79. Wang, T., Lizotte, D., Bowling, M., Schuurmans, D.: Bayesian sparse sampling for on-line reward optimization. In: ICML (2005)
  • 82. Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
  • 83. Wilson, A., Fern, A., Ray, S., Tadepalli, P.: Multi-task reinforcement learning: A hierarchical Bayesian approach. In: Proceedings of ICML, vol. 24, pp. 1015–1022 (2007)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.