3. Asmuth, J., Li, L., Littman, M., Nouri, A., & Wingate, D. (2009). A Bayesian sampling approach to exploration in reinforcement learning. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 19-26.
5. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235-256. doi:10.1023/A:1013689704352.
6. Bellman, R. (1954). The theory of dynamic programming. Bulletin of the American Mathematical Society, 60(6), 503-515.
7. Brafman, R., & Tennenholtz, M. (2003). R-max - A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3, 213-231.
8. Bubeck, S., Munos, R., & Stoltz, G. (2009). Pure exploration in multi-armed bandits problems. In Proceedings of the 20th International Conference on Algorithmic Learning Theory, pp. 23-37. Springer-Verlag.
13. Cozzolino, J., Gonzalez-Zubieta, R., & Miller, R. (1965). Markov decision processes with uncertain transition probabilities. Tech. rep. 11, Operations Research Center, MIT.
14. Davies, S., Ng, A., & Moore, A. (1998). Applying online search techniques to reinforcement learning. In Proceedings of the National Conference on Artificial Intelligence, pp. 753-760.
15. Dayan, P., & Sejnowski, T. J. (1996). Exploration bonuses and dual control. Machine Learning, 25(1), 5-22.
16. Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q-learning. In Proceedings of the National Conference on Artificial Intelligence, pp. 761-768.
17. Doshi-Velez, F., Wingate, D., Roy, N., & Tenenbaum, J. (2010). Nonparametric Bayesian policy priors for reinforcement learning. In Advances in Neural Information Processing Systems (NIPS).
21. Friedman, N., & Singer, Y. (1999). Efficient Bayesian parameter estimation in large discrete domains. In Advances in Neural Information Processing Systems (NIPS), pp. 417-423.
22. Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvári, C., & Teytaud, O. (2012). The grand challenge of computer Go: Monte Carlo tree search and extensions. Communications of the ACM, 55(3), 106-113.
25. Guez, A., Silver, D., & Dayan, P. (2012). Efficient Bayes-adaptive reinforcement learning using sample-based search. In Advances in Neural Information Processing Systems (NIPS), pp. 1034-1042.
26. Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100-107.
27. Jaksch, T., Ortner, R., & Auer, P. (2010). Near-optimal regret bounds for reinforcement learning. The Journal of Machine Learning Research, 11, 1563-1600.
28. Kearns, M., Mansour, Y., & Ng, A. (1999). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, Vol. 2, pp. 1324-1331.
31. Madani, O., Hanks, S., & Condon, A. (2003). On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence, 147(1), 5-34.
33. Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Machine Learning, 35(2), 117-154.
34. Mundhenk, M., Goldsmith, J., Lusena, C., & Allender, E. (2000). Complexity of finite-horizon Markov decision process problems. Journal of the ACM, 47(4), 681-720.
36. Poupart, P., Vlassis, N., Hoey, J., & Regan, K. (2006). An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pp. 697-704. ACM.
37. Ross, S., Pineau, J., Chaib-draa, B., & Kreitmann, P. (2011). A Bayesian approach for learning and planning in Partially Observable Markov Decision Processes. Journal of Machine Learning Research, 12, 1729-1770.
38. Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32(1), 663-704.
42. Silver, D., Sutton, R. S., & Müller, M. (2008). Sample-based learning and search with permanent and transient memories. In Proceedings of the 25th International Conference on Machine Learning, pp. 968-975. ACM.
44. Strehl, A., Li, L., & Littman, M. (2009). Reinforcement learning in finite MDPs: PAC analysis. The Journal of Machine Learning Research, 10, 2413-2444.
46. Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pp. 216-224.
48. Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
49. Thompson, W. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285-294.
51. Vien, N. A., & Ertel, W. (2012). Monte Carlo tree search for Bayesian reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning and Applications (ICMLA), Vol. 1, pp. 138-143. IEEE.
53. Wang, T., Lizotte, D., Bowling, M., & Schuurmans, D. (2005). Bayesian sparse sampling for on-line reward optimization. In Proceedings of the 22nd International Conference on Machine Learning, pp. 956-963.
54. Wang, Y., Won, K., Hsu, D., & Lee, W. (2012). Monte Carlo Bayesian reinforcement learning. In Proceedings of the 29th International Conference on Machine Learning.
56. Wingate, D., Goodman, N., Roy, D., Kaelbling, L., & Tenenbaum, J. (2011). Bayesian policy search with policy priors. In Proceedings of the International Joint Conference on Artificial Intelligence.