2. Asmuth, John, Li, Lihong, Littman, Michael L., Nouri, Ali, and Wingate, David. A Bayesian sampling approach to exploration in reinforcement learning. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 19-26. AUAI Press, 2009.
4. Bellman, Richard and Kalaba, Robert. On adaptive control processes. IRE Transactions on Automatic Control, 4(2):1-9, 1959.
5. Brafman, Ronen I. and Tennenholtz, Moshe. R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
6. Buldygin, Valerii V. and Kozachenko, Y. V. Sub-Gaussian random variables. Ukrainian Mathematical Journal, 32(6):483-489, 1980.
7. Burnetas, Apostolos N. and Katehakis, Michael N. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
8. Dani, Varsha, Hayes, Thomas P., and Kakade, Sham M. Stochastic linear optimization under bandit feedback. In COLT, pp. 355-366, 2008.
10. Filippi, Sarah, Cappé, Olivier, and Garivier, Aurélien. Optimism in reinforcement learning and Kullback-Leibler divergence. In 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 115-122. IEEE, 2010.
11. Fonteneau, Raphaël, Korda, Nathan, and Munos, Rémi. An optimistic posterior sampling strategy for Bayesian reinforcement learning. In NIPS 2013 Workshop on Bayesian Optimization (BayesOpt 2013), 2013.
14. Jaksch, Thomas, Ortner, Ronald, and Auer, Peter. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
16. Kearns, Michael J. and Singh, Satinder P. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
25. Osband, Ian, Russo, Daniel, and Van Roy, Benjamin. (More) Efficient reinforcement learning via posterior sampling. In NIPS, pp. 3003-3011. Curran Associates, Inc., 2013.
27. Russo, Daniel and Van Roy, Benjamin. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221-1243, 2014.
29. Strehl, Alexander L., Li, Lihong, Wiewiora, Eric, Langford, John, and Littman, Michael L. PAC model-free reinforcement learning. In ICML, pp. 881-888, 2006.
30. Strens, Malcolm J. A. A Bayesian framework for reinforcement learning. In ICML, pp. 943-950, 2000.
33. Thompson, W. R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4):285-294, 1933.
34. Vlassis, Nikos, Ghavamzadeh, Mohammad, Mannor, Shie, and Poupart, Pascal. Bayesian reinforcement learning. In Reinforcement Learning, pp. 359-386. Springer, 2012.