SCOPUS Information Retrieval Platform

Advances in Neural Information Processing Systems

Volume: (not listed), Issue: (not listed), 2013, Pages: (not listed)

(More) efficient reinforcement learning via posterior sampling

Authors (3): Osband, Ian (a); Van Roy, Benjamin (a); Russo, Daniel (a)

(a) Stanford University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; MARKOV PROCESSES;

ACTION SPACES; CARDINALITIES; COMPUTATIONALLY EFFICIENT; MARKOV DECISION PROCESSES; PRIOR DISTRIBUTION; PRIOR KNOWLEDGE; REGRET BOUNDS; STATE OF THE ART;

REINFORCEMENT LEARNING;

EID: 84899019264 | PISSN: 10495258 | EISSN: None | Source Type: Conference Proceeding
DOI: None | Document Type: Conference Paper

Times cited: 502

References (22)

1. A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997. (Scopus ID: 0031070051)

2. P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, Inc., 1986. (Scopus ID: 0003691637)

3. T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4-22, 1985. (Scopus ID: 0002899547)

4. T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. The Journal of Machine Learning Research, 11:1563-1600, 2010. (Scopus ID: 77951952841)

5. P. L. Bartlett and A. Tewari. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 35-42. AUAI Press, 2009. (Scopus ID: 80053161827)

6. R. I. Brafman and M. Tennenholtz. R-max: A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231, 2003. (Scopus ID: 0041965975)

7. S. M. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University of London, 2003. (Scopus ID: 23244466805)

8. M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002. (Scopus ID: 0036832954)

9. W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285-294, 1933. (Scopus ID: 0001395850)

10. O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Neural Information Processing Systems (NIPS), 2011. (Scopus ID: 85162416700)

11. S. L. Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6):639-658, 2010. (Scopus ID: 78650505735)

12. S. Agrawal and N. Goyal. Further optimal regret bounds for Thompson sampling. arXiv preprint arXiv:1209.3353, 2012. (Scopus ID: 84898944381)

13. S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear payoffs. arXiv preprint arXiv:1209.3352, 2012. (Scopus ID: 84979873896)

14. E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: An asymptotically optimal finite-time analysis. In International Conference on Algorithmic Learning Theory, 2012. (Scopus ID: 84887459202)

15. D. Russo and B. Van Roy. Learning to optimize via posterior sampling. CoRR, abs/1301.2609, 2013. (Scopus ID: 84893254104)

16. M. Strens. A Bayesian framework for reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, pages 943-950, 2000. (Scopus ID: 14344258433)

17. J. Z. Kolter and A. Y. Ng. Near-Bayesian exploration in polynomial time. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 513-520. ACM, 2009. (Scopus ID: 71149109483)

18. T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Bayesian sparse sampling for on-line reward optimization. In Proceedings of the 22nd International Conference on Machine Learning, pages 956-963. ACM, 2005. (Scopus ID: 31844436266)

19. A. Guez, D. Silver, and P. Dayan. Efficient Bayes-adaptive reinforcement learning using sample-based search. arXiv preprint arXiv:1205.3109, 2012. (Scopus ID: 84898960216)

20. J. Asmuth and M. L. Littman. Approaching Bayes-optimality using Monte-Carlo tree search. In Proceedings of the 21st International Conference on Automated Planning and Scheduling, Freiburg, Germany, 2011. (Scopus ID: 84896062754)

21. A. L. Strehl and M. L. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309-1331, 2008. (Scopus ID: 55549110436)

22. S. Filippi, O. Cappé, and A. Garivier. Optimism in reinforcement learning based on Kullback-Leibler divergence. CoRR, abs/1004.5229, 2010. (Scopus ID: 84898972955)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.