SCOPUS Information Search Platform

Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

2008 (volume, issue, and page numbers not listed)

Optimistic Linear Programming gives logarithmic regret for irreducible MDPs

Authors (2): Tewari, Ambuj (a); Bartlett, Peter L. (a)

(a) University of California (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; MARKOV PROCESSES;

AVERAGE REWARD; LINEAR PROGRAMS; LINEAR-PROGRAMMING; MARKOV DECISION PROCESSES; OPTIMAL POLICIES; OPTIMISTICS; REGRET BOUNDS; SIMPLE++; STATE TRANSITION PROBABILITIES; TRANSITION PROBABILITIES;

LINEAR PROGRAMMING;

EID: 85162041468 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 84

References (9)

1
- 0031070051
- Burnetas, A.N. & Katehakis, M.N. (1997) Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research 22(1):222-255.

2
- 56449090814
- Auer, P. & Ortner, R. (2007) Logarithmic online regret bounds for undiscounted reinforcement learning. Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press.

3
- 0002899547
- Lai, T.L. & Robbins, H. (1985) Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1):4-22.

4
- 0041965975
- Brafman, R.I. & Tennenholtz, M. (2002) R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3:213-231.

5
- 0041966002
- Auer, P. (2002) Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3:397-422.

6
- 0036568025
- Auer, P., Cesa-Bianchi, N. & Fischer, P. (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235-256. DOI 10.1023/A:1013689704352.

7
- 31844432138
- Strehl, A.L. & Littman, M.L. (2005) A theoretical analysis of model-based interval estimation. In ICML 2005: Proceedings of the 22nd International Conference on Machine Learning, pp. 857-864. ACM Press.

8
- 84858757495
- Tewari, A. (2007) Reinforcement Learning in Large or Unknown MDPs. PhD thesis, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley.

9
- 85102627959
- Puterman, M.L. (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley and Sons.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.