SCOPUS Information Retrieval Platform

Journal of Computer and System Sciences

Volume 74, Issue 8, 2008, Pages 1309-1331

An analysis of model-based Interval Estimation for Markov Decision Processes

Authors (2): Strehl, Alexander L. (a); Littman, Michael L. (b)

a YAHOO INC (United States)

b RUTGERS UNIVERSITY (United States)

Author keywords

Learning theory; Markov Decision Processes; Reinforcement learning

Indexed keywords

DECISION THEORY; LEARNING ALGORITHMS; MARKOV PROCESSES;

AVERAGE LOSS; BALANCING EXPLORATION AND EXPLOITATIONS; INTERVAL ESTIMATION; LEARNING THEORY; MARKOV DECISION PROCESSES; MODEL-BASED OPC; NEAR-OPTIMAL POLICIES; PERFORMANCE METRICES;

REINFORCEMENT LEARNING;

EID: 55549110436
PISSN: 00220000
EISSN: 10902724
Source Type: Journal
DOI: 10.1016/j.jcss.2007.08.009
Document Type: Article

Times cited: 555

References (25)

1
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Auer P. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3 (2002) 397-422
- (2002) J. Mach. Learn. Res. , vol.3 , pp. 397-422
- Auer, P.¹

2
- 55549083745
- P. Auer, R. Ortner, Online regret bounds for a new reinforcement learning algorithm, in: First Austrian Cognitive Vision Workshop, 2005, pp. 35-42

3
- 0041965975
- R-MAX-A general polynomial time algorithm for near-optimal reinforcement learning
- Brafman R.I., and Tennenholtz M. R-MAX-A general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 3 (2002) 213-231
- (2002) J. Mach. Learn. Res. , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

4
- 84937398609
- E. Even-Dar, S. Mannor, Y. Mansour, PAC bounds for multi-armed bandit and Markov decision processes, in: 15th Annual Conference on Computational Learning Theory (COLT), 2002, pp. 255-270

5
- 1942421149
- E. Even-Dar, S. Mannor, Y. Mansour, Action elimination and stopping conditions for reinforcement learning, in: The Twentieth International Conference on Machine Learning (ICML 2003), 2003, pp. 162-169

6
- 55549099461
- C.-N. Fiechter, Expected mistake bound model for on-line reinforcement learning, in: Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 116-124

7
- 55549113709
- P.W.L. Fong, A quantitative study of hypothesis selection, in: Proceedings of the Twelfth International Conference on Machine Learning (ICML-95), 1995, pp. 226-234

8
- 0034272032
- Bounded-parameter Markov decision processes
- Givan R., Leach S., and Dean T. Bounded-parameter Markov decision processes. Artificial Intelligence 122 1-2 (2000) 71-109
- (2000) Artificial Intelligence , vol.122 , Issue.1-2 , pp. 71-109
- Givan, R.¹ Leach, S.² Dean, T.³

9
- 0004280606
- Learning in Embedded Systems
- Kaelbling L.P. Learning in Embedded Systems (1993), The MIT Press, Cambridge, MA
- (1993) Learning in Embedded Systems
- Kaelbling, L.P.¹

10
- 55549141728
- S.M. Kakade, On the sample complexity of reinforcement learning, PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003

11
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Kearns M.J., and Singh S.P. Near-optimal reinforcement learning in polynomial time. Machine Learning 49 2-3 (2002) 209-232
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
- Kearns, M.J.¹ Singh, S.P.²

12
- 0003879107
- An Introduction to Computational Learning Theory
- Kearns M.J., and Vazirani U.V. An Introduction to Computational Learning Theory (1994), The MIT Press, Cambridge, MA
- (1994) An Introduction to Computational Learning Theory
- Kearns, M.J.¹ Vazirani, U.V.²

13
- 0000854435
- Adaptive treatment allocation and the multi-armed bandit problem
- Lai T.L. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15 3 (1987) 1091-1114
- (1987) Ann. Statist. , vol.15 , Issue.3 , pp. 1091-1114
- Lai, T.L.¹

14
- 13244260002
- A. Nilim, L.E. Ghaoui, Robustness in Markov decision problems with uncertain transition matrices, in: Advances in Neural Information Processing Systems 16 (NIPS-03), 2004

15
- 0003998452
- Markov Decision Processes-Discrete Stochastic Dynamic Programming
- Puterman M.L. Markov Decision Processes-Discrete Stochastic Dynamic Programming (1994), John Wiley & Sons, Inc., New York, NY
- (1994) Markov Decision Processes-Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

16
- 55549119838
- M.J. Streeter, S.F. Smith, A simple distribution-free approach to the max k-armed bandit problem, in: CP 2006: Proceedings of the Twelfth International Conference on Principles and Practice of Constraint Programming, 2006

17
- 34548745051
- A.L. Strehl, L. Li, M.L. Littman, Incremental model-based learners with formal learning-time guarantees, in: UAI-06: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006, pp. 485-493

18
- 34250700033
- A.L. Strehl, L. Li, E. Wiewiora, J. Langford, M.L. Littman, PAC model-free reinforcement learning, in: ICML-06: Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 881-888

19
- 16244391087
- A.L. Strehl, M.L. Littman, An empirical evaluation of interval estimation for Markov decision processes, in: The 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2004), 2004, pp. 128-135

20
- 31844432138
- A.L. Strehl, M.L. Littman, A theoretical analysis of model-based interval estimation, in: Proceedings of the Twenty-Second International Conference on Machine Learning (ICML-05), 2005, pp. 857-864

21
- 0004102479
- Reinforcement Learning: An Introduction
- Sutton R.S., and Barto A.G. Reinforcement Learning: An Introduction (1998), The MIT Press
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

22
- 0021518106
- A theory of the learnable
- Valiant L.G. A theory of the learnable. Comm. ACM 27 11 (1984) 1134-1142
- (1984) Comm. ACM , vol.27 , Issue.11 , pp. 1134-1142
- Valiant, L.G.¹

23
- 55549133483
- T. Weissman, E. Ordentlich, G. Seroussi, S. Verdu, M.J. Weinberger, Inequalities for the L1 deviation of the empirical distribution, Tech. Rep. HPL-2003-97R1, Hewlett-Packard Labs, 2003

24
- 55549143204
- M. Wiering, J. Schmidhuber, Efficient model-based exploration, in: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (SAB'98), 1998, pp. 223-228

25
- 55549109611
- J.L. Wyatt, Exploration control in reinforcement learning using optimistic model selection, in: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), 2001, pp. 593-600

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.