Volume 74, Issue 8, 2008, Pages 1309-1331

An analysis of model-based Interval Estimation for Markov Decision Processes

Author keywords

Learning theory; Markov Decision Processes; Reinforcement learning

Indexed keywords

DECISION THEORY; LEARNING ALGORITHMS; MARKOV PROCESSES;

EID: 55549110436     PISSN: 00220000     EISSN: 10902724     Source Type: Journal    
DOI: 10.1016/j.jcss.2007.08.009     Document Type: Article
Times cited : (555)

References (25)
  • 1
    • Auer P. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3 (2002) 397-422
  • 2
    • P. Auer, R. Ortner, Online regret bounds for a new reinforcement learning algorithm, in: First Austrian Cognitive Vision Workshop, 2005, pp. 35-42
  • 3
    • Brafman R.I., and Tennenholtz M. R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 3 (2002) 213-231
  • 4
    • E. Even-Dar, S. Mannor, Y. Mansour, PAC bounds for multi-armed bandit and Markov decision processes, in: 15th Annual Conference on Computational Learning Theory (COLT), 2002, pp. 255-270
  • 5
    • E. Even-Dar, S. Mannor, Y. Mansour, Action elimination and stopping conditions for reinforcement learning, in: The Twentieth International Conference on Machine Learning (ICML 2003), 2003, pp. 162-169
  • 6
    • C.-N. Fiechter, Expected mistake bound model for on-line reinforcement learning, in: Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 116-124
  • 7
    • P.W.L. Fong, A quantitative study of hypothesis selection, in: Proceedings of the Twelfth International Conference on Machine Learning (ICML-95), 1995, pp. 226-234
  • 8
    • Givan R., Leach S., and Dean T. Bounded-parameter Markov decision processes. Artificial Intelligence 122 1-2 (2000) 71-109
  • 10
    • S.M. Kakade, On the sample complexity of reinforcement learning, PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003
  • 11
    • Kearns M.J., and Singh S.P. Near-optimal reinforcement learning in polynomial time. Machine Learning 49 2-3 (2002) 209-232
  • 13
    • Lai T.L. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15 3 (1987) 1091-1114
  • 14
    • A. Nilim, L.E. Ghaoui, Robustness in Markov decision problems with uncertain transition matrices, in: Advances in Neural Information Processing Systems 16 (NIPS-03), 2004
  • 16
    • M.J. Streeter, S.F. Smith, A simple distribution-free approach to the max k-armed bandit problem, in: CP 2006: Proceedings of the Twelfth International Conference on Principles and Practice of Constraint Programming, 2006
  • 17
    • A.L. Strehl, L. Li, M.L. Littman, Incremental model-based learners with formal learning-time guarantees, in: UAI-06: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006, pp. 485-493
  • 18
    • A.L. Strehl, L. Li, E. Wiewiora, J. Langford, M.L. Littman, PAC model-free reinforcement learning, in: ICML-06: Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 881-888
  • 19
    • A.L. Strehl, M.L. Littman, An empirical evaluation of interval estimation for Markov decision processes, in: The 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2004), 2004, pp. 128-135
  • 20
    • A.L. Strehl, M.L. Littman, A theoretical analysis of model-based interval estimation, in: Proceedings of the Twenty-Second International Conference on Machine Learning (ICML-05), 2005, pp. 857-864
  • 22
    • Valiant L.G. A theory of the learnable. Comm. ACM 27 11 (1984) 1134-1142
  • 23
    • T. Weissman, E. Ordentlich, G. Seroussi, S. Verdu, M.J. Weinberger, Inequalities for the L1 deviation of the empirical distribution, Tech. Rep. HPL-2003-97R1, Hewlett-Packard Labs, 2003
  • 24
    • M. Wiering, J. Schmidhuber, Efficient model-based exploration, in: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (SAB'98), 1998, pp. 223-228
  • 25
    • J.L. Wyatt, Exploration control in reinforcement learning using optimistic model selection, in: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), 2001, pp. 593-600


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.