



Volume 382, 2009

The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

BENCHMARK DOMAINS; FEATURE SELECTION; LOWER BOUNDS; NAVIGATION PROBLEM; STATE-OF-THE-ART ALGORITHMS; STRUCTURE-LEARNING; UPPER BOUND

EID: 70049104382     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1553374.1553406     Document Type: Conference Paper
Times cited: 8

References (21)
  • 1
    • 33747670266
    • Learning factor graphs in polynomial time and sample complexity
    • Abbeel, P., Koller, D., & Ng, A. Y. (2006). Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7, 1743-1788.
    • (2006) Journal of Machine Learning Research , vol.7 , pp. 1743-1788
    • Abbeel, P.1    Koller, D.2    Ng, A.Y.3
  • 2
    • 0346942368
    • Decision-theoretic planning: Structural assumptions and computational leverage
    • Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
    • (1999) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
    • Boutilier, C.1    Dean, T.2    Hanks, S.3
  • 3
    • 0041965975
    • R-max - a general polynomial time algorithm for near-optimal reinforcement learning
    • Brafman, R. I., & Tennenholtz, M. (2002). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 6
    • 84990553353
    • A model for reasoning about persistence and causation
    • Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
    • (1989) Computational Intelligence , vol.5 , pp. 142-150
    • Dean, T.1    Kanazawa, K.2
  • 8
    • 23244466805
    • On the sample complexity of reinforcement learning
    • Kakade, S. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London, UK.
    • (2003) On the sample complexity of reinforcement learning
    • Kakade, S.1
  • 10
  • 11
    • 0036832954
    • Near-optimal reinforcement learning in polynomial time
    • Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
    • (2002) Machine Learning , vol.49 , pp. 209-232
    • Kearns, M.J.1    Singh, S.P.2
  • 13
    • 70049090614
    • Li, L. (2009). A unifying framework for computational reinforcement learning theory. Doctoral dissertation, Department of Computer Science, Rutgers University, New Brunswick, NJ.
    • Li, L., Littman, M. L., & Walsh, T. J. (2008). Knows what it knows: A framework for self-aware learning. Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08) (pp. 568-575).
  • 14
    • 30044441333
    • The sample complexity of exploration in the multi-armed bandit problem
    • Mannor, S., & Tsitsiklis, J. N. (2004). The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5, 623-648.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 623-648
    • Mannor, S.1    Tsitsiklis, J.N.2
  • 20
    • 0021518106
    • A theory of the learnable
    • Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.
    • (1984) Communications of the ACM , vol.27 , pp. 1134-1142
    • Valiant, L.G.1
  • 21
    • 0000819141
    • A learning criterion for stochastic rules
    • Yamanishi, K. (1992). A learning criterion for stochastic rules. Machine Learning, 9, 165-203.
    • (1992) Machine Learning , vol.9 , pp. 165-203
    • Yamanishi, K.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.