Volume 4131 LNCS - I, 2006, Pages 850-859

Nearly optimal exploration-exploitation decision thresholds

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; DECISION MAKING; PROBLEM SOLVING

EID: 33749818313     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/11840817_88     Document Type: Conference Paper
Times cited : (10)

References (14)
  • 1
    • 0003503387
    • John Wiley & Sons Republished by Dover in 2004
    • Wald, A.: Sequential Analysis. John Wiley & Sons (1947) Republished by Dover in 2004.
    • (1947) Sequential Analysis
    • Wald, A.1
  • 3
    • 0004870746
    • A problem in the sequential design of experiments
    • Bellman, R.E.: A problem in the sequential design of experiments. Sankhya 16 (1957) 221-229
    • (1957) Sankhya , vol.16 , pp. 221-229
    • Bellman, R.E.1
  • 4
    • 30044441333
    • The sample complexity of exploration in the multiarmed bandit problem
    • Mannor, S., Tsitsiklis, J.N.: The sample complexity of exploration in the multiarmed bandit problem. Journal of Machine Learning Research 5 (2004) 623-648
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 623-648
    • Mannor, S.1    Tsitsiklis, J.N.2
  • 6
    • 33745295134
    • Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
    • to appear
    • Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research (2006) to appear.
    • (2006) Journal of Machine Learning Research
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 9
    • 0033170372
    • Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
    • Sutton, R.S., Precup, D., Singh, S.P.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1-2) (1999) 181-211
    • (1999) Artificial Intelligence , vol.112 , Issue.1-2 , pp. 181-211
    • Sutton, R.S.1    Precup, D.2    Singh, S.P.3
  • 10
    • 9444252980
    • The budgeted multi-armed bandit problem
    • Learning Theory: 17th Annual Conference on Learning Theory, COLT 2004, Springer-Verlag
    • Madani, O., Lizotte, D.J., Greiner, R.: The budgeted multi-armed bandit problem. In: Learning Theory: 17th Annual Conference on Learning Theory, COLT 2004. Volume 3120 of Lecture Notes in Computer Science, Springer-Verlag (2004) 643-645
    • (2004) Lecture Notes in Computer Science , vol.3120 , pp. 643-645
    • Madani, O.1    Lizotte, D.J.2    Greiner, R.3
  • 12
    • 33749851788
    • Models for trading exploration and exploitation using upper confidence bounds
    • PASCAL workshop on principled methods of trading exploration and exploitation
    • Auer, P.: Models for trading exploration and exploitation using upper confidence bounds. In: PASCAL workshop on principled methods of trading exploration and exploitation, PASCAL Network (2005)
    • (2005) PASCAL Network
    • Auer, P.1
  • 13
    • 0000626524
    • Expected information as expected utility
    • Institute of Mathematical Statistics
    • Bernardo, J.M.: Expected information as expected utility. In: The Annals of Statistics. Volume 7, Institute of Mathematical Statistics (1979) 686-690
    • (1979) The Annals of Statistics , vol.7 , pp. 686-690
    • Bernardo, J.M.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.