Volume , Issue , 2010, Pages

Multi-armed bandits with episode context

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER GO; CONTEXTUAL BANDITS; MULTI ARMED BANDIT; SIDE INFORMATION; WORST CASE

EID: 84874141248     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (8)

References (30)
  • 1
    • 84898079018
    • Minimax policies for adversarial and stochastic bandits
    • Audibert, J.-Y., and Bubeck, S. 2009. Minimax policies for adversarial and stochastic bandits. In COLT.
    • (2009) COLT
    • Audibert, J.-Y.1    Bubeck, S.2
  • 2
    • 62949181077
    • Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
    • Audibert, J.-Y.; Munos, R.; and Szepesvári, C. 2009. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 410(19):1876-1902.
    • (2009) Theor. Comput. Sci. , vol.410 , Issue.19 , pp. 1876-1902
    • Audibert, J.-Y.1    Munos, R.2    Szepesvári, C.3
  • 4
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235-256.
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 0035479281
    • Computer Go: An AI oriented survey
    • Bouzy, B., and Cazenave, T. 2001. Computer Go: An AI oriented survey. Artif. Intell. 132(1):39-103.
    • (2001) Artif. Intell. , vol.132 , Issue.1 , pp. 39-103
    • Bouzy, B.1    Cazenave, T.2
  • 6
    • 80053655011
    • Bayesian generation and integration of K-nearest-neighbor patterns for 19×19 Go
    • Bouzy, B., and Chaslot, G. 2005. Bayesian generation and integration of K-nearest-neighbor patterns for 19×19 Go. In IEEE Symposium on Computational Intelligence in Games, 176-181.
    • (2005) IEEE Symposium on Computational Intelligence in Games , pp. 176-181
    • Bouzy, B.1    Chaslot, G.2
  • 7
    • 24944458186
    • Monte-Carlo Go developments
    • van den Herik, H. J.; Iida, H.; and Heinz, E. A., eds., volume 263 of IFIP, Kluwer
    • Bouzy, B., and Helmstetter, B. 2003. Monte-Carlo Go developments. In van den Herik, H. J.; Iida, H.; and Heinz, E. A., eds., ACG, volume 263 of IFIP, 159-174. Kluwer.
    • (2003) ACG , pp. 159-174
    • Bouzy, B.1    Helmstetter, B.2
  • 8
    • 77952070805
    • Pure exploration in multi-armed bandits problems
    • Gavaldà, R.; Lugosi, G.; Zeugmann, T.; and Zilles, S., eds., ALT, Springer
    • Bubeck, S.; Munos, R.; and Stoltz, G. 2009. Pure exploration in multi-armed bandits problems. In Gavaldà, R.; Lugosi, G.; Zeugmann, T.; and Zilles, S., eds., ALT, volume 5809 of Lecture Notes in Computer Science, 23-37. Springer.
    • (2009) Lecture Notes in Computer Science , vol.5809 , pp. 23-37
    • Bubeck, S.1    Munos, R.2    Stoltz, G.3
  • 9
    • 34250005402
    • Computer Go: A grand challenge to AI
    • Duch, W., and Mandziuk, J., eds., volume 63 of Studies in Computational Intelligence. Springer
    • Cai, X., and Wunsch, D. C. 2007. Computer Go: A grand challenge to AI. In Duch, W., and Mandziuk, J., eds., Challenges for Computational Intelligence, volume 63 of Studies in Computational Intelligence. Springer. 443-465.
    • (2007) Challenges for Computational Intelligence , pp. 443-465
    • Cai, X.1    Wunsch, D.C.2
  • 12
    • 38049037928
    • Efficient selectivity and backup operators in Monte-Carlo tree search
    • van den Herik, H. J.; Ciancarini, P.; and Donkers, H. H. L. M., eds., volume 4630 of Lecture Notes in Computer Science, Springer
    • Coulom, R. 2006. Efficient selectivity and backup operators in Monte-Carlo tree search. In van den Herik, H. J.; Ciancarini, P.; and Donkers, H. H. L. M., eds., Computers and Games, volume 4630 of Lecture Notes in Computer Science, 72-83. Springer.
    • (2006) Computers and Games , pp. 72-83
    • Coulom, R.1
  • 13
    • 70349287633
    • Computing Elo ratings of move patterns in the game of Go
    • Coulom, R. 2007a. Computing Elo ratings of move patterns in the game of Go. In Computer Games Workshop.
    • (2007) Computer Games Workshop
    • Coulom, R.1
  • 15
    • 70049104257
    • Bandit-based optimization on graphs with application to library performance tuning
    • Danyluk, A. P.; Bottou, L.; and Littman, M. L., eds., ICML. ACM
    • de Mesmay, F.; Rimmel, A.; Voronenko, Y.; and Püschel, M. 2009. Bandit-based optimization on graphs with application to library performance tuning. In Danyluk, A. P.; Bottou, L.; and Littman, M. L., eds., ICML, volume 382 of ACM International Conference Proceeding Series, 92. ACM.
    • (2009) ACM International Conference Proceeding Series , vol.382 , pp. 92
    • De Mesmay, F.1    Rimmel, A.2    Voronenko, Y.3    Püschel, M.4
  • 16
    • 33745295134
    • Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
    • Even-Dar, E.; Mannor, S.; and Mansour, Y. 2006. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research 7:1079-1105.
    • (2006) Journal of Machine Learning Research , vol.7 , pp. 1079-1105
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 17
    • 57749181518
    • Simulation-based approach to general game playing
    • 2008
    • Finnsson, H., and Björnsson, Y. 2008. Simulation-based approach to general game playing. In Fox and Gomes (2008), 259-264.
    • (2008) Fox and Gomes , pp. 259-264
    • Finnsson, H.1    Björnsson, Y.2
  • 19
    • 34547990649
    • Combining online and offline knowledge in UCT
    • Ghahramani, Z., ed. ICML. ACM
    • Gelly, S., and Silver, D. 2007. Combining online and offline knowledge in UCT. In Ghahramani, Z., ed., ICML, volume 227 of ACM International Conference Proceeding Series, 273-280. ACM.
    • (2007) ACM International Conference Proceeding Series , vol.227 , pp. 273-280
    • Gelly, S.1    Silver, D.2
  • 20
    • 57749091602
    • Achieving master level play in 9 × 9 computer Go
    • 2008
    • Gelly, S., and Silver, D. 2008. Achieving master level play in 9 × 9 computer Go. In Fox and Gomes (2008), 1537-1540.
    • (2008) Fox and Gomes , pp. 1537-1540
    • Gelly, S.1    Silver, D.2
  • 22
    • 56449104477
    • Efficient bandit algorithms for online multiclass prediction
    • Cohen, W. W.; McCallum, A.; and Roweis, S. T., eds., ICML. ACM
    • Kakade, S. M.; Shalev-Shwartz, S.; and Tewari, A. 2008. Efficient bandit algorithms for online multiclass prediction. In Cohen, W. W.; McCallum, A.; and Roweis, S. T., eds., ICML, volume 307 of ACM International Conference Proceeding Series, 440-447. ACM.
    • (2008) ACM International Conference Proceeding Series , vol.307 , pp. 440-447
    • Kakade, S.M.1    Shalev-Shwartz, S.2    Tewari, A.3
  • 23
    • 34547975806
    • Bandit based Monte-Carlo planning
    • Kocsis, L., and Szepesvari, C. 2006. Bandit based Monte-Carlo planning. In ECML.
    • (2006) ECML
    • Kocsis, L.1    Szepesvari, C.2
  • 24
    • 83055177001
    • The epoch-greedy algorithm for multi-armed bandits with side information
    • Platt, J. C.; Koller, D.; Singer, Y.; and Roweis, S. T., eds., MIT Press
    • Langford, J., and Zhang, T. 2007. The epoch-greedy algorithm for multi-armed bandits with side information. In Platt, J. C.; Koller, D.; Singer, Y.; and Roweis, S. T., eds., NIPS. MIT Press.
    • (2007) NIPS
    • Langford, J.1    Zhang, T.2
  • 25
    • 30044441333
    • The sample complexity of exploration in the multi-armed bandit problem
    • Mannor, S., and Tsitsiklis, J. N. 2004. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research 5:623-648.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 623-648
    • Mannor, S.1    Tsitsiklis, J.N.2
  • 27
    • 33750375100
    • A simple distribution-free approach to the max k-armed bandit problem
    • Benhamou, F., ed., CP. Springer
    • Streeter, M. J., and Smith, S. F. 2006. A simple distribution-free approach to the max k-armed bandit problem. In Benhamou, F., ed., CP, volume 4204 of Lecture Notes in Computer Science, 560-574. Springer.
    • (2006) Lecture Notes in Computer Science , vol.4204 , pp. 560-574
    • Streeter, M.J.1    Smith, S.F.2
  • 28
    • 34250750797
    • Experience-efficient learning in associative bandit problems
    • Cohen, W. W., and Moore, A., eds., ICML. ACM
    • Strehl, A. L.; Mesterharm, C.; Littman, M. L.; and Hirsh, H. 2006. Experience-efficient learning in associative bandit problems. In Cohen, W. W., and Moore, A., eds., ICML, volume 148 of ACM International Conference Proceeding Series, 889-896. ACM.
    • (2006) ACM International Conference Proceeding Series , vol.148 , pp. 889-896
    • Strehl, A.L.1    Mesterharm, C.2    Littman, M.L.3    Hirsh, H.4
  • 30


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.