Volume 61, Issue 3, 2011, Pages 203-230

Multi-armed bandits with episode context

Author keywords

Computational learning theory; Computer Go; Contextual bandits; Multi armed bandits; PUCB; UCB


EID: 82355173286     PISSN: 10122443     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10472-011-9258-6     Document Type: Article
Times cited: 259

References (31)
  • 2
    • 62949181077
    • Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
    • Audibert, J. Y., Munos, R., Szepesvári, C.: Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 410(19), 1876-1902 (2009).
    • (2009) Theor. Comput. Sci. , vol.410 , Issue.19 , pp. 1876-1902
    • Audibert, J.Y.1    Munos, R.2    Szepesvári, C.3
  • 3
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2-3), 235-256 (2002).
    • (2002) Mach. Learn. , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 0035479281
    • Computer Go: An AI oriented survey
    • Bouzy, B., Cazenave, T.: Computer Go: An AI oriented survey. Artif. Intell. 132(1), 39-103 (2001).
    • (2001) Artif. Intell. , vol.132 , Issue.1 , pp. 39-103
    • Bouzy, B.1    Cazenave, T.2
  • 7
    • 84902513084
    • Monte-Carlo Go developments
    • H. J. van den Herik, H. Iida, and E. A. Heinz (Eds.), New York: Springer
    • Bouzy, B., Helmstetter, B.: Monte-Carlo Go developments. In: van den Herik, H. J., Iida, H., Heinz, E. A. (eds.) Advances in Computer Games (ACG 2003), IFIP, vol. 263, pp. 159-174. Springer, New York (2003).
    • (2003) Advances in Computer Games (ACG 2003), IFIP, Vol. 263 , pp. 159-174
    • Bouzy, B.1    Helmstetter, B.2
  • 8
    • 77952070805
    • Pure exploration in multi-armed bandits problems
    • Lecture Notes in Computer Science, R. Gavaldà, G. Lugosi, T. Zeugmann, and S. Zilles (Eds.), New York: Springer
    • Bubeck, S., Munos, R., Stoltz, G.: Pure exploration in multi-armed bandits problems. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds.) Algorithmic Learning Theory (ALT 2009), Lecture Notes in Computer Science, vol. 5809, pp. 23-37. Springer, New York (2009).
    • (2009) Algorithmic Learning Theory (ALT 2009) , vol.5809 , pp. 23-37
    • Bubeck, S.1    Munos, R.2    Stoltz, G.3
  • 9
    • 34250005402
    • Computer Go: A grand challenge to AI
    • Studies in Computational Intelligence, W. Duch and J. Mandziuk (Eds.), New York: Springer
    • Cai, X., Wunsch, D. C.: Computer Go: A grand challenge to AI. In: Duch, W., Mandziuk, J. (eds.) Challenges for Computational Intelligence, Studies in Computational Intelligence, vol. 63, pp. 443-465. Springer, New York (2007).
    • (2007) Challenges for Computational Intelligence , vol.63 , pp. 443-465
    • Cai, X.1    Wunsch, D.C.2
  • 13
    • 38049037928
    • Efficient selectivity and backup operators in Monte-Carlo tree search
    • Lecture Notes in Computer Science, H. J. van den Herik, P. Ciancarini, and H. H. L. M. Donkers (Eds.), New York: Springer
    • Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: van den Herik, H. J., Ciancarini, P., Donkers, H. H. L. M. (eds.) Computers and Games (CG 2006), Lecture Notes in Computer Science, vol. 4630, pp. 72-83. Springer, New York (2006).
    • (2006) Computers and Games (CG 2006) , vol.4630 , pp. 72-83
    • Coulom, R.1
  • 14
    • 70349287633
    • Computing Elo ratings of move patterns in the game of Go
    • Coulom, R.: Computing Elo ratings of move patterns in the game of Go. In: Computer Games Workshop 2007 (2007).
    • (2007) Computer Games Workshop 2007
    • Coulom, R.1
  • 16
    • 71149107214
    • Bandit-based optimization on graphs with application to library performance tuning
    • A. P. Danyluk, L. Bottou, and M. L. Littman (Eds.), New York: ACM
    • de Mesmay, F., Rimmel, A., Voronenko, Y., Püschel, M.: Bandit-based optimization on graphs with application to library performance tuning. In: Danyluk, A. P., Bottou, L., Littman, M. L. (eds.) International Conference on Machine Learning (ICML 2009), pp. 729-736. ACM, New York (2009).
    • (2009) International Conference on Machine Learning (ICML 2009) , pp. 729-736
    • de Mesmay, F.1    Rimmel, A.2    Voronenko, Y.3    Püschel, M.4
  • 17
    • 33745295134
    • Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
    • Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. 7, 1079-1105 (2006).
    • (2006) J. Mach. Learn. Res. , vol.7 , pp. 1079-1105
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 18
    • 57749181518
    • Simulation-based approach to general game playing
    • Fox, D., Gomes, C. P. (eds.) AAAI 2008, Chicago, IL, USA, 13-17 July 2008 AAAI Press, Menlo Park (2008)
    • Finnsson, H., Björnsson, Y.: Simulation-based approach to general game playing. In: Fox, D., Gomes, C. P. (eds.) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, IL, USA, 13-17 July 2008, pp. 259-264. AAAI Press, Menlo Park (2008).
    • (2008) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008 , pp. 259-264
    • Finnsson, H.1    Björnsson, Y.2
  • 19
  • 20
    • 57749091602
    • Achieving master level play in 9 x 9 computer Go
    • Fox, D., Gomes, C. P. (eds.) Chicago, IL, USA, 13-17 July 2008, AAAI Press, Menlo Park
    • Gelly, S., Silver, D.: Achieving master level play in 9 x 9 computer Go. In: Fox, D., Gomes, C. P. (eds.) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, IL, USA, 13-17 July 2008, pp. 1537-1540. AAAI Press, Menlo Park (2008).
    • (2008) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008 , pp. 1537-1540
    • Gelly, S.1    Silver, D.2
  • 24
    • 83055177001
    • The epoch-greedy algorithm for multi-armed bandits with side information
    • J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis (Eds.), Cambridge: MIT Press
    • Langford, J., Zhang, T.: The epoch-greedy algorithm for multi-armed bandits with side information. In: Platt, J. C., Koller, D., Singer, Y., Roweis, S. T. (eds.) Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2007).
    • (2007) Neural Information Processing Systems (NIPS)
    • Langford, J.1    Zhang, T.2
  • 25
    • 30044441333
    • The sample complexity of exploration in the multi-armed bandit problem
    • Mannor, S., Tsitsiklis, J. N.: The sample complexity of exploration in the multi-armed bandit problem. J. Mach. Learn. Res. 5, 623-648 (2004).
    • (2004) J. Mach. Learn. Res. , vol.5 , pp. 623-648
    • Mannor, S.1    Tsitsiklis, J.N.2
  • 28
    • 33750375100
    • A simple distribution-free approach to the max k-armed bandit problem
    • Lecture Notes in Computer Science, F. Benhamou (Ed.), New York: Springer
    • Streeter, M. J., Smith, S. F.: A simple distribution-free approach to the max k-armed bandit problem. In: Benhamou, F. (ed.) Principles and Practice of Constraint Programming (CP 2006), Lecture Notes in Computer Science, vol. 4204, pp. 560-574. Springer, New York (2006).
    • (2006) Principles and Practice of Constraint Programming (CP 2006) , vol.4204 , pp. 560-574
    • Streeter, M.J.1    Smith, S.F.2
  • 30
    • 85162031443
    • Learning from Logged Implicit Exploration Data
    • Lafferty, J., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S., Culotta, A. (eds.)
    • Strehl, A. L., Langford, J., Li, L., Kakade, S. M.: Learning from Logged Implicit Exploration Data. In: Lafferty, J., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S., Culotta, A. (eds.) Neural Information Processing Systems (NIPS) (2010).
    • (2010) Neural Information Processing Systems (NIPS)
    • Strehl, A.L.1    Langford, J.2    Li, L.3    Kakade, S.M.4
  • 31
    • 82355180344
    • Teytaud, O., Gelly, S., Sebag, M.: Anytime many-armed bandits. In: Zucker, J., Cornuéjols, A. (eds.) Conférence d'Apprentissage (CAP07), pp. 387-402 (2007).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.