-
2
-
-
62949181077
-
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
-
Audibert, J. Y., Munos, R., Szepesvári, C.: Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 410(19), 1876-1902 (2009).
-
(2009)
Theor. Comput. Sci.
, vol.410
, Issue.19
, pp. 1876-1902
-
-
Audibert, J.Y.1
Munos, R.2
Szepesvári, C.3
-
3
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2-3), 235-256 (2002).
-
(2002)
Mach. Learn.
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
4
-
-
0037709910
-
The nonstochastic multiarmed bandit problem
-
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R. E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48-77 (2002).
-
(2002)
SIAM J. Comput.
, vol.32
, Issue.1
, pp. 48-77
-
-
Auer, P.1
Cesa-Bianchi, N.2
Freund, Y.3
Schapire, R.E.4
-
5
-
-
0035479281
-
Computer Go: An AI oriented survey
-
Bouzy, B., Cazenave, T.: Computer Go: An AI oriented survey. Artif. Intell. 132(1), 39-103 (2001).
-
(2001)
Artif. Intell.
, vol.132
, Issue.1
, pp. 39-103
-
-
Bouzy, B.1
Cazenave, T.2
-
7
-
-
84902513084
-
Monte-Carlo Go developments
-
H. J. van den Herik, H. Iida, and E. A. Heinz (Eds.), New York: Springer
-
Bouzy, B., Helmstetter, B.: Monte-Carlo Go developments. In: van den Herik, H. J., Iida, H., Heinz, E. A. (eds.) Advances in Computer Games (ACG 2003), IFIP, vol. 263, pp. 159-174. Springer, New York (2003).
-
(2003)
Advances in Computer Games (ACG 2003), IFIP, Vol. 263
, pp. 159-174
-
-
Bouzy, B.1
Helmstetter, B.2
-
8
-
-
77952070805
-
Pure exploration in multi-armed bandits problems
-
Lecture Notes in Computer Science, R. Gavaldà, G. Lugosi, T. Zeugmann, and S. Zilles (Eds.), New York: Springer
-
Bubeck, S., Munos, R., Stoltz, G.: Pure exploration in multi-armed bandits problems. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds.) Algorithmic Learning Theory (ALT 2009), Lecture Notes in Computer Science, vol. 5809, pp. 23-37. Springer, New York (2009).
-
(2009)
Algorithmic Learning Theory (ALT 2009)
, vol.5809
, pp. 23-37
-
-
Bubeck, S.1
Munos, R.2
Stoltz, G.3
-
9
-
-
34250005402
-
Computer Go: A grand challenge to AI
-
Studies in Computational Intelligence, W. Duch and J. Mandziuk (Eds.), New York: Springer
-
Cai, X., Wunsch, D. C.: Computer Go: A grand challenge to AI. In: Duch, W., Mandziuk, J. (eds.) Challenges for Computational Intelligence, Studies in Computational Intelligence, vol. 63, pp. 443-465. Springer, New York (2007).
-
(2007)
Challenges for Computational Intelligence
, vol.63
, pp. 443-465
-
-
Cai, X.1
Wunsch, D.C.2
-
10
-
-
67650687540
-
Progressive strategies for Monte-Carlo tree search
-
Chaslot, G., Winands, M., Uiterwijk, J., van den Herik, H., Bouzy, B.: Progressive strategies for Monte-Carlo tree search. In: Proceedings of the 10th Joint Conference on Information Sciences (JCIS 2007), pp. 655-661 (2007).
-
(2007)
Proceedings of the 10th Joint Conference on Information Sciences (JCIS 2007)
, pp. 655-661
-
-
Chaslot, G.1
Winands, M.2
Uiterwijk, J.3
van den Herik, H.4
Bouzy, B.5
-
11
-
-
82355189414
-
-
Chaslot, G., Chatriot, L., Fiter, C., Gelly, S., Hoock, J., Perez, J., Rimmel, A., Teytaud, O.: Combining expert, offline, transient and online knowledge in Monte-Carlo exploration. http://www.lri.fr/~teytaud/eg.pdf (2008).
-
(2008)
Combining expert, offline, transient and online knowledge in Monte-Carlo exploration
-
-
Chaslot, G.1
Chatriot, L.2
Fiter, C.3
Gelly, S.4
Hoock, J.5
Perez, J.6
Rimmel, A.7
Teytaud, O.8
-
12
-
-
77953762833
-
Adding expert knowledge and exploration in Monte-Carlo tree search
-
New York: Springer
-
Chaslot, G., Fiter, C., Hoock, J. B., Rimmel, A., Teytaud, O.: Adding expert knowledge and exploration in Monte-Carlo tree search. In: Advances in Computer Games (ACG12). Springer, New York (2009).
-
(2009)
Advances in Computer Games (ACG12)
-
-
Chaslot, G.1
Fiter, C.2
Hoock, J.B.3
Rimmel, A.4
Teytaud, O.5
-
13
-
-
38049037928
-
Efficient selectivity and backup operators in Monte-Carlo tree search
-
Lecture Notes in Computer Science, H. J. van den Herik, P. Ciancarini, and H. H. L. M. Donkers (Eds.), New York: Springer
-
Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: van den Herik, H. J., Ciancarini, P., Donkers, H. H. L. M. (eds.) Computers and Games (CG 2006), Lecture Notes in Computer Science, vol. 4630, pp. 72-83. Springer, New York (2006).
-
(2006)
Computers and Games (CG 2006)
, vol.4630
, pp. 72-83
-
-
Coulom, R.1
-
14
-
-
70349287633
-
Computing Elo ratings of move patterns in the game of Go
-
Coulom, R.: Computing Elo ratings of move patterns in the game of Go. In: Computer Games Workshop 2007 (2007).
-
(2007)
Computer Games Workshop 2007
-
-
Coulom, R.1
-
16
-
-
71149107214
-
Bandit-based optimization on graphs with application to library performance tuning
-
A. P. Danyluk, L. Bottou, and M. L. Littman (Eds.), New York: ACM
-
de Mesmay, F., Rimmel, A., Voronenko, Y., Püschel, M.: Bandit-based optimization on graphs with application to library performance tuning. In: Danyluk, A. P., Bottou, L., Littman, M. L. (eds.) International Conference on Machine Learning (ICML 2009), pp. 729-736. ACM, New York (2009).
-
(2009)
International Conference on Machine Learning (ICML 2009)
, pp. 729-736
-
-
de Mesmay, F.1
Rimmel, A.2
Voronenko, Y.3
Püschel, M.4
-
17
-
-
33745295134
-
Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
-
Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. 7, 1079-1105 (2006).
-
(2006)
J. Mach. Learn. Res.
, vol.7
, pp. 1079-1105
-
-
Even-Dar, E.1
Mannor, S.2
Mansour, Y.3
-
18
-
-
57749181518
-
Simulation-based approach to general game playing
-
Fox, D., Gomes, C. P. (eds.) AAAI 2008, Chicago, IL, USA, 13-17 July 2008, AAAI Press, Menlo Park (2008)
-
Finnsson, H., Björnsson, Y.: Simulation-based approach to general game playing. In: Fox, D., Gomes, C. P. (eds.) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, IL, USA, 13-17 July 2008, pp. 259-264. AAAI Press, Menlo Park (2008).
-
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence
, pp. 259-264
-
-
Finnsson, H.1
Björnsson, Y.2
-
19
-
-
34547990649
-
Combining online and offline knowledge in UCT
-
Ghahramani, Z. (ed.) ACM, New York
-
Gelly, S., Silver, D.: Combining online and offline knowledge in UCT. In: Ghahramani, Z. (ed.) International Conference on Machine Learning (ICML 2007), pp. 273-280. ACM, New York (2007).
-
(2007)
International Conference on Machine Learning (ICML 2007)
, pp. 273-280
-
-
Gelly, S.1
Silver, D.2
-
20
-
-
57749091602
-
Achieving master level play in 9 x 9 computer Go
-
Fox, D., Gomes, C. P. (eds.) Chicago, IL, USA, 13-17 July 2008, AAAI Press, Menlo Park
-
Gelly, S., Silver, D.: Achieving master level play in 9 x 9 computer Go. In: Fox, D., Gomes, C. P. (eds.) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, IL, USA, 13-17 July 2008, pp. 1537-1540. AAAI Press, Menlo Park (2008).
-
(2008)
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008
, pp. 1537-1540
-
-
Gelly, S.1
Silver, D.2
-
21
-
-
82355180345
-
Gap-free bounds for multi-armed stochastic bandit
-
Juditsky, A., Nazin, A., Tsybakov, A., Vayatis, N.: Gap-free bounds for multi-armed stochastic bandit. In: World Congress of the International Federation of Automatic Control (IFAC) 2008 (2008).
-
(2008)
World Congress of the International Federation of Automatic Control (IFAC) 2008
-
-
Juditsky, A.1
Nazin, A.2
Tsybakov, A.3
Vayatis, N.4
-
22
-
-
56449104477
-
Efficient bandit algorithms for online multiclass prediction
-
Cohen, W. W., McCallum, A., Roweis, S. T. (eds.)
-
Kakade, S. M., Shalev-Shwartz, S., Tewari, A.: Efficient bandit algorithms for online multiclass prediction. In: Cohen, W. W., McCallum, A., Roweis, S. T. (eds.) International Conference on Machine Learning (ICML 2008), pp. 440-447. ACM, New York (2008).
-
(2008)
International Conference on Machine Learning (ICML 2008), ACM, New York
, pp. 440-447
-
-
Kakade, S.M.1
Shalev-Shwartz, S.2
Tewari, A.3
-
24
-
-
83055177001
-
The epoch-greedy algorithm for multi-armed bandits with side information
-
J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis (Eds.), Cambridge: MIT Press
-
Langford, J., Zhang, T.: The epoch-greedy algorithm for multi-armed bandits with side information. In: Platt, J. C., Koller, D., Singer, Y., Roweis, S. T. (eds.) Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2007).
-
(2007)
Neural Information Processing Systems (NIPS)
-
-
Langford, J.1
Zhang, T.2
-
25
-
-
30044441333
-
The sample complexity of exploration in the multi-armed bandit problem
-
Mannor, S., Tsitsiklis, J. N.: The sample complexity of exploration in the multi-armed bandit problem. J. Mach. Learn. Res. 5, 623-648 (2004).
-
(2004)
J. Mach. Learn. Res.
, vol.5
, pp. 623-648
-
-
Mannor, S.1
Tsitsiklis, J.N.2
-
27
-
-
84898064829
-
Stochastic convex optimization
-
Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K.: Stochastic convex optimization. In: 22nd Annual Conference on Learning Theory (COLT 2009) (2009).
-
(2009)
Annual Conference on Learning Theory (COLT 2009)
-
-
Shalev-Shwartz, S.1
Shamir, O.2
Srebro, N.3
Sridharan, K.4
-
28
-
-
33750375100
-
A simple distribution-free approach to the max k-armed bandit problem
-
Lecture Notes in Computer Science, F. Benhamou (Ed.), New York: Springer
-
Streeter, M. J., Smith, S. F.: A simple distribution-free approach to the max k-armed bandit problem. In: Benhamou, F. (ed.) Principles and Practice of Constraint Programming (CP 2006), Lecture Notes in Computer Science, vol. 4204, pp. 560-574. Springer, New York (2006).
-
(2006)
Principles and Practice of Constraint Programming (CP 2006)
, vol.4204
, pp. 560-574
-
-
Streeter, M.J.1
Smith, S.F.2
-
29
-
-
33749242078
-
Experience-efficient learning in associative bandit problems
-
Cohen, W. W., Moore, A. (eds.) ACM, New York
-
Strehl, A. L., Mesterharm, C., Littman, M. L., Hirsh, H.: Experience-efficient learning in associative bandit problems. In: Cohen, W. W., Moore, A. (eds.) International Conference on Machine Learning (ICML 2006), pp. 889-896. ACM, New York (2006).
-
(2006)
International Conference on Machine Learning (ICML 2006)
, pp. 889-896
-
-
Strehl, A.L.1
Mesterharm, C.2
Littman, M.L.3
Hirsh, H.4
-
30
-
-
85162031443
-
Learning from Logged Implicit Exploration Data
-
Lafferty, J., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S., Culotta, A. (eds.)
-
Strehl, A. L., Langford, J., Li, L., Kakade, S. M.: Learning from Logged Implicit Exploration Data. In: Lafferty, J., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S., Culotta, A. (eds.) Neural Information Processing Systems (NIPS) (2010).
-
(2010)
Neural Information Processing Systems (NIPS)
-
-
Strehl, A.L.1
Langford, J.2
Li, L.3
Kakade, S.M.4
-
31
-
-
82355180344
-
-
Teytaud, O., Gelly, S., Sebag, M.: Anytime many-armed bandits. In: Zucker, J., Cornuéjols, A. (eds.) Conférence d'Apprentissage (CAP07), pp. 387-402 (2007).
-
-
-