-
1
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235-256 (2002)
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
2
-
-
38049040954
-
-
Auer, P., Ortner, R., Szepesvári, C.: Improved Rates for the Stochastic Continuum-Armed Bandit Problem. In: Bshouty, N.H., Gentile, C. (eds.) COLT 2007. LNCS, vol. 4539, pp. 454-468. Springer, Heidelberg (2007)
-
-
-
-
3
-
-
58449087518
-
-
Bertsekas, D.: Dynamic programming and suboptimal control: From ADP to MPC. Fundamental Issues in Control, European Journal of Control 11(4-5) (2005): From 2005 CDC, Seville, Spain
-
-
-
-
4
-
-
48349140736
-
Rollout sampling approximate policy iteration
-
Dimitrakakis, C., Lagoudakis, M.: Rollout sampling approximate policy iteration. Machine Learning 72(3) (September 2008)
-
(2008)
Machine Learning
, vol.72
, Issue.3
-
-
Dimitrakakis, C.1
Lagoudakis, M.2
-
5
-
-
33745295134
-
Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
-
Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research 7, 1079-1105 (2006)
-
(2006)
Journal of Machine Learning Research
, vol.7
, pp. 1079-1105
-
-
Even-Dar, E.1
Mannor, S.2
Mansour, Y.3
-
6
-
-
22944468731
-
Approximate policy iteration with a policy language bias
-
Fern, A., Yoon, S., Givan, R.: Approximate policy iteration with a policy language bias. Advances in Neural Information Processing Systems 16(3) (2004)
-
(2004)
Advances in Neural Information Processing Systems
, vol.16
, Issue.3
-
-
Fern, A.1
Yoon, S.2
Givan, R.3
-
7
-
-
33744466799
-
Approximate policy iteration with a policy language bias: Solving relational Markov decision processes
-
Fern, A., Yoon, S., Givan, R.: Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research 25, 75-118 (2006)
-
(2006)
Journal of Artificial Intelligence Research
, vol.25
, pp. 75-118
-
-
Fern, A.1
Yoon, S.2
Givan, R.3
-
8
-
-
33750293964
-
-
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS, vol. 4212, pp. 282-293. Springer, Heidelberg (2006)
-
-
-
-
9
-
-
1942420814
-
Reinforcement learning as classification: Leveraging modern classifiers
-
Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, USA, pp. 424-431 (August 2003)
-
(2003)
Proceedings of the 20th International Conference on Machine Learning (ICML)
, pp. 424-431
-
-
Lagoudakis, M.G.1
Parr, R.2
-
10
-
-
31844448029
-
Relating reinforcement learning performance to classification performance
-
Langford, J., Zadrozny, B.: Relating reinforcement learning performance to classification performance. In: Proceedings of the 22nd International Conference on Machine Learning (ICML), Bonn, Germany, pp. 473-480 (2005)
-
(2005)
Proceedings of the 22nd International Conference on Machine Learning (ICML)
, pp. 473-480
-
-
Langford, J.1
Zadrozny, B.2