-
1
-
-
0042996986
-
Associative reinforcement learning using linear probabilistic concepts
-
Morgan Kaufmann, San Francisco
-
Abe, N., P. M. Long. 1999. Associative reinforcement learning using linear probabilistic concepts. Proc. 16th Internat. Conf. Machine Learn., Morgan Kaufmann, San Francisco, 3-11.
-
(1999)
Proc. 16th Internat. Conf. Machine Learn.
, pp. 3-11
-
-
Abe, N.1
Long, P.M.2
-
2
-
-
0000616723
-
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
-
Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4) 1054-1078.
-
(1995)
Adv. Appl. Probab.
, vol.27
, Issue.4
, pp. 1054-1078
-
-
Agrawal, R.1
-
3
-
-
0024626787
-
Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space
-
Agrawal, R., D. Teneketzis, V. Anantharam. 1989. Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automatic Control 34(3) 258-267.
-
(1989)
IEEE Trans. Automatic Control
, vol.34
, Issue.3
, pp. 258-267
-
-
Agrawal, R.1
Teneketzis, D.2
Anantharam, V.3
-
4
-
-
0041966002
-
Using confidence bounds for exploitation-exploration trade-offs
-
Auer, P. 2002. Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(3) 397-422.
-
(2002)
J. Machine Learn. Res.
, vol.3
, Issue.3
, pp. 397-422
-
-
Auer, P.1
-
5
-
-
0036568025
-
Finite-time analysis of the multi-armed bandit problem
-
Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multi-armed bandit problem. Machine Learn. 47(2) 235-256.
-
(2002)
Machine Learn.
, vol.47
, Issue.2
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
10
-
-
0000792515
-
Multidimensional stochastic approximation methods
-
Blum, J. R. 1954. Multidimensional stochastic approximation methods. Ann. Math. Statist. 25(4) 737-744.
-
(1954)
Ann. Math. Statist.
, vol.25
, Issue.4
, pp. 737-744
-
-
Blum, J.R.1
-
11
-
-
77953084889
-
-
Working paper, Columbia Graduate School of Business, New York
-
Cicek, D., M. Broadie, A. Zeevi. 2009. General bounds and finite-time performance improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Working paper, Columbia Graduate School of Business, New York.
-
(2009)
General Bounds and Finite-time Performance Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm
-
-
Cicek, D.1
Broadie, M.2
Zeevi, A.3
-
12
-
-
84898072179
-
Stochastic linear optimization under bandit feedback
-
Helsinki, Finland
-
Dani, V., T. P. Hayes, S. M. Kakade. 2008a. Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT 2008), Helsinki, Finland, 355-366.
-
(2008)
Proc. 21st Annual Conf. Learn. Theory (COLT 2008)
, pp. 355-366
-
-
Dani, V.1
Hayes, T.P.2
Kakade, S.M.3
-
13
-
-
77953110428
-
-
Working paper, University of Chicago, Chicago
-
Dani, V., T. P. Hayes, S. M. Kakade. 2008b. Stochastic linear optimization under bandit feedback. Working paper, University of Chicago, Chicago. http://ttic.uchicago.edu/~sham/papers/ml/bandit-linear-long.pdf.
-
(2008)
Stochastic Linear Optimization under Bandit Feedback
-
-
Dani, V.1
Hayes, T.P.2
Kakade, S.M.3
-
14
-
-
0001492860
-
Contributions to the "two-armed bandit" problem
-
Feldman, D. 1962. Contributions to the "two-armed bandit" problem. Ann. Math. Statist. 33(3) 847-856.
-
(1962)
Ann. Math. Statist.
, vol.33
, Issue.3
, pp. 847-856
-
-
Feldman, D.1
-
15
-
-
0039176122
-
A new positive definite geometric mean of two positive definite matrices
-
Fiedler, M., V. Pták. 1997. A new positive definite geometric mean of two positive definite matrices. Linear Algebra Its Appl. 251(1) 1-20.
-
(1997)
Linear Algebra Its Appl.
, vol.251
, Issue.1
, pp. 1-20
-
-
Fiedler, M.1
Pták, V.2
-
17
-
-
77953091640
-
-
Working paper, Columbia University Graduate School of Business, New York
-
Goldenshluger, A., A. Zeevi. 2008. Performance limitations in bandit problems with side observations. Working paper, Columbia University Graduate School of Business, New York.
-
(2008)
Performance Limitations in Bandit Problems with Side Observations
-
-
Goldenshluger, A.1
Zeevi, A.2
-
18
-
-
70049095891
-
Woodroofe's one-armed bandit problem revisited
-
Goldenshluger, A., A. Zeevi. 2009. Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. 19(4) 1603-1633.
-
(2009)
Ann. Appl. Probab.
, vol.19
, Issue.4
, pp. 1603-1633
-
-
Goldenshluger, A.1
Zeevi, A.2
-
19
-
-
0010948196
-
Further contributions to the "two-armed bandit" problem
-
Keener, R. 1985. Further contributions to the "two-armed bandit" problem. Ann. Statist. 13(1) 418-422.
-
(1985)
Ann. Statist.
, vol.13
, Issue.1
, pp. 418-422
-
-
Keener, R.1
-
20
-
-
0001079593
-
Stochastic estimation of the maximum of a regression function
-
Kiefer, J., J. Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3) 462-466.
-
(1952)
Ann. Math. Statist.
, vol.23
, Issue.3
, pp. 462-466
-
-
Kiefer, J.1
Wolfowitz, J.2
-
21
-
-
0038026196
-
Stochastic approximation (invited paper)
-
Lai, T. 2003. Stochastic approximation (invited paper). Ann. Statist. 31(2) 391-406.
-
(2003)
Ann. Statist.
, vol.31
, Issue.2
, pp. 391-406
-
-
Lai, T.1
-
22
-
-
0000854435
-
Adaptive treatment allocation and the multi-armed bandit problem
-
Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3) 1091-1114.
-
(1987)
Ann. Statist.
, vol.15
, Issue.3
, pp. 1091-1114
-
-
Lai, T.L.1
-
23
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1) 4-22.
-
(1985)
Adv. Appl. Math.
, vol.6
, Issue.1
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
25
-
-
34547966991
-
Multi-armed bandit problems with dependent arms
-
Corvallis, OR
-
Pandey, S., D. Chakrabarti, D. Agrawal. 2007. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., Corvallis, OR, 721-728.
-
(2007)
Proc. 24th Internat. Conf. Machine Learn.
, pp. 721-728
-
-
Pandey, S.1
Chakrabarti, D.2
Agrawal, D.3
-
26
-
-
0030306745
-
Strongly convex analysis
-
Polovinkin, E. S. 1996. Strongly convex analysis. Sbornik: Math. 187(2) 259-286.
-
(1996)
Sbornik: Math.
, vol.187
, Issue.2
, pp. 259-286
-
-
Polovinkin, E.S.1
-
28
-
-
84966203785
-
Some aspects of the sequential design of experiments
-
Robbins, H. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5) 527-535.
-
(1952)
Bull. Amer. Math. Soc.
, vol.58
, Issue.5
, pp. 527-535
-
-
Robbins, H.1
-
29
-
-
0000016172
-
A stochastic approximation method
-
Robbins, H., S. Monro. 1951. A stochastic approximation method. Ann. Math. Statist. 22(3) 400-407.
-
(1951)
Ann. Math. Statist.
, vol.22
, Issue.3
, pp. 400-407
-
-
Robbins, H.1
Monro, S.2
-
31
-
-
0001395850
-
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
-
Thompson, W. R. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3) 285-294.
-
(1933)
Biometrika
, vol.25
, Issue.3
, pp. 285-294
-
-
Thompson, W.R.1
-
33
-
-
15844362682
-
Arbitrary side observations in bandit problems
-
Wang, C.-C., S. R. Kulkarni, H. V. Poor. 2005b. Arbitrary side observations in bandit problems. Adv. Appl. Math. 34(4) 903-938.
-
(2005)
Adv. Appl. Math.
, vol.34
, Issue.4
, pp. 903-938
-
-
Wang, C.-C.1
Kulkarni, S.R.2
Poor, H.V.3