SCOPUS Information Search Platform

Mathematics of Operations Research

Volume 35, Issue 2, 2010, Pages 395-411

Linearly parameterized bandits

Rusmevichientong, Paat (a); Tsitsiklis, John N. (b)

a Cornell University (United States)

b Massachusetts Institute of Technology (United States)

Author keywords

Adaptive control; Multi armed bandit; Parametric model

Indexed keywords

ADAPTIVE CONTROL; BANDIT PROBLEMS; BAYES RISK; EXPLORATION AND EXPLOITATION; LINEAR FUNCTIONS; LOWER BOUNDS; MULTI ARMED BANDIT; NEAR-OPTIMAL POLICIES; PARAMETERIZED; PARAMETRIC MODELS; RANDOM VECTORS; UPPER BOUND;

ADAPTIVE CONTROL SYSTEMS;

MODELS;

EID: 77953111834 PISSN: 0364765X EISSN: 15265471 Source Type: Journal
DOI: 10.1287/moor.1100.0446 Document Type: Article

Times cited: 516

References (33)

1
- 0042996986
- Associative reinforcement learning using linear probabilistic concepts
- Morgan Kaufmann, San Francisco
- Abe, N., P. M. Long. 1999. Associative reinforcement learning using linear probabilistic concepts. Proc. 16th Internat. Conf. Machine Learn., Morgan Kaufmann, San Francisco, 3-11.
- (1999) Proc. 16th Internat. Conf. Machine Learn. , pp. 3-11
- Abe, N.¹ Long, P.M.²

2
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4) 1054-1078.
- (1995) Adv. Appl. Probab. , vol.27 , Issue.4 , pp. 1054-1078
- Agrawal, R.¹

3
- 0024626787
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space
- Agrawal, R., D. Teneketzis, V. Anantharam. 1989. Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automatic Control 34(3) 258-267.
- (1989) IEEE Trans. Automatic Control , vol.34 , Issue.3 , pp. 258-267
- Agrawal, R.¹ Teneketzis, D.² Anantharam, V.³

4
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Auer, P. 2002. Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(3) 397-422.
- (2002) J. Machine Learn. Res. , vol.3 , Issue.3 , pp. 397-422
- Auer, P.¹

5
- 0036568025
- Finite-time analysis of the multi-armed bandit problem
- Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multi-armed bandit problem. Machine Learn. 47(2) 235-256.
- (2002) Machine Learn , vol.47 , Issue.2 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

6
- 0004181906
- Chapman and Hall, London
- Berry, D., B. Fristedt. 1985. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
- (1985) Bandit Problems: Sequential Allocation of Experiments
- Berry, D.¹ Fristedt, B.²

7
- 0003565783
- Athena Scientific, Belmont, MA
- Bertsekas, D. 1995. Dynamic Programming and Optimal Control, Vol. 1. Athena Scientific, Belmont, MA.
- (1995) Dynamic Programming and Optimal Control , vol.1
- Bertsekas, D.¹

8
- 0003487482
- Bertsekas, D., J. N. Tsitsiklis. 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.¹ Tsitsiklis, J.N.²

9
- 0003850196
- Athena Scientific, Belmont, MA
- Bertsimas, D., J. N. Tsitsiklis. 1997. Introduction to Linear Optimization. Athena Scientific, Belmont, MA.
- (1997) Introduction to Linear Optimization
- Bertsimas, D.¹ Tsitsiklis, J.N.²

10
- 0000792515
- Multidimensional stochastic approximation methods
- Blum, J. R. 1954. Multidimensional stochastic approximation methods. Ann. Math. Statist. 25(4) 737-744.
- (1954) Ann. Math. Statist. , vol.25 , Issue.4 , pp. 737-744
- Blum, J.R.¹

11
- 77953084889
- Working paper, Columbia Graduate School of Business, New York
- Cicek, D., M. Broadie, A. Zeevi. 2009. General bounds and finite-time performance improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Working paper, Columbia Graduate School of Business, New York.
- (2009) General Bounds and Finite-time Performance Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm
- Cicek, D.¹ Broadie, M.² Zeevi, A.³

12
- 84898072179
- Stochastic linear optimization under bandit feedback
- Helsinki, Finland
- Dani, V., T. P. Hayes, S. M. Kakade. 2008a. Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT 2008), Helsinki, Finland, 355-366.
- (2008) Proc. 21st Annual Conf. Learn. Theory (COLT 2008) , pp. 355-366
- Dani, V.¹ Hayes, T.P.² Kakade, S.M.³

13
- 77953110428
- Working paper, University of Chicago, Chicago
- Dani, V., T. P. Hayes, S. M. Kakade. 2008b. Stochastic linear optimization under bandit feedback. Working paper, University of Chicago, Chicago. http://ttic.uchicago.edu/~sham/papers/ml/bandit-linear-long.pdf.
- (2008) Stochastic Linear Optimization under Bandit Feedback
- Dani, V.¹ Hayes, T.P.² Kakade, S.M.³

14
- 0001492860
- Contributions to the "two-armed bandit" problem
- Feldman, D. 1962. Contributions to the "two-armed bandit" problem. Ann. Math. Statist. 33(3) 847-856.
- (1962) Ann. Math. Statist. , vol.33 , Issue.3 , pp. 847-856
- Feldman, D.¹

15
- 0039176122
- A new positive definite geometric mean of two positive definite matrices
- Fiedler, M., V. Pták. 1997. A new positive definite geometric mean of two positive definite matrices. Linear Algebra Its Appl. 251(1) 1-20.
- (1997) Linear Algebra Its Appl. , vol.251 , Issue.1 , pp. 1-20
- Fiedler, M.¹ Pták, V.²

16
- 0000532482
- Response surface bandits
- Ginebra, J., M. K. Clayton. 1995. Response surface bandits. J. Roy. Statist. Soc. Ser. B (Methodological) 57(4) 771-784.
- (1995) J. Roy. Statist. Soc. Ser. B (Methodological) , vol.57 , Issue.4 , pp. 771-784
- Ginebra, J.¹ Clayton, M.K.²

17
- 77953091640
- Working paper, Columbia University Graduate School of Business, New York
- Goldenshluger, A., A. Zeevi. 2008. Performance limitations in bandit problems with side observations. Working paper, Columbia University Graduate School of Business, New York.
- (2008) Performance Limitations in Bandit Problems with Side Observations
- Goldenshluger, A.¹ Zeevi, A.²

18
- 70049095891
- Woodroofe's one-armed bandit problem revisited
- Goldenshluger, A., A. Zeevi. 2009. Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. 19(4) 1603-1633.
- (2009) Ann. Appl. Probab. , vol.19 , Issue.4 , pp. 1603-1633
- Goldenshluger, A.¹ Zeevi, A.²

19
- 0010948196
- Further contributions to the "two-armed bandit" problem
- Keener, R. 1985. Further contributions to the "two-armed bandit" problem. Ann. Statist. 13(1) 418-422.
- (1985) Ann. Statist. , vol.13 , Issue.1 , pp. 418-422
- Keener, R.¹

20
- 0001079593
- Stochastic estimation of the maximum of a regression function
- Kiefer, J., J. Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3) 462-466.
- (1952) Ann. Math. Statist. , vol.23 , Issue.3 , pp. 462-466
- Kiefer, J.¹ Wolfowitz, J.²

21
- 0038026196
- Stochastic approximation (invited paper)
- Lai, T. 2003. Stochastic approximation (invited paper). Ann. Statist. 31(2) 391-406.
- (2003) Ann. Statist. , vol.31 , Issue.2 , pp. 391-406
- Lai, T.¹

22
- 0000854435
- Adaptive treatment allocation and the multi-armed bandit problem
- Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3) 1091-1114.
- (1987) Ann. Statist. , vol.15 , Issue.3 , pp. 1091-1114
- Lai, T.L.¹

23
- 0002899547
- Asymptotically efficient adaptive allocation rules
- Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1) 4-22.
- (1985) Adv. Appl. Math. , vol.6 , Issue.1 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

24
- 72349091790
- A structured multi-armed bandit problem and the greedy policy
- Mersereau, A. J., P. Rusmevichientong, J. N. Tsitsiklis. 2009. A structured multi-armed bandit problem and the greedy policy. IEEE Trans. Automatic Control 54(12) 2787-2802.
- (2009) IEEE Trans. Automatic Control , vol.54 , Issue.12 , pp. 2787-2802
- Mersereau, A.J.¹ Rusmevichientong, P.² Tsitsiklis, J.N.³

25
- 34547966991
- Multi-armed bandit problems with dependent arms
- Corvallis, OR
- Pandey, S., D. Chakrabarti, D. Agrawal. 2007. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., Corvallis, OR, 721-728.
- (2007) Proc. 24th Internat. Conf. Machine Learn , pp. 721-728
- Pandey, S.¹ Chakrabarti, D.² Agrawal, D.³

26
- 0030306745
- Strongly convex analysis
- Polovinkin, E. S. 1996. Strongly convex analysis. Sbornik: Math. 187(2) 259-286.
- (1996) Sbornik: Math. , vol.187 , Issue.2 , pp. 259-286
- Polovinkin, E.S.¹

27
- 0002308024
- Academic Press, London
- Pressman, E. L., I. N. Sonin. 1990. Sequential Control with Incomplete Information. Academic Press, London.
- (1990) Sequential Control with Incomplete Information
- Pressman, E.L.¹ Sonin, I.N.²

28
- 84966203785
- Some aspects of the sequential design of experiments
- Robbins, H. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5) 527-535.
- (1952) Bull. Amer. Math. Soc. , vol.58 , Issue.5 , pp. 527-535
- Robbins, H.¹

29
- 0000016172
- A stochastic approximation method
- Robbins, H., S. Monro. 1951. A stochastic approximation method. Ann. Math. Statist. 22(3) 400-407.
- (1951) Ann. Math. Statist. , vol.22 , Issue.3 , pp. 400-407
- Robbins, H.¹ Monro, S.²

30
- 77953098183
- Rusmevichientong, P., J. N. Tsitsiklis. 2010. Linearly parameterized bandits (extended version). http://arxiv.org/abs/0812.3465.
- (2010)
- Rusmevichientong, P.¹ Tsitsiklis, J.N.²

31
- 0001395850
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- Thompson, W. R. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3) 285-294.
- (1933) Biometrika , vol.25 , Issue.3 , pp. 285-294
- Thompson, W.R.¹

32
- 15844389867
- Bandit problems with side observations
- Wang, C.-C., S. R. Kulkarni, H. V. Poor. 2005a. Bandit problems with side observations. IEEE Trans. Automatic Control 50(3) 338-355.
- (2005) IEEE Trans. Automatic Control , vol.50 , Issue.3 , pp. 338-355
- Wang, C.-C.¹ Kulkarni, S.R.² Poor, H.V.³

33
- 15844362682
- Arbitrary side observations in bandit problems
- Wang, C.-C., S. R. Kulkarni, H. V. Poor. 2005b. Arbitrary side observations in bandit problems. Adv. Appl. Math. 34(4) 903-938.
- (2005) Adv. Appl. Math. , vol.34 , Issue.4 , pp. 903-938
- Wang, C.-C.¹ Kulkarni, S.R.² Poor, H.V.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.