SCOPUS Information Retrieval Platform

Volume 53, Issue 5, 2006, Pages 762-799

Combining expert advice in reactive environments

c NONE (United States)

Author keywords

Complexity and performance bounds; Experts algorithms; Exploration exploitation tradeoffs; Reactive environments; Sequential decision making

Indexed keywords

ASYMPTOTIC STABILITY; COMPUTATIONAL COMPLEXITY; DECISION MAKING; LEARNING SYSTEMS; POLYNOMIALS;

COMPLEXITY AND PERFORMANCE BOUNDS; EXPERTS ALGORITHMS; EXPLORATION-EXPLOITATION TRADEOFFS; REACTIVE ENVIRONMENTS; SEQUENTIAL DECISION MAKING;

ALGORITHMS;

EID: 33845302015 PISSN: 00045411 EISSN: None Source Type: Journal
DOI: 10.1145/1183907.1183911 Document Type: Article

Times cited: 32

References (19)

1
- 0037709910
- The nonstochastic multiarmed bandit problem
- AUER, P., CESA-BIANCHI, N., FREUND, Y., AND SCHAPIRE, R. E. 2002. The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32, 1.
- (2002) SIAM J. Comput. , vol.32 , Issue.1
- Auer, P.; Cesa-Bianchi, N.; Freund, Y.; Schapire, R.E.

2
- 0031140246
- How to use expert advice
- CESA-BIANCHI, N., FREUND, Y., HAUSSLER, D., HELMBOLD, D. P., SCHAPIRE, R. E., AND WARMUTH, M. K. 1997. How to use expert advice. J. ACM 44, 427-485.
- (1997) J. ACM , vol.44 , pp. 427-485
- Cesa-Bianchi, N.; Freund, Y.; Haussler, D.; Helmbold, D.P.; Schapire, R.E.; Warmuth, M.K.

3
- 0000182415
- A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations
- CHERNOFF, H. 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 23, 493-507.
- (1952) Ann. Math. Stat. , vol.23 , pp. 493-507
- Chernoff, H.

4
- 33845304828
- How to combine expert (or novice) advice when actions impact the environment
- DE FARIAS, D., AND MEGIDDO, N. 2004. How to combine expert (or novice) advice when actions impact the environment. In Advances in Neural Information Processing Systems, Vol. 16.
- (2004) Advances in Neural Information Processing Systems , vol.16
- De Farias, D.; Megiddo, N.

5
- 0004060486
- Wiley, New York
- FELLER, W. 1971. Probability Theory and its Applications. Wiley, New York.
- (1971) Probability Theory and its Applications
- Feller, W.

6
- 0002095886
- A randomization rule for selecting forecasts
- FOSTER, D. P. AND VOHRA, R. V. 1993. A randomization rule for selecting forecasts. Oper. Res. 41, 704-709.
- (1993) Oper. Res. , vol.41 , pp. 704-709
- Foster, D.P.; Vohra, R.V.

7
- 0002476325
- Regret and the on-line decision problem
- FOSTER, D. AND VOHRA, R. 1999. Regret and the on-line decision problem. Games Econ. Behav. 29, 7-35.
- (1999) Games Econ. Behav. , vol.29 , pp. 7-35
- Foster, D.; Vohra, R.

9
- 0002267135
- Adaptive game playing using multiplicative weights
- FREUND, Y., AND SCHAPIRE, R. E. 1999. Adaptive game playing using multiplicative weights. Games Econ. Behav. 29, 79-103.
- (1999) Games Econ. Behav. , vol.29 , pp. 79-103
- Freund, Y.; Schapire, R.E.

10
- 0004247096
- The MIT Press, Cambridge, MA
- FUDENBERG, D., AND LEVINE, D. 1997. The Theory of Learning in Games. The MIT Press, Cambridge, MA.
- (1997) The Theory of Learning in Games
- Fudenberg, D.; Levine, D.

11
- 84947403595
- Probability inequalities for sums of bounded random variables
- HOEFFDING, W. 1963. Probability inequalities for sums of bounded random variables. J. ASA 58, 13-30.
- (1963) J. ASA , vol.58 , pp. 13-30
- Hoeffding, W.

12
- 23244466805
- Ph.D. dissertation, Gatsby Computational Neuroscience Unit, University College, London, England
- KAKADE, S. 2003. On the sample complexity of reinforcement learning. Ph.D. dissertation, Gatsby Computational Neuroscience Unit, University College, London, England.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, S.

13
- 84899026236
- Finite-sample convergence rates for Q-learning and indirect algorithms
- KEARNS, M., AND SINGH, S. 1999. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 12. MIT Press.
- (1999) Neural Information Processing Systems 12. MIT Press
- Kearns, M.; Singh, S.

14
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- KEARNS, M., AND SINGH, S. 2002. Near-optimal reinforcement learning in polynomial time. Mach. Learn. 49, 2, 209-232.
- (2002) Mach. Learn. , vol.49 , Issue.2 , pp. 209-232
- Kearns, M.; Singh, S.

15
- 0029344133
- Machine learning and nonparametric bandit theory
- LAI, T.-L., AND YAKOWITZ, S. 1995. Machine learning and nonparametric bandit theory. IEEE Trans. Automat. Cont. 40, 7, 1199-1209.
- (1995) IEEE Trans. Automat. Cont. , vol.40 , Issue.7 , pp. 1199-1209
- Lai, T.-L.; Yakowitz, S.

16
- 35148838877
- The weighted majority algorithm
- LITTLESTONE, N., AND WARMUTH, M. 1994. The weighted majority algorithm. Inf. Comput. 108, 2, 212-261.
- (1994) Inf. Comput. , vol.108 , Issue.2 , pp. 212-261
- Littlestone, N.; Warmuth, M.

17
- 0032047115
- A game of prediction with expert advice
- VOVK, V. 1998. A game of prediction with expert advice. J. Comput. Syst. Sci. 56, 153-173.
- (1998) J. Comput. Syst. Sci. , vol.56 , pp. 153-173
- Vovk, V.

18
- 34249833101
- Q-learning
- WATKINS, C., AND DAYAN, P. 1992. Q-learning. Mach. Learn. 8, 279-292.
- (1992) Mach. Learn. , vol.8 , pp. 279-292
- Watkins, C.; Dayan, P.

19
- 0004228766
- Cambridge University Press, Cambridge, UK
- WILLIAMS, D. 1991. Probability with Martingales. Cambridge University Press, Cambridge, UK.
- (1991) Probability with Martingales
- Williams, D.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.