SCOPUS Information Search Platform

Journal of Machine Learning Research

Volume 7, 2006, Pages 1079-1105

Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

(3) Even-Dar, Eyal a; Mannor, Shie b; Mansour, Yishay c

a UNIVERSITY OF PENNSYLVANIA (United States)

b MCGILL UNIVERSITY (Canada)

c TEL AVIV UNIVERSITY (Israel)

Author keywords

[No Author keywords available]

Indexed keywords

MANNOR; MULTI-ARMED BANDIT; REINFORCEMENT LEARNING ALGORITHMS; REINFORCEMENT LEARNING PROBLEMS; TSITSIKLIS;

COMPUTER SIMULATION; LEARNING ALGORITHMS; MATHEMATICAL MODELS; PROBABILITY; PROBLEM SOLVING; ROBUSTNESS (CONTROL SYSTEMS); SPEED CONTROL; STATISTICAL METHODS;

LEARNING SYSTEMS;

EID: 33745295134 PISSN: 15337928 EISSN: 15337928 Source Type: Journal
DOI: None Document Type: Article

Times cited : (668)

References (30)

1
- 0033893766
- Competitive queue policies for differentiated services
- W. A. Aiello, Y. Mansour, S. Rajagopalan, and A. Rosen. Competitive queue policies for differentiated services. In INFOCOM, 2000.
- (2000) INFOCOM
- Aiello, W.A.¹ Mansour, Y.² Rajagopalan, S.³ Rosen, A.⁴

2
- 33745324645
- To appear
- (To appear in J. of Algorithms).
- J. of Algorithms

3
- 0018454769
- Fast probabilistic algorithms for Hamiltonian circuits and matchings
- D. Angluin and L. G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18:155-193, 1979.
- (1979) Journal of Computer and System Sciences , vol.18 , pp. 155-193
- Angluin, D.¹ Valiant, L.G.²

4
- 0029513526
- Gambling in a rigged casino: The adversarial multi-armed bandit problem
- IEEE Computer Society Press
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proc. 36th Annual Symposium on Foundations of Computer Science, pages 322-331. IEEE Computer Society Press, 1995.
- (1995) Proc. 36th Annual Symposium on Foundations of Computer Science , pp. 322-331
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

5
- 0037709910
- The non-stochastic multi-armed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The non-stochastic multi-armed bandit problem. SIAM J. on Computing, 32(1):48-77, 2002.
- (2002) SIAM J. on Computing , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

6
- 0004218171
- Chapman and Hall
- D. A. Berry and B. Fristedt. Bandit Problems. Chapman and Hall, 1985.
- (1985) Bandit Problems
- Berry, D.A.¹ Fristedt, B.²

7
- 0003487482
- Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1995.
- (1995) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

8
- 0003487482
- Athena Scientific, Belmont, MA
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

9
- 0033876515
- The O.D.E. method for convergence of stochastic approximation and reinforcement learning
- V. S. Borkar and S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447-469, 2000.
- (2000) SIAM J. Control Optim. , vol.38 , Issue.2 , pp. 447-469
- Borkar, V.S.¹ Meyn, S.P.²

10
- 14344266002
- Learning rates for Q-learning
- E. Even-Dar and Y. Mansour. Learning rates for Q-learning. Journal of Machine Learning Research, 5:1-25, 2003.
- (2003) Journal of Machine Learning Research , vol.5 , pp. 1-25
- Even-Dar, E.¹ Mansour, Y.²

11
- 33745301839
- (A preliminary version appeared in the Fourteenth Annual Conference on Computational Learning Theory (2001), 589-604.).
- (2001) A Preliminary Version Appeared in the Fourteenth Annual Conference on Computational Learning Theory , pp. 589-604

12
- 84947403595
- Probability inequalities for sums of bounded random variables
- W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301): 13-30, 1963.
- (1963) Journal of the American Statistical Association , vol.58 , Issue.301 , pp. 13-30
- Hoeffding, W.¹

13
- 0003644124
- MIT press
- R. Howard. Dynamic programming and Markov decision processes. MIT press, 1960.
- (1960) Dynamic Programming and Markov Decision Processes
- Howard, R.¹

14
- 1942514728
- Approximately optimal approximate reinforcement learning
- Morgan Kaufmann
- S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 267-274. Morgan Kaufmann, 2002.
- (2002) Proceedings of the Nineteenth International Conference on Machine Learning , pp. 267-274
- Kakade, S.¹ Langford, J.²

15
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
- Kearns, M.¹ Singh, S.²

16
- 33745317922
- (A preliminary version appeared in ICML (1998), 260-268.).
- (1998) A Preliminary Version Appeared in ICML , pp. 260-268

17
- 84899026236
- Finite-sample convergence rates for Q-learning and indirect algorithms
- M. Kearns and S. P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 10, pages 996-1002, 1998.
- (1998) Neural Information Processing Systems , vol.10 , pp. 996-1002
- Kearns, M.¹ Singh, S.P.²

18
- 3142742614
- Buffer over-flow management in QoS switches
- A. Kesselman, Z. Lotker, Y. Mansour, B. Patt-Shamir, B. Schieber, and M. Sviridenko. Buffer over-flow management in QoS switches. SIAM J. on Computing, 33(3):563-583, 2004.
- (2004) SIAM J. on Computing , vol.33 , Issue.3 , pp. 563-583
- Kesselman, A.¹ Lotker, Z.² Mansour, Y.³ Patt-Shamir, B.⁴ Schieber, B.⁵ Sviridenko, M.⁶

19
- 33745306819
- (A preliminary version appeared in ACM Symposium on Theory of Computing (2001), 520-529.).
- (2001) A Preliminary Version Appeared in ACM Symposium on Theory of Computing , pp. 520-529

20
- 0002899547
- Asymptotically efficient adaptive allocation rules
- T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
- (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

21
- 0006193487
- A modified dynamic programming method for Markov decision problems
- J. MacQueen. A modified dynamic programming method for Markov decision problems. J. Math. Anal. Appl., 14:38-43, 1966.
- (1966) J. Math. Anal. Appl. , vol.14 , pp. 38-43
- MacQueen, J.¹

22
- 30044441333
- The sample complexity of exploration in the multi-armed bandit problem
- S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5:623-648, 2004.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 623-648
- Mannor, S.¹ Tsitsiklis, J.N.²

23
- 33745308559
- (A preliminary version appeared in the Sixteenth Annual Conference on Computational Learning Theory (2003), 418-432.).
- (2003) A Preliminary Version Appeared in the Sixteenth Annual Conference on Computational Learning Theory , pp. 418-432

24
- 0036832956
- Kernel-based reinforcement learning
- D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2-3): 161-178, 2002.
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
- Ormoneit, D.¹ Sen, S.²

25
- 0003998452
- Wiley-Interscience
- M. Puterman. Markov Decision Processes. Wiley-Interscience, 1994.
- (1994) Markov Decision Processes
- Puterman, M.¹

26
- 84966203785
- Some aspects of sequential design of experiments
- H. Robbins. Some aspects of sequential design of experiments. Bull. Amer. Math. Soc., 58:527-535, 1952.
- (1952) Bull. Amer. Math. Soc. , vol.58 , pp. 527-535
- Robbins, H.¹

27
- 0028497385
- An upper bound on the loss from approximate optimal-value functions
- S. P. Singh and R. C. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3):227-233, 1994.
- (1994) Machine Learning , vol.16 , Issue.3 , pp. 227-233
- Singh, S.P.¹ Yee, R.C.²

28
- 0004007508
- R. Sutton and A. Barto. Reinforcement Learning. 1998.
- (1998) Reinforcement Learning
- Sutton, R.¹ Barto, A.²

29
- 31844456754
- Finite time bounds for sampling based fitted value iteration
- Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 881-886, 2005.
- (2005) Proceedings of the 22nd International Conference on Machine Learning (ICML) , pp. 881-886
- Szepesvári, Cs.¹ Munos, R.²

30
- 0004049893
- PhD thesis, Cambridge University
- C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.¹

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.