SCOPUS Information Retrieval Platform

Volume , Issue , 1997, Pages 1019-1025

Local bandit approximation for optimal learning problems

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES;

ADAPTIVE CONTROL; BANDIT PROCESS; BAYES-OPTIMAL; GITTINS INDEX; LEARNING PROBLEM; LEARNING STRATEGY; MARKOV DECISION PROCESSES;

OPTIMIZATION;

EID: 16244388049 PISSN: 10495258 EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (24)

References (15)

1
- 0041443966
- Caution, probing and the value of information in the control of uncertain systems
- Bar-Shalom, Y. & Tse, E. (1976) Caution, probing and the value of information in the control of uncertain systems, Ann. Econ. Soc. Meas. 5:323-337.
- (1976) Ann. Econ. Soc. Meas , vol.5 , pp. 323-337
- Bar-Shalom, Y.¹ Tse, E.²

2
- 84938011869
- On adaptive control processes
- Bellman, R. & Kalaba, R. (1959) On adaptive control processes. IRE Trans. 4:1-9.
- (1959) IRE Trans. , vol.4 , pp. 1-9
- Bellman, R.¹ Kalaba, R.²

3
- 0018678571
- Adaptive control of Markov chains I: Finite parameter set
- Borkar, V. & Varaiya, P.P. (1979) Adaptive control of Markov chains I: finite parameter set. IEEE Trans. Auto. Control 24:953-958.
- (1979) IEEE Trans. Auto. Control , vol.24 , pp. 953-958
- Borkar, V.¹ Varaiya, P.P.²

4
- 84900550689
- Markov decision processes with uncertain transition probabilities
- Cozzolino, J.M., Gonzalez-Zubieta, R., & Miller, R.L. (1965) Markov decision processes with uncertain transition probabilities. Tech. Rpt. 11, Operations Research Center, MIT.
- (1965) Tech. Rpt. 11, Operations Research Center, MIT
- Cozzolino, J.M.¹ Gonzalez-Zubieta, R.² Miller, R.L.³

6
- 85152636797
- Q-learning for bandit problems
- Duff, M.O. (1995) Q-learning for bandit problems, in Machine Learning: Proceedings of the Twelfth International Conference on Machine Learning: pp. 209-217.
- (1995) Machine Learning: Proceedings of the Twelfth International Conference on Machine Learning , pp. 209-217
- Duff, M.O.¹

7
- 84899025295
- Technical Report, Department of Computer Science, Univ. of Massachusetts, Amherst
- Duff, M.O. (1997) Approximate computational methods for optimal learning and dual control. Technical Report, Department of Computer Science, Univ. of Massachusetts, Amherst.
- (1997) Approximate Computational Methods for Optimal Learning and Dual Control
- Duff, M.O.¹

8
- 0004003001
- Academic Press
- Feldbaum, A. (1965) Optimal Control Systems, Academic Press.
- (1965) Optimal Control Systems
- Feldbaum, A.¹

9
- 0000169010
- Bandit processes and dynamic allocation indices (with discussion)
- Gittins, J.C. & Jones, D. (1979) Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41:148-177.
- (1979) J. R. Statist. Soc. B , vol.41 , pp. 148-177
- Gittins, J.C.¹ Jones, D.²

10
- 0023345261
- The multi-armed bandit problem: Decomposition and computation
- Katehakis, M.H. & Veinott, A.F. (1987) The multi-armed bandit problem: decomposition and computation. Math. OR 12:262-268.
- (1987) Math. OR , vol.12 , pp. 262-268
- Katehakis, M.H.¹ Veinott, A.F.²

11
- 0006193487
- A modified dynamic programming method for Markov decision problems
- MacQueen, J. (1966). A modified dynamic programming method for Markov decision problems, J. Math. Anal. Appl., 14:38-43.
- (1966) J. Math. Anal. Appl. , vol.14 , pp. 38-43
- MacQueen, J.¹

12
- 84899002741
- Algorithms for evaluating the dynamic allocation index
- Robinson, D.R. (1981) Algorithms for evaluating the dynamic allocation index. Research Report No. 80/DRR/4, Manchester-Sheffield School of Probability and Statistics.
- (1981) Research Report No. 80/DRR/4, Manchester-Sheffield School of Probability and Statistics
- Robinson, D.R.¹

13
- 0027725920
- A short proof of the Gittins index theorem
- Tsitsiklis, J. (1993) A short proof of the Gittins index theorem. Proc. 32nd Conf. Dec. and Control: 389-390.
- (1993) Proc. 32nd Conf. Dec. and Control , pp. 389-390
- Tsitsiklis, J.¹

14
- 0022060331
- Extensions of the multiarmed bandit problem: The discounted case
- Varaiya, P.P., Walrand, J.C., & Buyukkoc, C. (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE Trans. Auto. Control 30(5):426-439.
- (1985) IEEE Trans. Auto. Control , vol.30 , Issue.5 , pp. 426-439
- Varaiya, P.P.¹ Walrand, J.C.² Buyukkoc, C.³

15
- 0004049893
- Ph.D. Thesis, Cambridge University
- Watkins, C. (1989) Learning from Delayed Rewards. Ph.D. Thesis, Cambridge University.
- (1989) Learning from Delayed Rewards
- Watkins, C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.