SCOPUS Information Search Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 6359 LNAI, 2010, Pages 203-210

Adaptive ε-greedy exploration in reinforcement learning based on value differences

(1) Tokic, Michel a,b

a UNIVERSITY OF APPLIED SCIENCES RAVENSBURG WEINGARTEN (Germany)

b UNIVERSITY OF ULM (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

AD HOC APPROACH; COMMONLY USED; EXPLORATION/EXPLOITATION DILEMMAS; GREEDY EXPLORATION; MULTI ARMED BANDIT; TEMPORAL DIFFERENCE ERRORS; VALUE FUNCTIONS;

ARTIFICIAL INTELLIGENCE; REINFORCEMENT LEARNING;

POTASSIUM IODIDE; REINFORCEMENT LEARNING;

EID: 78349245906 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-16111-7_23 Document Type: Conference Paper

Times cited: 269

References (14)

1
- 0004102479
- MIT Press, Cambridge
- Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

2
- 0004049893
- PhD thesis, University of Cambridge, Cambridge, England
- Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, Cambridge, England (1989)
- (1989) Learning from Delayed Rewards
- Watkins, C.¹

3
- 0003411271
- Efficient exploration in reinforcement learning
- Pittsburgh, PA, USA
- Thrun, S.B.: Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, PA, USA (1992)
- (1992) Technical Report CMU-CS-92-102, Carnegie Mellon University
- Thrun, S.B.¹

4
- 0041965975
- R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, R.I., Tennenholtz, M.: R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213-231 (2002)
- (2002) Journal of Machine Learning Research, vol. 3, pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

5
- 0036592028
- Control of exploitation-exploration metaparameter in reinforcement learning
- Ishii, S., Yoshida, W., Yoshimoto, J.: Control of exploitation-exploration metaparameter in reinforcement learning. Neural Networks 15(4-6), 665-687 (2002)
- (2002) Neural Networks, vol. 15, issue 4-6, pp. 665-687
- Ishii, S.¹ Yoshida, W.² Yoshimoto, J.³

6
- 78349266245
- Interview with Richard S. Sutton
- Heidrich-Meisner, V.: Interview with Richard S. Sutton. Künstliche Intelligenz 3, 41-43 (2009)
- (2009) Künstliche Intelligenz, vol. 3, pp. 41-43
- Heidrich-Meisner, V.¹

7
- 33646406807
- Multi-armed bandit algorithms and empirical evaluation
- Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) LNCS (LNAI) Springer, Heidelberg
- Vermorel, J., Mohri, M.: Multi-armed bandit algorithms and empirical evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 437-448. Springer, Heidelberg (2005)
- (2005) ECML 2005, vol. 3720, pp. 437-448
- Vermorel, J.¹ Mohri, M.²

8
- 58349084664
- Improving the exploration strategy in bandit algorithms
- Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) II. LNCS Springer, Heidelberg
- Caelen, O., Bontempi, G.: Improving the exploration strategy in bandit algorithms. In: Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) LION 2007 II. LNCS, vol. 5313, pp. 56-68. Springer, Heidelberg (2008)
- (2008) LION 2007, vol. 5313, pp. 56-68
- Caelen, O.¹ Bontempi, G.²

9
- 0003636089
- On-line Q-learning using connectionist systems
- Cambridge University
- Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University (1994)
- (1994) Technical Report CUED/F-INFENG/TR 166
- Rummery, G.A.¹ Niranjan, M.²

10
- 0003565779
- Prentice-Hall, Englewood Cliffs
- Bertsekas, D.P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs (1987)
- (1987) Dynamic Programming: Deterministic and Stochastic Models
- Bertsekas, D.P.¹

11
- 33748998787
- Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
- George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 65(1), 167-198 (2006)
- (2006) Machine Learning, vol. 65, issue 1, pp. 167-198
- George, A.P.¹ Powell, W.B.²

12
- 84966203785
- Some aspects of the sequential design of experiments
- Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527-535 (1952)
- (1952) Bulletin of the American Mathematical Society, vol. 58, pp. 527-535
- Robbins, H.¹

13
- 4544345025
- Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches
- Chicago, IL, USA ACM, New York
- Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 45-53. ACM, New York (2004)
- (2004) Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pp. 45-53
- Awerbuch, B.¹ Kleinberg, R.D.²

14
- 4243096065
- Exploitation vs. exploration: Choosing a supplier in an environment of incomplete information
- Azoulay-Schwartz, R., Kraus, S., Wilkenfeld, J.: Exploitation vs. exploration: Choosing a supplier in an environment of incomplete information. Decision Support Systems 38(1), 1-18 (2004)
- (2004) Decision Support Systems, vol. 38, issue 1, pp. 1-18
- Azoulay-Schwartz, R.¹ Kraus, S.² Wilkenfeld, J.³

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.