Volume 4131 LNCS - I, 2006, Pages 790-800

Optimal tuning of continual online exploration in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; ENTROPY; ONLINE SYSTEMS; OPTIMAL SYSTEMS; OPTIMIZATION; PROBABILITY DISTRIBUTIONS;

EID: 33749864692     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/11840817_82     Document Type: Conference Paper
Times cited: 13

References (24)
  • 1
    • Y. Achbany, F. Fouss, L. Yen, A. Pirotte, and M. Saerens. Timing continual exploration in reinforcement learning. Technical report, http://www.isys.ucl.ac.be/staff/francois/Articles/Achbany2005a.pdf, 2005.
  • 16
    • S. Singh and R. Sutton. Reinforcement learning with replacing eligibility traces. Machine Learning, 22:123-158, 1996.
  • 20
    • S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.