SCOPUS Information Search Platform

Volume 71, Issue 13-15, 2008, Pages 2507-2520

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy

(5) Achbany, Youssef a; Fouss, François b; Yen, Luh a; Pirotte, Alain a; Saerens, Marco a

a LOUVAIN SCHOOL OF MANAGEMENT (Belgium)

b Facultes Universitaires Catholiques de Mons (Belgium)

Author keywords

Exploration and exploitation; Markov decision processes; Maximum entropy; Randomized strategy; Reinforcement learning; Shortest path problems

Indexed keywords

DIRECTED GRAPHS; DYNAMIC PROGRAMMING; ENTROPY; GLOBAL OPTIMIZATION; GRAPH THEORY; ITERATIVE METHODS; MARKOV PROCESSES; MAXIMUM ENTROPY METHODS; OPTIMIZATION; PROBABILITY DISTRIBUTIONS; REINFORCEMENT LEARNING; STOCHASTIC SYSTEMS;

EXPLORATION AND EXPLOITATION; EXPLORATION EXPLOITATIONS; GLOBAL OPTIMIZATION PROBLEMS; MARKOV DECISION PROCESSES; RANDOMIZED STRATEGY; SHORTEST PATH PROBLEM; STATIONARY ENVIRONMENTS; STOCHASTIC SHORTEST PATH PROBLEM;

NONLINEAR EQUATIONS;

COMPUTER SIMULATION; CONCEPTUAL FRAMEWORK; CONFERENCE PAPER; CONTROLLED STUDY; COST MINIMIZATION ANALYSIS; ENTROPY; ENVIRONMENTAL EXPLOITATION; EXPERIENTIAL LEARNING; INFORMATION PROCESSING; LEARNING ALGORITHM; NONLINEAR SYSTEM; PRIORITY JOURNAL; PROBABILITY; PROBLEM SOLVING; REINFORCEMENT; STATISTICAL ANALYSIS; TUNING CURVE;

EID: 56449125387 PISSN: 09252312 EISSN: None Source Type: Journal
DOI: 10.1016/j.neucom.2007.11.040 Document Type: Conference Paper

Times cited : (26)

References (47)

1
- 0030302511
- Cyclic flows, Markov process and stochastic traffic assignment
- T. Akamatsu, Cyclic flows, Markov process and stochastic traffic assignment, Transport. Res. B 30 (5) (1996) 369-386.
- (1996) Transport. Res. B , vol.30 , Issue.5 , pp. 369-386
- Akamatsu, T.¹

2
- 0007440184
- Wiley, New York
- J. Bather, Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions, Wiley, New York, 2000.
- (2000) Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions
- Bather, J.¹

3
- 84890245567
- Wiley, New York
- M.S. Bazaraa, H.D. Sherali, C.M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley, New York, 1993.
- (1993) Nonlinear Programming: Theory and Algorithms
- Bazaraa, M.S.¹ Sherali, H.D.² Shetty, C.M.³

4
- 0003920776
- Athena Scientific
- D.P. Bertsekas, Network Optimization: Continuous and Discrete Models, Athena Scientific, 1998.
- (1998) Network Optimization: Continuous and Discrete Models
- Bertsekas, D.P.¹

5
- 0003565783
- Athena Scientific
- D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2000.
- (2000) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

6
- 0003487482
- Athena Scientific
- D.P. Bertsekas, J. Tsitsiklis, Neuro-dynamic Programming, Athena Scientific, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.²

7
- 0000719863
- Packet routing in dynamically changing networks: A reinforcement learning approach
- J.A. Boyan, M.L. Littman, Packet routing in dynamically changing networks: A reinforcement learning approach, Adv. Neural Inf. Process. Syst. (NIPS) 6 (1994) 671-678.
- (1994) Adv. Neural Inf. Process. Syst. (NIPS) , vol.6 , pp. 671-678
- Boyan, J.A.¹ Littman, M.L.²

8
- 11344275321
- Fastest mixing Markov chain on a graph
- S. Boyd, P. Diaconis, L. Xiao, Fastest mixing Markov chain on a graph, SIAM Rev. (2004) 667-689.
- (2004) SIAM Rev , pp. 667-689
- Boyd, S.¹ Diaconis, P.² Xiao, L.³

9
- 56449111759
- Prentice-Hall, Englewood Cliffs, NJ
- R.G. Brown, Smoothing, Forecasting and Prediction of Discrete Time Series, Prentice-Hall, Englewood Cliffs, NJ, 1962.
- (1962) Smoothing, Forecasting and Prediction of Discrete Time Series
- Brown, R.G.¹

10
- 0003554178
- Academic Press, New York
- N. Christofides, Graph Theory: An Algorithmic Approach, Academic Press, New York, 1975.
- (1975) Graph Theory: An Algorithmic Approach
- Christofides, N.¹

11
- 84889281816
- Wiley, New York
- T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
- (1991) Elements of Information Theory
- Cover, T.M.¹ Thomas, J.A.²

12
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- D.P. de Farias, B.V. Roy, On the existence of fixed points for approximate value iteration and temporal-difference learning, J. Opt. Theory Appl. 105 (2000) 22-32.
- (2000) J. Opt. Theory Appl , vol.105 , pp. 22-32
- de Farias, D.P.¹ Roy, B.V.²

13
- 85030579534
- Manuscript submitted for publication
- J.-C. Delvenne, Pagerank, entropy and free energy, 2005, Manuscript submitted for publication.
- (2005) Pagerank, entropy and free energy
- Delvenne, J.-C.¹

14
- 0015078345
- A probabilistic multipath assignment model that obviates path enumeration
- R. Dial, A probabilistic multipath assignment model that obviates path enumeration, Transport. Res. 5 (1971) 83-111.
- (1971) Transport. Res , vol.5 , pp. 83-111
- Dial, R.¹

15
- 33847766633
- Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
- F. Fouss, A. Pirotte, J.-M. Renders, M. Saerens, Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation, IEEE Trans. Knowl. Data Eng. 19 (3) (2007) 355-369.
- (2007) IEEE Trans. Knowl. Data Eng , vol.19 , Issue.3 , pp. 355-369
- Fouss, F.¹ Pirotte, A.² Renders, J.-M.³ Saerens, M.⁴

16
- 4844223639
- M. Guo, Y. Liu, J. Malec, A new Q-learning algorithm based on the metropolis criterion, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 34 (5) (2004) 2140-2143.

17
- 0029679044
- Reinforcement learning: A survey
- L.P. Kaelbling, M. Littman, A. Moore, Reinforcement learning: A survey, J. Artif. Intell. Res. 4 (1996) 237-285.
- (1996) J. Artif. Intell. Res , vol.4 , pp. 237-285
- Kaelbling, L.P.¹ Littman, M.² Moore, A.³

18
- 0003758197
- Springer, Berlin
- L. Kanal, V. Kumar, Search in Artificial Intelligence, Springer, Berlin, 1988.
- (1988) Search in Artificial Intelligence
- Kanal, L.¹ Kumar, V.²

19
- 0003656494
- Academic Press, New York
- J.N. Kapur, H.K. Kesavan, Entropy Optimization Principles with Applications, Academic Press, New York, 1992.
- (1992) Entropy Optimization Principles with Applications
- Kapur, J.N.¹ Kesavan, H.K.²

20
- 0003979966
- Springer, Berlin
- J.G. Kemeny, J.L. Snell, Finite Markov Chains, Springer, Berlin, 1976.
- (1976) Finite Markov Chains
- Kemeny, J.G.¹ Snell, J.L.²

21
- 85149834820
- Markov games as a framework for multi-agent reinforcement learning
- M.L. Littman, Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994, pp. 157-163.
- (1994) Proceedings of the 11th International Conference on Machine Learning (ICML-94) , pp. 157-163
- Littman, M.L.¹

22
- 0032679082
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- N. Meuleau, P. Bourgine, Exploration of multi-state environments: Local measures and back-propagation of uncertainty, Mach. Learn. 35 (1999) 117-154.
- (1999) Mach. Learn , vol.35 , pp. 117-154
- Meuleau, N.¹ Bourgine, P.²

23
- 33746878798
- AI Memo 2001-003, Massachusetts Institute of Technology
- N. Meuleau, L. Peshkin, K. Kim, Exploration in gradient-based reinforcement learning, AI Memo 2001-003, Massachusetts Institute of Technology, 2001.
- (2001) Exploration in gradient-based reinforcement learning
- Meuleau, N.¹ Peshkin, L.² Kim, K.³

24
- 0004255908
- McGraw-Hill Companies
- T.M. Mitchell, Machine Learning, McGraw-Hill Companies, 1997.
- (1997) Machine Learning
- Mitchell, T.M.¹

25
- 3142771906
- Oxford University Press, Oxford
- M.J. Osborne, An Introduction to Game Theory, Oxford University Press, Oxford, 2004.
- (2004) An Introduction to Game Theory
- Osborne, M.J.¹

26
- 0033326218
- Adaptive exploration in reinforcement learning
- R. Patrascu, D. Stacey, Adaptive exploration in reinforcement learning, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN1999), 1999, pp. 2276-2281.
- (1999) Proceedings of the International Joint Conference on Neural Networks (IJCNN1999) , pp. 2276-2281
- Patrascu, R.¹ Stacey, D.²

27
- 0036082856
- Reinforcement learning for adaptive routing
- L. Peshkin, V. Savova, Reinforcement learning for adaptive routing, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN2002), 2002, pp. 1825-1830.
- (2002) Proceedings of the International Joint Conference on Neural Networks (IJCNN2002) , pp. 1825-1830
- Peshkin, L.¹ Savova, V.²

28
- 0003998452
- Wiley, New York
- M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.¹

29
- 0003998394
- Addison-Wesley, Reading, MA
- H. Raiffa, Decision Analysis, Addison-Wesley, Reading, MA, 1970.
- (1970) Decision Analysis
- Raiffa, H.¹

30
- 85030589365
- G. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.

31
- 22944459214
- The principal components analysis of a graph, and its relationships to spectral clustering
- Proceedings of the 15th European Conference on Machine Learning (ECML 2004), Springer, Berlin
- M. Saerens, F. Fouss, L. Yen, P. Dupont, The principal components analysis of a graph, and its relationships to spectral clustering, in: Proceedings of the 15th European Conference on Machine Learning (ECML 2004), Lecture Notes in Artificial Intelligence, vol. 3201, Springer, Berlin, 2004, pp. 371-383.
- (2004) Lecture Notes in Artificial Intelligence , vol.3201 , pp. 371-383
- Saerens, M.¹ Fouss, F.² Yen, L.³ Dupont, P.⁴

32
- 85030575492
- G. Shani, R. Brafman, S. Shimony, Adaptation for changing stochastic environments through online POMDP policy learning, in: Workshop on Reinforcement Learning in Non-Stationary Environments, ECML 2005, 2005, pp. 61-70.

33
- 0029753630
- Reinforcement learning with replacing eligibility traces
- S. Singh, R. Sutton, Reinforcement learning with replacing eligibility traces, Mach. Learn. 22 (1996) 123-158.
- (1996) Mach. Learn , vol.22 , pp. 123-158
- Singh, S.¹ Sutton, R.²

34
- 0013025914
- Wiley, New York
- J.C. Spall, Introduction to Stochastic Search and Optimization, Wiley, New York, 2003.
- (2003) Introduction to Stochastic Search and Optimization
- Spall, J.C.¹

35
- 33746329499
- The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem
- J. Sun, S. Boyd, L. Xiao, P. Diaconis, The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem, SIAM Rev. (2006) 681-699.
- (2006) SIAM Rev , pp. 681-699
- Sun, J.¹ Boyd, S.² Xiao, L.³ Diaconis, P.⁴

36
- 0004102479
- The MIT Press, Cambridge, MA
- R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, The MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

37
- 39649107929
- A one-parameter family of distributed consensus algorithms with boundary: From shortest paths to mean hitting times
- A. Tahbaz, A. Jadbabaie, A one-parameter family of distributed consensus algorithms with boundary: From shortest paths to mean hitting times, in: Proceedings of IEEE Conference on Decision and Control, 2006, pp. 4664-4669.
- (2006) Proceedings of IEEE Conference on Decision and Control , pp. 4664-4669
- Tahbaz, A.¹ Jadbabaie, A.²

38
- 0004126844
- third ed., Academic Press, New York
- H.M. Taylor, S. Karlin, An Introduction to Stochastic Modeling, third ed., Academic Press, New York, 1998.
- (1998) An Introduction to Stochastic Modeling
- Taylor, H.M.¹ Karlin, S.²

39
- 0003411271
- Efficient exploration in reinforcement learning
- Technical Report, School of Computer Science, Carnegie Mellon University
- S. Thrun, Efficient exploration in reinforcement learning, Technical Report, School of Computer Science, Carnegie Mellon University, 1992.
- (1992) Technical Report, School of Computer Science, Carnegie Mellon University
- Thrun, S.¹

40
- 0002210775
- The role of exploration in learning control
- D. White, D. Sofge (Eds.), Van Nostrand Reinhold, Princeton, NJ
- S. Thrun, The role of exploration in learning control, in: D. White, D. Sofge (Eds.), Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, Princeton, NJ, 1992.
- (1992) Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches
- Thrun, S.¹

41
- 27744518715
- MIT Press, Cambridge
- S. Thrun, W. Burgard, D. Fox, Probabilistic Robotics, MIT Press, Cambridge, 2005.
- (2005) Probabilistic Robotics
- Thrun, S.¹ Burgard, W.² Fox, D.³

42
- 2442632432
- Wiley, New York
- H.C. Tijms, A First Course in Stochastic Models, Wiley, New York, 2003.
- (2003) A First Course in Stochastic Models
- Tijms, H.C.¹

43
- 84923382376
- Linearly-solvable Markov decision problems
- MIT Press, Cambridge, MA
- E. Todorov, Linearly-solvable Markov decision problems, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2006.
- (2006) Advances in Neural Information Processing Systems
- Todorov, E.¹

44
- 84880475269
- A new paradigm for ranking pages on the World Wide Web
- J. Tomlin, A new paradigm for ranking pages on the World Wide Web, in: Proceedings of the International World Wide Web Conference (WWW2003), 2003, pp. 350-355.
- (2003) Proceedings of the International World Wide Web Conference (WWW2003) , pp. 350-355
- Tomlin, J.¹

45
- 33644810504
- Ph.D. Thesis, Vrije Universiteit Brussel, Belgium
- K. Verbeeck, Coordinated exploration in multi-agent reinforcement learning, Ph.D. Thesis, Vrije Universiteit Brussel, Belgium, 2004.
- (2004) Coordinated exploration in multi-agent reinforcement learning
- Verbeeck, K.¹

46
- 0004049893
- Ph.D. Thesis, King's College of Cambridge, UK
- J.C. Watkins, Learning from delayed rewards, Ph.D. Thesis, King's College of Cambridge, UK, 1989.
- (1989) Learning from delayed rewards
- Watkins, J.C.¹

47
- 34249833101
- Q-learning
- J.C. Watkins, P. Dayan, Q-learning, Mach. Learn. 8 (3/4) (1992) 279-292.
- (1992) Mach. Learn , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, J.C.¹ Dayan, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.