Volume 71, Issue 13-15, 2008, Pages 2507-2520

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy

Author keywords

Exploration and exploitation; Markov decision processes; Maximum entropy; Randomized strategy; Reinforcement learning; Shortest path problems

Indexed keywords

DIRECTED GRAPHS; DYNAMIC PROGRAMMING; ENTROPY; GLOBAL OPTIMIZATION; GRAPH THEORY; ITERATIVE METHODS; MARKOV PROCESSES; MAXIMUM ENTROPY METHODS; OPTIMIZATION; PROBABILITY DISTRIBUTIONS; REINFORCEMENT LEARNING; STOCHASTIC SYSTEMS;

EID: 56449125387     PISSN: 09252312     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.neucom.2007.11.040     Document Type: Conference Paper
Times cited: 26

References (47)
  • 1
    • 0030302511
    • Cyclic flows, Markov process and stochastic traffic assignment
    • T. Akamatsu, Cyclic flows, Markov process and stochastic traffic assignment, Transport. Res. B 30 (5) (1996) 369-386.
    • (1996) Transport. Res. B , vol.30 , Issue.5 , pp. 369-386
    • Akamatsu, T.1
  • 7
    • 0000719863
    • Packet routing in dynamically changing networks: A reinforcement learning approach
    • J.A. Boyan, M.L. Littman, Packet routing in dynamically changing networks: A reinforcement learning approach, Adv. Neural Inf. Process. Syst. (NIPS) 6 (1994) 671-678.
    • (1994) Adv. Neural Inf. Process. Syst. (NIPS) , vol.6 , pp. 671-678
    • Boyan, J.A.1    Littman, M.L.2
  • 8
    • 11344275321
    • Fastest mixing Markov chain on a graph
    • S. Boyd, P. Diaconis, L. Xiao, Fastest mixing Markov chain on a graph, SIAM Rev. (2004) 667-689.
    • (2004) SIAM Rev , pp. 667-689
    • Boyd, S.1    Diaconis, P.2    Xiao, L.3
  • 12
    • 0034342516
    • On the existence of fixed points for approximate value iteration and temporal-difference learning
    • D.P. de Farias, B.V. Roy, On the existence of fixed points for approximate value iteration and temporal-difference learning, J. Opt. Theory Appl. 105 (2000) 22-32.
    • (2000) J. Opt. Theory Appl , vol.105 , pp. 22-32
    • de Farias, D.P.1    Roy, B.V.2
  • 14
    • 0015078345
    • A probabilistic multipath assignment model that obviates path enumeration
    • R. Dial, A probabilistic multipath assignment model that obviates path enumeration, Transport. Res. 5 (1971) 83-111.
    • (1971) Transport. Res , vol.5 , pp. 83-111
    • Dial, R.1
  • 15
    • 33847766633
    • Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
    • F. Fouss, A. Pirotte, J.-M. Renders, M. Saerens, Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation, IEEE Trans. Knowl. Data Eng. 19 (3) (2007) 355-369.
    • (2007) IEEE Trans. Knowl. Data Eng , vol.19 , Issue.3 , pp. 355-369
    • Fouss, F.1    Pirotte, A.2    Renders, J.-M.3    Saerens, M.4
  • 16
    • 4844223639
    • M. Guo, Y. Liu, J. Malec, A new Q-learning algorithm based on the metropolis criterion, IEEE Trans. Syst. Man Cybernet. B: Cybernet. 34 (5) (2004) 2140-2143.
  • 22
    • 0032679082
    • Exploration of multi-state environments: Local measures and back-propagation of uncertainty
    • N. Meuleau, P. Bourgine, Exploration of multi-state environments: Local measures and back-propagation of uncertainty, Mach. Learn. 35 (1999) 117-154.
    • (1999) Mach. Learn , vol.35 , pp. 117-154
    • Meuleau, N.1    Bourgine, P.2
  • 29
  • 30
    • 85030589365
    • G. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
  • 31
    • 22944459214
    • The principal components analysis of a graph, and its relationships to spectral clustering
    • Proceedings of the 15th European Conference on Machine Learning ECML, Springer, Berlin
    • M. Saerens, F. Fouss, L. Yen, P. Dupont, The principal components analysis of a graph, and its relationships to spectral clustering, in: Proceedings of the 15th European Conference on Machine Learning (ECML 2004), Lecture Notes in Artificial Intelligence, vol. 3201, Springer, Berlin, 2004, pp. 371-383.
    • (2004) Lecture Notes in Artificial Intelligence , vol.3201 , pp. 371-383
    • Saerens, M.1    Fouss, F.2    Yen, L.3    Dupont, P.4
  • 32
    • 85030575492
    • G. Shani, R. Brafman, S. Shimony, Adaptation for changing stochastic environments through online POMDP policy learning, in: Workshop on Reinforcement Learning in Non-Stationary Environments, ECML 2005, 2005, pp. 61-70.
  • 33
    • 0029753630
    • Reinforcement learning with replacing eligibility traces
    • S. Singh, R. Sutton, Reinforcement learning with replacing eligibility traces, Mach. Learn. 22 (1996) 123-158.
    • (1996) Mach. Learn , vol.22 , pp. 123-158
    • Singh, S.1    Sutton, R.2
  • 35
    • 33746329499
    • The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem
    • J. Sun, S. Boyd, L. Xiao, P. Diaconis, The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem, SIAM Rev. (2006) 681-699.
    • (2006) SIAM Rev , pp. 681-699
    • Sun, J.1    Boyd, S.2    Xiao, L.3    Diaconis, P.4
  • 37
    • 39649107929
    • A one-parameter family of distributed consensus algorithms with boundary: From shortest paths to mean hitting times
    • A. Tahbaz, A. Jadbabaie, A one-parameter family of distributed consensus algorithms with boundary: From shortest paths to mean hitting times, in: Proceedings of IEEE Conference on Decision and Control, 2006, pp. 4664-4669.
    • (2006) Proceedings of IEEE Conference on Decision and Control , pp. 4664-4669
    • Tahbaz, A.1    Jadbabaie, A.2
  • 39
    • 0003411271
    • Efficient exploration in reinforcement learning
    • Technical Report, School of Computer Science, Carnegie Mellon University
    • S. Thrun, Efficient exploration in reinforcement learning, Technical Report, School of Computer Science, Carnegie Mellon University, 1992.
    • (1992)
    • Thrun, S.1
  • 40
    • 0002210775
    • The role of exploration in learning control
    • D. White, D. Sofge Eds, Van Nostrand Reinhold, Princeton, NJ
    • S. Thrun, The role of exploration in learning control, in: D. White, D. Sofge (Eds.), Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, Princeton, NJ, 1992.
    • (1992) Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches
    • Thrun, S.1
  • 47
    • 34249833101
    • Q-learning
    • J.C. Watkins, P. Dayan, Q-learning, Mach. Learn. 8 (3/4) (1992) 279-292.
    • (1992) Mach. Learn , vol.8 , Issue.3-4 , pp. 279-292
    • Watkins, J.C.1    Dayan, P.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.