Volume 34, Issue 5, 2004, Pages 2140-2143

A new Q-learning algorithm based on the metropolis criterion

Author keywords

[No Author keywords available]

Indexed keywords

COMBINATORIAL MATHEMATICS; LEARNING ALGORITHMS; OPTIMIZATION; SIMULATED ANNEALING; THEOREM PROVING

EID: 4844223639     PISSN: 10834419     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSMCB.2004.832154     Document Type: Article
Times cited: 150

References (14)
  • 1
    • 0033170372
    • R. S. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artific. Intell., vol. 112, pp. 181-211, 1999.
  • 2
    • 4844229682
    • M. Z. Guo, B. Chen, X. L. Wang, and J. R. Hong, "A summary on reinforcement learning" (in Chinese), Comput. Sci., vol. 25, no. 3, pp. 13-15, 1998.
  • 3
    • 33847202724
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
  • 4
    • 0004049893
    • C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Psychol. Dept., Cambridge Univ., Cambridge, U.K., 1989.
  • 5
    • 34249833101
    • C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3, pp. 279-292, 1992.
  • 6
    • 4844228133
    • L. Yang, J. R. Hong, and T. Y. Huang, "Combining the methods of temporal differences with neural network for real-time modeling and prediction of time series" (in Chinese), Chinese J. Comput., vol. 19, no. 9, pp. 695-700, 1996.
  • 7
    • 0033148990
    • M. Asada, E. Uchibe, and K. Hosoda, "Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development," Artific. Intell., vol. 110, pp. 275-292, 1999.
  • 9
    • 0029679044
    • L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. AI Res., vol. 4, pp. 237-285, 1996.
  • 11
    • 0033687233
    • S. Boettcher and A. Percus, "Nature's way of optimizing," Artific. Intell., vol. 119, pp. 275-286, 2000.
  • 12
    • 26444479778
    • S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
  • 14
    • 0031208987
    • T. G. Dietterich and N. S. Flann, "Explanation-based learning and reinforcement learning: A unified view," Mach. Learn., vol. 28, pp. 169-210, 1997.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.