Volume 6, 2005

Prioritization methods for accelerating MDP solvers

Author keywords

Dynamic programming; Markov Decision Processes; Policy iteration; Prioritized sweeping; Value iteration
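The keywords above center on value iteration accelerated by prioritized sweeping. As an illustration only (not the paper's own code), here is a minimal sketch of prioritized value iteration on a toy four-state MDP; the transition table, action names, and thresholds are all invented for the example:

```python
import heapq

# Toy MDP (hypothetical data): transitions[s][a] = [(prob, next_state, reward), ...]
transitions = {
    0: {"go": [(1.0, 1, 0.0)]},
    1: {"go": [(0.8, 2, 0.0), (0.2, 0, 0.0)]},
    2: {"go": [(1.0, 3, 1.0)]},
    3: {"stay": [(1.0, 3, 0.0)]},
}
gamma = 0.9  # discount factor

def backup(V, s):
    """One Bellman backup: max over actions of expected reward plus discounted value."""
    return max(
        sum(p * (r + gamma * V[ns]) for p, ns, r in outcomes)
        for outcomes in transitions[s].values()
    )

# Predecessor map, needed to propagate value changes backwards.
preds = {s: set() for s in transitions}
for s, acts in transitions.items():
    for outcomes in acts.values():
        for _, ns, _ in outcomes:
            preds[ns].add(s)

V = {s: 0.0 for s in transitions}
# Max-priority queue (priorities negated for Python's min-heap), seeded with every state.
pq = [(-1.0, s) for s in transitions]
heapq.heapify(pq)
while pq:
    neg_prio, s = heapq.heappop(pq)
    if -neg_prio < 1e-6:
        break  # remaining Bellman errors are negligible
    new_v = backup(V, s)
    delta = abs(new_v - V[s])
    V[s] = new_v
    if delta > 1e-6:
        # A state's change can perturb its predecessors by at most gamma * delta.
        for p in preds[s]:
            heapq.heappush(pq, (-gamma * delta, p))
```

Instead of sweeping every state on every pass, the heap backs up the state with the largest pending change first and reinserts only predecessors whose values may have shifted, which is the core idea behind prioritized methods.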

Indexed keywords

ALGORITHMS; CALCULATIONS; DYNAMIC PROGRAMMING; ITERATIVE METHODS; LEARNING SYSTEMS; PROBLEM SOLVING

EID: 21844451909     PISSN: 15337928     EISSN: None     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 70

References (40)
  • 4
    • Andrew G. Barto, S. J. Bradtke, and Satinder P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138, 1995.
  • 6
    • Dimitri P. Bertsekas. Distributed asynchronous computation of fixed points. Mathematical Programming, 27:107-120, 1983.
  • 14
    • Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49:209-232, 2002.
  • 19
    • Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
  • 20
    • Andrew W. Moore and Christopher G. Atkeson. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state space. Machine Learning, 21:199-233, 1995.
  • 21
    • Remi Munos and Andrew W. Moore. Variable resolution discretization in optimal control. Machine Learning, 49:291-323, 2002.
  • 26
    • Martin L. Puterman and Moon C. Shin. Modified policy iteration algorithms for discounted Markov Decision Problems. Management Science, 24:1127-1137, 1978.
  • 27
    • Stuart I. Reynolds. Reinforcement Learning with Exploration. PhD thesis, University of Birmingham, Birmingham, United Kingdom, 2002.
  • 28
    • Gavin A. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge, United Kingdom, 1994.
  • 31
    • Satinder P. Singh and Richard S. Sutton. Reinforcement learning with replacing eligibility traces. Machine Learning, 22:123-158, 1996.
  • 32
    • Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
  • 33
    • Richard S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8:1038-1044, 1996.
  • 35
    • Ronald J. Williams and Leemon C. Baird. Tight performance bounds on greedy policies based on imperfect value functions. Technical Report NU-CCS-93-14, Northeastern University, Boston, MA, 1993.
  • 39
    • Nevin L. Zhang and Weihong Zhang. Speeding up the convergence of value iteration in partially observable Markov Decision Processes. Journal of Artificial Intelligence Research, 14:29-51, 2001.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.