Volume , Issue , 2008, Pages 528-536

Dyna-style planning with linear function approximation and prioritized sweeping

Author keywords

[No Author keywords available]

Indexed keywords

BACKING-UP; LEAST SQUARE; LIMIT POINTS; LINEAR APPROXIMATIONS; LINEAR FUNCTIONS; MODEL BASED APPROACH; MODEL FREE; NATURAL CONDITIONS; ON-LINE SETTING; OPTIMAL CONTROL POLICY; POLICY EVALUATION; PRIORITIZED SWEEPING; STATE SPACE; STATE TRANSITIONS; VALUE FUNCTIONS; WORLD MODEL;

EID: 80053284668     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 111

References (26)
  • 1
    • 0039816976
    • Using local trajectory optimizers to speed up global optimization in dynamic programming
    • Atkeson, C. (1993). Using local trajectory optimizers to speed up global optimization in dynamic programming. Advances in Neural Information Processing Systems, 5, 663-670.
    • (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 663-670
    • Atkeson, C.1
  • 4
    • 0034248853
    • Stochastic dynamic programming with factored representations
    • Boutilier, C., Dearden, R., Goldszmidt, M. (2000). Stochastic dynamic programming with factored representations. Artificial Intelligence, 121:49-107.
    • (2000) Artificial Intelligence , vol.121 , pp. 49-107
    • Boutilier, C.1    Dearden, R.2    Goldszmidt, M.3
  • 7
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • DOI 10.1023/A:1017936530646
    • Boyan, J. A. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49:233-246. (Pubitemid 34325688)
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 233-246
    • Boyan, J.A.1
  • 8
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Bradtke, S., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57. (Pubitemid 126724362)
    • (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 10
    • 0030242092
    • General results on the convergence of stochastic algorithms
    • PII S0018928696067748
    • Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41:1245-1255. (Pubitemid 126768500)
    • (1996) IEEE Transactions on Automatic Control , vol.41 , Issue.9 , pp. 1245-1255
    • Delyon, B.1
  • 11
    • 33750737011
    • Incremental least-squares temporal difference learning
    • Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
    • Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. Proceedings of the National Conference on Artificial Intelligence, pp. 356-361. (Pubitemid 44705310)
    • (2006) Proceedings of the National Conference on Artificial Intelligence , vol.1 , pp. 356-361
    • Geramifard, A.1    Bowling, M.2    Sutton, R.S.3
  • 15
    • 0027684215
    • Prioritized sweeping: Reinforcement learning with less data and less real time
    • Moore, A. W., Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13:103-130.
    • (1993) Machine Learning , vol.13 , pp. 103-130
    • Moore, A.W.1    Atkeson, C.G.2
  • 17
    • 84977063352
    • Efficient learning and planning within the Dyna framework
    • Peng, J., Williams, R. J. (1993). Efficient learning and planning within the Dyna framework. Adaptive Behavior, 1:437-454.
    • (1993) Adaptive Behavior , vol.1 , pp. 437-454
    • Peng, J.1    Williams, R.J.2
  • 22
    • 33847202724
    • Learning to predict by the method of temporal differences
    • Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 23
    • 85132026293
    • Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
    • Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, pp. 216-224.
    • (1990) Proceedings of the Seventh International Conference on Machine Learning , pp. 216-224
    • Sutton, R.S.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.