Volume 227, 2007, Pages 751-758

Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; ERROR CORRECTION; FUNCTION EVALUATION; KALMAN FILTERS; LINEAR SYSTEMS;

EID: 34547974097     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1273496.1273591     Document Type: Conference Paper
Times cited: 9

References (18)
  • 1. Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Twelfth International Conference on Machine Learning (pp. 30-37). San Francisco: Morgan Kaufmann Publishers.
  • 2. Boyan, J. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49, 233-246.
  • 3. Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33-57.
  • 4. Choi, D., & Van Roy, B. (2006). A generalized Kalman filter for fixed point approximation and efficient temporal difference learning. Discrete Event Dynamic Systems, 16, 207-239.
  • 5. Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227-303.
  • 10. Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
  • 11. Perkins, T. J., & Precup, D. (2002). A convergent form of approximate policy iteration. Neural Information Processing Systems (pp. 1595-1602). Vancouver, British Columbia, Canada: MIT Press.
  • 12. Potts, D., & Sammut, C. (2005). Incremental learning of linear model trees. Machine Learning, 61, 5-48.
  • 14. Soderstrom, T., & Stoica, P. (2002). Instrumental variable methods for system identification. Circuits Systems Signal Processing, 21, 1-9.
  • 16. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
  • 18. Xu, X., He, H.-g., & Hu, D. (2002). Efficient reinforcement learning using recursive least squares methods. Journal of Artificial Intelligence Research, 16, 259-292.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.