Volume: n/a, Issue: n/a, 2010, Pages 1207-1214

Convergence of least squares temporal difference methods under general conditions

Author keywords

[No Author keywords available]

Indexed keywords

BOUNDEDNESS PROPERTIES; DISCOUNTED COST CRITERION; FINITE STATE; LEARNING CONTEXT; LEAST SQUARE; MARKOV CHAIN; MARKOV DECISION PROCESSES; POLICY EVALUATION; PRACTICAL IMPLEMENTATION; SIMULATION-BASED; TEMPORAL DIFFERENCES; TEMPORAL-DIFFERENCE ALGORITHM; TOPOLOGICAL SPACES;

EID: 77956517288     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 41

References (21)
  • 1
    • Ahamed, T. P., Borkar, V. S., and Juneja, S. Adaptive importance sampling technique for Markov chains using stochastic approximation. Operations Research, 54:489-504, 2006.
  • 3
    • Bertsekas, D. P. Projected equations, variational inequalities, and temporal difference methods. IEEE Trans. Automat. Contr., 2009. To appear.
  • 5
    • Bertsekas, D. P. and Yu, H. Projected equation methods for approximate solution of large linear systems. J. Computational and Applied Mathematics, 227(1):27-50, 2009.
  • 7
    • Boyan, J. A. Least-squares temporal difference learning. In Proc. of the 16th ICML, pp. 49-56, 1999.
  • 8
    • Bradtke, S. J. and Barto, A. G. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(2):33-57, 1996.
  • 9
    • Breiman, L. Probability. SIAM, Philadelphia, PA, 1992.
  • 11
    • Glynn, P. W. and Iglehart, D. L. Importance sampling for stochastic simulations. Management Science, 35:1367-1392, 1989.
  • 15
    • Nedic, A. and Bertsekas, D. P. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dyn. Syst., 13:79-110, 2003.
  • 16
    • Precup, D., Sutton, R. S., and Dasgupta, S. Off-policy temporal-difference learning with function approximation. In Proc. of the 18th ICML, pp. 417-424, 2001.
  • 17
    • Sutton, R. S. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
  • 19
    • Tsitsiklis, J. N. and Van Roy, B. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Contr., 42(5):674-690, 1997.
  • 20
    • Yao, H. S. and Liu, Z. Q. Preconditioned temporal difference learning. In Proc. of the 25th ICML, pp. 1208-1215, 2008.
  • 21
    • Yu, H. Convergence of least squares temporal difference methods under general conditions. Tech. Report C-2010-1, Dept. CS, Univ. of Helsinki, 2010.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.