



Volume , Issue , 2007, Pages 2368-2375

Q-learning algorithms for optimal stopping based on least squares

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; APPROXIMATION THEORY; ITERATIVE METHODS; REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS;

EID: 84927748655     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.23919/ecc.2007.7068523     Document Type: Conference Paper
Times cited : (26)

References (18)
  • 1
    • 0033351917
    • Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives
    • J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives," IEEE Trans. Automat. Contr., vol. 44, pp. 1840-1851, 1999.
    • (1999) IEEE Trans. Automat. Contr. , vol.44 , pp. 1840-1851
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 2
    • 84974489693
    • Numerical valuation of high dimensional multivariate American securities
    • J. Barraquand and D. Martineau, "Numerical valuation of high dimensional multivariate American securities," Journal of Financial and Quantitative Analysis, vol. 30, pp. 383-405, 1995.
    • (1995) Journal of Financial and Quantitative Analysis , vol.30 , pp. 383-405
    • Barraquand, J.1    Martineau, D.2
  • 3
    • 0035578679
    • Valuing American options by simulation: A simple least-squares approach
    • F. A. Longstaff and E. S. Schwartz, "Valuing American options by simulation: A simple least-squares approach," Review of Financial Studies, vol. 14, pp. 113-147, 2001.
    • (2001) Review of Financial Studies , vol.14 , pp. 113-147
    • Longstaff, F.A.1    Schwartz, E.S.2
  • 6
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 7
    • 4243567726
    • Temporal differences-based policy iteration and applications in neuro-dynamic programming
    • D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," MIT, LIDS Tech. Report LIDS-P-2349, 1996.
    • (1996) MIT, LIDS Tech. Report LIDS-P-2349
    • Bertsekas, D.P.1    Ioffe, S.2
  • 8
    • 0037288398
    • Least squares policy evaluation algorithms with linear function approximation
    • A. Nedić and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dyn. Syst., vol. 13, pp. 79-110, 2003.
    • (2003) Discrete Event Dyn. Syst. , vol.13 , pp. 79-110
    • Nedić, A.1    Bertsekas, D.P.2
  • 11
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 2, pp. 33-57, 1996.
    • (1996) Machine Learning , vol.22 , Issue.2 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 13
    • 33646435300
    • A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
    • D. S. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, no. 2, pp. 207-239, 2006.
    • (2006) Discrete Event Dyn. Syst. , vol.16 , Issue.2 , pp. 207-239
    • Choi, D.S.1    Van Roy, B.2
  • 15
    • 0004049893
    • Learning from delayed rewards
    • C. J. C. H. Watkins, "Learning from delayed rewards," Doctoral dissertation, University of Cambridge, Cambridge, United Kingdom, 1989.
    • (1989) Doctoral dissertation, University of Cambridge, Cambridge, United Kingdom
    • Watkins, C.J.C.H.1
  • 17
    • 0028497630
    • Asynchronous stochastic approximation and Q-learning
    • J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
    • (1994) Machine Learning , vol.16 , pp. 185-202
    • Tsitsiklis, J.N.1
  • 18
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Contr., vol. 42, no. 5, pp. 674-690, 1997.
    • (1997) IEEE Trans. Automat. Contr. , vol.42 , Issue.5 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.