SCOPUS Information Retrieval Platform

2007 European Control Conference, ECC 2007

2007, Pages 2368-2375 (volume and issue not listed)

Q-learning algorithms for optimal stopping based on least squares

Authors (2): Yu, Huizhen (a); Bertsekas, Dimitri P. (b)

(a) University of Helsinki (Finland)

(b) Massachusetts Institute of Technology (United States)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; APPROXIMATION THEORY; ITERATIVE METHODS; REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS

ALTERNATIVE ALGORITHMS; LINEAR FUNCTIONS; OPTIMAL STOPPING; OPTIMAL STOPPING PROBLEM; Q-LEARNING ALGORITHMS; STOCHASTIC APPROXIMATIONS; TEMPORAL DIFFERENCES; VALUE ITERATION

LEARNING ALGORITHMS

EID: 84927748655
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.23919/ecc.2007.7068523
Document Type: Conference Paper

Times cited: 26

References (18)

1
- J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives," IEEE Trans. Automat. Contr., vol. 44, pp. 1840-1851, 1999.
- Scopus ID: 0033351917

2
- J. Barraquand and D. Martineau, "Numerical valuation of high dimensional multivariate American securities," Journal of Financial and Quantitative Analysis, vol. 30, pp. 383-405, 1995.
- Scopus ID: 84974489693

3
- F. A. Longstaff and E. S. Schwartz, "Valuing American options by simulation: A simple least-squares approach," Review of Financial Studies, vol. 14, pp. 113-147, 2001.
- Scopus ID: 0035578679

4
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
- Scopus ID: 0003487482

5
- D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, 3rd ed. Belmont, MA: Athena Scientific, 2007.
- Scopus ID: 0003565783

6
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
- Scopus ID: 33847202724

7
- D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," MIT, LIDS Tech. Report LIDS-P-2349, 1996.
- Scopus ID: 4243567726

8
- A. Nedić and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dyn. Syst., vol. 13, pp. 79-110, 2003.
- Scopus ID: 0037288398

9
- H. Yu and D. P. Bertsekas, "Convergence results for some temporal difference methods based on least squares," MIT, LIDS Tech. Report 2697, 2006.
- Scopus ID: 34547991475

10
- H. Yu and D. P. Bertsekas, "A least squares Q-learning algorithm for optimal stopping problems," MIT, LIDS Tech. Report 2731, 2006.
- Scopus ID: 58849124361

11
- S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 2, pp. 33-57, 1996.
- Scopus ID: 0001771345

12
- J. A. Boyan, "Least-squares temporal difference learning," in Proc. 16th Int. Conf. Machine Learning, 1999.
- Scopus ID: 0038595396

13
- D. S. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, no. 2, pp. 207-239, 2006.
- Scopus ID: 33646435300

14
- D. P. Bertsekas, V. S. Borkar, and A. Nedić, "Improved temporal difference methods with linear function approximation," MIT, LIDS Tech. Report 2573, 2003; also appears in Learning and Approximate Dynamic Programming, A. Barto, W. Powell, and J. Si, Eds. IEEE Press, 2004.
- Scopus ID: 33645396919

15
- C. J. C. H. Watkins, "Learning from delayed rewards," Doctoral dissertation, University of Cambridge, Cambridge, United Kingdom, 1989.
- Scopus ID: 0004049893

16
- C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
- Scopus ID: 34249833101

17
- J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
- Scopus ID: 0028497630

18
- J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Contr., vol. 42, no. 5, pp. 674-690, 1997.
- Scopus ID: 0031143730

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.