SCOPUS Information Retrieval Platform

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings

2009, Pages 74-81

Basis Function Adaptation Methods for Cost Approximation in MDP

Authors (2): Yu, Huizhen (a); Bertsekas, Dimitri P. (b)

a UNIVERSITY OF HELSINKI (Finland)

b MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTATION FRAMEWORK; ADAPTATION METHODS; ADAPTATION SCHEME; BASIS FUNCTIONS; FUNCTION APPROXIMATION; LOW ORDER; MARKOV DECISION PROCESS; NONLINEAR OPTIMAL; OBJECTIVE FUNCTIONS; POLICY GRADIENT METHODS; TD METHOD; TEMPORAL DIFFERENCES;

COSTS; DYNAMIC PROGRAMMING; GRADIENT METHODS; LEARNING ALGORITHMS; MARKOV PROCESSES; REINFORCEMENT LEARNING; SYSTEMS ENGINEERING;

REINFORCEMENT;

EID: 67650458822
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ADPRL.2009.4927528
Document Type: Conference Paper

Times cited: 39

References (17)

1
- 17444414191
- Basis function adaptation in temporal difference reinforcement learning
- DOI 10.1007/s10479-005-5732-z
- I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Oper. Res., Vol. 134, no. 1, pp. 215-238, 2005. (Pubitemid 40550047)
- (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
- Menache, I.¹ Mannor, S.² Shimkin, N.³

2
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, Vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

3
- 0003487482
- Belmont, MA: Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

4
- 0004007508
- Cambridge, MA: MIT Press
- R. S. Sutton and A. G. Barto, Reinforcement Learning. Cambridge, MA: MIT Press, 1998.
- (1998) Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

5
- 0003565783
- Belmont, MA: Athena Scientific
- D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Belmont, MA: Athena Scientific, 2007, Vol. II.
- (2007) Dynamic Programming and Optimal Control, 3rd ed , vol.2
- Bertsekas, D.P.¹

6
- 4043069840
- Actor-critic algorithms
- V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," SIAM J. Control Optim., Vol. 42, no. 4, pp. 1143-1166, 2003.
- (2003) SIAM J. Control Optim. , vol.42 , Issue.4 , pp. 1143-1166
- Konda, V.R.¹ Tsitsiklis, J.N.²

7
- 0038595396
- Least-squares temporal difference learning
- J. A. Boyan, "Least-squares temporal difference learning," in Proc. The 16th Int. Conf. Machine Learning, 1999.
- (1999) Proc. The 16th Int. Conf. Machine Learning
- Boyan, J.A.¹

8
- 85036496976
- Improved temporal difference methods with linear function approximation
- IEEE Press
- D. P. Bertsekas, V. S. Borkar, and A. Nedić, "Improved temporal difference methods with linear function approximation," in Learning and Approximate Dynamic Programming. IEEE Press, 2004.
- (2004) Learning and Approximate Dynamic Programming
- Bertsekas, D.P.¹ Borkar, V.S.² Nedić, A.³

9
- 28544451799
- Stochastic approximation with 'controlled Markov' noise
- V. S. Borkar, "Stochastic approximation with 'controlled Markov' noise," Systems Control Lett., Vol. 55, pp. 139-145, 2006.
- (2006) Systems Control Lett. , vol.55 , pp. 139-145
- Borkar, V.S.¹

10
- 58849087743
- New Delhi: Hindustan Book Agency
- V. S. Borkar, Stochastic Approximation: A Dynamic Viewpoint. New Delhi: Hindustan Book Agency, 2008.
- (2008) Stochastic Approximation: A Dynamic Viewpoint
- Borkar, V.S.¹

11
- 67650362344
- Projected equation methods for approximate solution of large linear systems
- to appear
- D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Sci. Appl. Math., 2008, to appear.
- (2008) J. Comput. Sci. Appl. Math.
- Bertsekas, D.P.¹ Yu, H.²

12
- 0033351917
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- DOI 10.1109/9.793723
- J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives," IEEE Trans. Automat. Contr., Vol. 44, no. 10, pp. 1840-1851, 1999. (Pubitemid 30546876)
- (1999) IEEE Transactions on Automatic Control , vol.44 , Issue.10 , pp. 1840-1851
- Tsitsiklis, J.N.¹ Van Roy, B.²

13
- 33646435300
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- D. S. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., Vol. 16, no. 2, pp. 207-239, 2006.
- (2006) Discrete Event Dyn. Syst. , vol.16 , Issue.2 , pp. 207-239
- Choi, D.S.¹ Van Roy, B.²

14
- 58849124361
- A least squares Q-learning algorithm for optimal stopping problems
- H. Yu and D. P. Bertsekas, "A least squares Q-learning algorithm for optimal stopping problems," MIT, LIDS Tech. Report 2731, 2006.
- (2006) MIT, LIDS Tech. Report , vol.2731
- Yu, H.¹ Bertsekas, D.P.²

15
- 0004258516
- Berlin: Springer-Verlag
- R. T. Rockafellar and R. J.-B. Wets, Variational Analysis. Berlin: Springer-Verlag, 1998.
- (1998) Variational Analysis
- Rockafellar, R.T.¹ Wets, R.J.-B.²

16
- 0000516813
- An implicit-function theorem for a class of nonsmooth functions
- S. M. Robinson, "An implicit-function theorem for a class of nonsmooth functions," Math. Oper. Res., Vol. 16, no. 2, pp. 292-309, 1991.
- (1991) Math. Oper. Res. , vol.16 , Issue.2 , pp. 292-309
- Robinson, S.M.¹

17
- 46749106339
- Robinson's implicit function theorem and its extensions
- A. L. Dontchev and R. T. Rockafellar, "Robinson's implicit function theorem and its extensions," Math. Program. Ser. B, Vol. 117, no. 1, pp. 129-147, 2008.
- (2008) Math. Program. Ser. B , vol.117 , Issue.1 , pp. 129-147
- Dontchev, A.L.¹ Rockafellar, R.T.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.