



Volume , Issue , 2010, Pages 1409-1416

Q-learning and enhanced policy iteration in discounted dynamic programming

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; DYNAMIC PROGRAMMING; ITERATIVE METHODS; LINEAR SYSTEMS; Q FACTOR MEASUREMENT; REINFORCEMENT LEARNING; STOCHASTIC SYSTEMS; TABLE LOOKUP;

EID: 79953158573     PISSN: 0743-1546     EISSN: 2576-2370     Source Type: Conference Proceeding
DOI: 10.1109/CDC.2010.5717930     Document Type: Conference Paper
Times cited: 7

References (42)
  • 1
    • Abounadi, J., Bertsekas, D. P., and Borkar, V., 2002. "Stochastic Approximation for Non-Expansive Maps: Application to Q-Learning Algorithms," SIAM J. on Control and Optimization, Vol. 41, pp. 1-22.
  • 2
    • Bertsekas, D. P., and Ioffe, S., 1996. "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Lab. for Info. and Decision Systems Report LIDS-P-2349, MIT, Cambridge, MA.
  • 6
    • Bertsekas, D. P., 1982. "Distributed Dynamic Programming," IEEE Trans. Automatic Control, Vol. AC-27, pp. 610-616.
  • 7
    • Bertsekas, D. P., 1983. "Asynchronous Distributed Computation of Fixed Points," Math. Programming, Vol. 27, pp. 107-120.
  • 11
    • Bhatnagar, S., and Babu, K. M., 2008. "New Algorithms of the Q-Learning Type," Automatica, Vol. 44, pp. 1111-1119.
  • 12
    • Borkar, V. S., 1998. "Asynchronous Stochastic Approximations," SIAM J. on Control and Optimization, Vol. 36, pp. 840-851; correction note in ibid., Vol. 38, pp. 662-663.
  • 14
    • Boyan, J. A., 2002. "Technical Update: Least-Squares Temporal Difference Learning," Machine Learning, Vol. 49, pp. 1-15.
  • 15
    • Bradtke, S. J., and Barto, A. G., 1996. "Linear Least-Squares Algorithms for Temporal Difference Learning," Machine Learning, Vol. 22, pp. 33-57.
  • 18
    • Choi, D. S., and Van Roy, B., 2006. "A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning," Discrete Event Dynamic Systems, Vol. 16, pp. 207-239.
  • 19
    • Gordon, G. J., 1995. "Stable Function Approximation in Dynamic Programming," Proc. ICML.
  • 21
    • Jaakkola, T., Jordan, M. I., and Singh, S. P., 1994. "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, Vol. 6, pp. 1185-1201.
  • 22
    • Jaakkola, T., Singh, S. P., and Jordan, M. I., 1995. "Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems," Proc. NIPS.
  • 23
    • Menache, I., Mannor, S., and Shimkin, N., 2005. "Basis Function Adaptation in Temporal Difference Reinforcement Learning," Ann. Oper. Res., Vol. 134, pp. 215-238.
  • 29
    • Sutton, R. S., Szepesvari, C., and Maei, H. R., 2008. "A Convergent O(n) Algorithm for Off-Policy Temporal-Difference Learning with Linear Function Approximation," Proc. NIPS.
  • 31
    • Sutton, R. S., 1988. "Learning to Predict by the Methods of Temporal Differences," Machine Learning, Vol. 3, pp. 9-44.
  • 32
    • Tsitsiklis, J. N., Bertsekas, D. P., and Athans, M., 1986. "Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms," IEEE Trans. on Aut. Control, Vol. AC-31, pp. 803-812.
  • 33
    • Tsitsiklis, J. N., and Van Roy, B., 1996. "Feature-Based Methods for Large-Scale Dynamic Programming," Machine Learning, Vol. 22, pp. 59-94.
  • 34
    • Tsitsiklis, J. N., and Van Roy, B., 1999. "Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing Financial Derivatives," IEEE Trans. on Aut. Control, Vol. 44, pp. 1840-1851.
  • 35
    • Tsitsiklis, J. N., 1994. "Asynchronous Stochastic Approximation and Q-Learning," Machine Learning, Vol. 16, pp. 185-202.
  • 36
    • Tsitsiklis, J. N., 2002. "On the Convergence of Optimistic Policy Iteration," J. of Machine Learning Research, Vol. 3, pp. 59-72.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.