



Volume , Issue , 2010, Pages 1409-1416

Q-learning and enhanced policy iteration in discounted dynamic programming

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; DYNAMIC PROGRAMMING; ITERATIVE METHODS; LINEAR SYSTEMS; Q FACTOR MEASUREMENT; REINFORCEMENT LEARNING; STOCHASTIC SYSTEMS; TABLE LOOKUP;

EID: 79953158573     PISSN: 0743-1546     EISSN: 2576-2370     Source Type: Conference Proceeding
DOI: 10.1109/CDC.2010.5717930     Document Type: Conference Paper
Times cited: 7

References (42)
  • 1
    • Abounadi, J., Bertsekas, D. P., and Borkar, V., 2002. "Stochastic Approximation for Non-Expansive Maps: Application to Q-Learning Algorithms," SIAM J. on Control and Optimization, Vol. 41, pp. 1-22.
  • 2
    • Bertsekas, D. P., and Ioffe, S., 1996. "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Lab. for Info. and Decision Systems Report LIDS-P-2349, MIT, Cambridge, MA.
  • 6
    • Bertsekas, D. P., 1982. "Distributed Dynamic Programming," IEEE Trans. Automatic Control, Vol. AC-27, pp. 610-616.
  • 7
    • Bertsekas, D. P., 1983. "Asynchronous Distributed Computation of Fixed Points," Math. Programming, Vol. 27, pp. 107-120.
  • 11
    • Bhatnagar, S., and Babu, K. M., 2008. "New Algorithms of the Q-Learning Type," Automatica, Vol. 44, pp. 1111-1119.
  • 12
    • Borkar, V. S., 1998. "Asynchronous Stochastic Approximations," SIAM J. on Control and Optimization, Vol. 36, pp. 840-851; correction note in ibid., Vol. 38, pp. 662-663.
  • 14
    • Boyan, J. A., 2002. "Technical Update: Least-Squares Temporal Difference Learning," Machine Learning, Vol. 49, pp. 1-15.
  • 15
    • Bradtke, S. J., and Barto, A. G., 1996. "Linear Least-Squares Algorithms for Temporal Difference Learning," Machine Learning, Vol. 22, pp. 33-57.
  • 18
    • Choi, D. S., and Van Roy, B., 2006. "A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning," Discrete Event Dynamic Systems, Vol. 16, pp. 207-239.
  • 19
    • Gordon, G. J., 1995. "Stable Function Approximation in Dynamic Programming," Proc. ICML.
  • 21
    • Jaakkola, T., Jordan, M. I., and Singh, S. P., 1994. "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, Vol. 6, pp. 1185-1201.
  • 22
    • Jaakkola, T., Singh, S. P., and Jordan, M. I., 1995. "Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems," Proc. NIPS.
  • 23
    • Menache, I., Mannor, S., and Shimkin, N., 2005. "Basis Function Adaptation in Temporal Difference Reinforcement Learning," Ann. Oper. Res., Vol. 134, pp. 215-238.
  • 29
    • Sutton, R. S., Szepesvari, C., and Maei, H. R., 2008. "A Convergent O(n) Algorithm for Off-Policy Temporal-Difference Learning with Linear Function Approximation," Proc. NIPS.
  • 31
    • Sutton, R. S., 1988. "Learning to Predict by the Methods of Temporal Differences," Machine Learning, Vol. 3, pp. 9-44.
  • 32
    • Tsitsiklis, J. N., Bertsekas, D. P., and Athans, M., 1986. "Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms," IEEE Trans. on Aut. Control, Vol. AC-31, pp. 803-812.
  • 33
    • Tsitsiklis, J. N., and Van Roy, B., 1996. "Feature-Based Methods for Large-Scale Dynamic Programming," Machine Learning, Vol. 22, pp. 59-94.
  • 34
    • Tsitsiklis, J. N., and Van Roy, B., 1999. "Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing Financial Derivatives," IEEE Trans. on Aut. Control, Vol. 44, pp. 1840-1851.
  • 35
    • Tsitsiklis, J. N., 1994. "Asynchronous Stochastic Approximation and Q-Learning," Machine Learning, Vol. 16, pp. 185-202.
  • 36
    • Tsitsiklis, J. N., 2002. "On the Convergence of Optimistic Policy Iteration," J. of Machine Learning Research, Vol. 3, pp. 59-72.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.