



2009, Pages 3598-3605

Q-learning and Pontryagin's minimum principle

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; APPROXIMATION THEORY; CONTINUOUS TIME SYSTEMS; DISTRIBUTED PARAMETER CONTROL SYSTEMS; HAMILTONIANS; LEARNING SYSTEMS; MARKOV CHAINS; MULTI AGENT SYSTEMS; OPTIMIZATION; REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS;

EID: 77950806766     PISSN: 07431546     EISSN: 25762370     Source Type: Conference Proceeding    
DOI: 10.1109/CDC.2009.5399753     Document Type: Conference Paper
Times cited : (143)

References (21)
  • 2
    • 0033876515
    • The O.D.E. method for convergence of stochastic approximation and reinforcement learning
    • V. S. Borkar and S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447-469, 2000. (also presented at the IEEE CDC, December, 1998).
    • (2000) SIAM J. Control Optim. , vol.38 , Issue.2 , pp. 447-469
    • Borkar, V.S.1    Meyn, S.P.2
  • 3
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Mach. Learn., 22(1-3):33-57, 1996.
    • (1996) Mach. Learn. , vol.22 , Issue.1-3 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 6
    • 33748414214
    • A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees
    • D. P. Pucci de Farias and B. Van Roy. A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees. Math. Oper. Res., 31(3):597-620, 2006.
    • (2006) Math. Oper. Res. , vol.31 , Issue.3 , pp. 597-620
    • Pucci De Farias, D.P.1    Van Roy, B.2
  • 8
    • 77950828770
    • Control of diffusions via linear programming
    • J. Han and B. Van Roy. Control of diffusions via linear programming. To appear in a volume on stochastic programming in honor of George Dantzig, edited by Gerd Infanger. Preprint available at http://www.stanford.edu/~bvr/, 2009.
    • (2009) Control of Diffusions Via Linear Programming
    • Han, J.1    Van Roy, B.2
  • 9
    • 34648831837
    • Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria
    • M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Automat. Control, 52(9):1560-1571, 2007.
    • (2007) IEEE Trans. Automat. Control , vol.52 , Issue.9 , pp. 1560-1571
    • Huang, M.1    Caines, P.E.2    Malhame, R.P.3
  • 11
    • 56449091120
    • An analysis of reinforcement learning with function approximation
    • F. S. Melo, S. Meyn, and M. Isabel Ribeiro. An analysis of reinforcement learning with function approximation. In Proceedings of ICML, pages 664-671, 2008.
    • (2008) Proceedings of ICML , pp. 664-671
    • Melo, F.S.1    Meyn, S.2    Isabel Ribeiro, M.3
  • 13
    • 62949191986
    • Shannon meets Bellman: Feature based Markovian models for detection and optimization
    • S. P. Meyn and G. Mathew. Shannon meets Bellman: Feature based Markovian models for detection and optimization. In Proc. 47th IEEE CDC, pages 5558-5564, 2008.
    • (2008) Proc. 47th IEEE CDC , pp. 5558-5564
    • Meyn, S.P.1    Mathew, G.2
  • 14
    • 70350302258
    • Markov chains and stochastic stability
    • S. P. Meyn and R. L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library. 1993 edition online: http://black.csl.uiuc.edu/~meyn/pages/book.html.
    • (2009) Markov Chains and Stochastic Stability
    • Meyn, S.P.1    Tweedie, R.L.2
  • 16
    • 34547095501
    • Least squares solutions of the HJB equation with neural network value-function approximators
    • Y. Tassa and T. Erez. Least squares solutions of the HJB equation with neural network value-function approximators. IEEE Transactions on Neural Networks, 18(4):1031-1041, 2007.
    • (2007) IEEE Transactions on Neural Networks , vol.18 , Issue.4 , pp. 1031-1041
    • Tassa, Y.1    Erez, T.2
  • 17
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Control, 42(5):674-690, 1997.
    • (1997) IEEE Trans. Automat. Control , vol.42 , Issue.5 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 19
    • 58349110975
    • Adaptive optimal control for continuous-time linear systems based on policy iteration
    • D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F.L. Lewis. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2):477-484, 2009.
    • (2009) Automatica , vol.45 , Issue.2 , pp. 477-484
    • Vrabie, D.1    Pastravanu, O.2    Abu-Khalaf, M.3    Lewis, F.L.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.