SCOPUS Information Search Platform

Proceedings of the IEEE Conference on Decision and Control

2009, Pages 3598-3605

Q-learning and Pontryagin's minimum principle

Authors (2): Mehta, Prashant (a); Meyn, Sean (a)

(a) United States of America

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; APPROXIMATION THEORY; CONTINUOUS TIME SYSTEMS; DISTRIBUTED PARAMETER CONTROL SYSTEMS; HAMILTONIANS; LEARNING SYSTEMS; MARKOV CHAINS; MULTI AGENT SYSTEMS; OPTIMIZATION; REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS;

CONSISTENT ALGORITHM; CONTINUOUS TIME MODELS; CONTROLLED MARKOV CHAINS; OPTIMAL APPROXIMATION; OPTIMALITY EQUATION; PONTRYAGIN'S MINIMUM PRINCIPLES; Q-LEARNING ALGORITHMS; STOCHASTIC APPROXIMATIONS;

LEARNING ALGORITHMS;

EID: 77950806766
PISSN: 0743-1546
EISSN: 2576-2370
Source Type: Conference Proceeding
DOI: 10.1109/CDC.2009.5399753
Document Type: Conference Paper

Times cited: 146

References (21)

1
- EID: 0003487482
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Cambridge, Mass., 1996.

2
- EID: 0033876515
- V. S. Borkar and S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447-469, 2000. (also presented at the IEEE CDC, December, 1998).

3
- EID: 0001771345
- S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Mach. Learn., 22(1-3):33-57, 1996.

4
- EID: 0028584964
- S.J. Bradtke, B.E. Ydstie, and A.G. Barto. Adaptive linear quadratic control using policy iteration. In Proceedings of the 1994 American Control Conference, volume 3, pages 3475-3479, 1994.

5
- EID: 33748784614
- R. Cogill, M. Rotkowitz, B. Van Roy, and S. Lall. An approximate dynamic programming approach to decentralized control of stochastic systems. In Control of Uncertain Systems: Modelling, Approximation, and Design, pages 243-256. Springer, 2006.

6
- EID: 33748414214
- D. P. Pucci de Farias and B. Van Roy. A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees. Math. Oper. Res., 31(3):597-620, 2006.

7
- EID: 77950793347
- V. F. Farias, D. Saure, and G. Y. Weintraub. The linear programming approach to solving large scale dynamic stochastic games (working paper). http://www.stanford.edu/~bvr/publ-all.html, 2008.

8
- EID: 77950828770
- J. Han and B. Van Roy. Control of diffusions via linear programming. To appear in a volume on stochastic programming in honor of George Dantzig, edited by Gerd Infanger. Preprint available at http://www.stanford.edu/~bvr/, 2009.

9
- EID: 34648831837
- M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Automat. Control, 52(9):1560-1571, 2007.

10
- EID: 0004291983
- D. H. Jacobson and D. Q. Mayne. Differential dynamic programming. American Elsevier Pub. Co., New York, NY, 1970.

11
- EID: 56449091120
- F. S. Melo, S. Meyn, and M. Isabel Ribeiro. An analysis of reinforcement learning with function approximation. In Proceedings of ICML, pages 664-671, 2008.

12
- EID: 84925067999
- S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.

13
- EID: 62949191986
- S. P. Meyn and G. Mathew. Shannon meets Bellman: Feature based Markovian models for detection and optimization. In Proc. 47th IEEE CDC, pages 5558-5564, 2008.

14
- EID: 70350302258
- S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library. 1993 edition online: http://black.csl.uiuc.edu/~meyn/pages/book.html.

15
- EID: 77950833925
- C.C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for queueing networks. Preprint available at http://moallemi.com/ciamac/research-interests.php, 2008.

16
- EID: 34547095501
- Y. Tassa and T. Erez. Least squares solutions of the HJB equation with neural network value-function approximators. IEEE Transactions on Neural Networks, 18(4):1031-1041, 2007.

17
- EID: 0031143730
- J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Control, 42(5):674-690, 1997.

18
- EID: 34548721141
- D. Vrabie, M. Abu-Khalaf, F.L. Lewis, and Y. Wang. Continuous-time ADP for linear systems with partially unknown dynamics. In Proc. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages 247-253, April 2007.

19
- EID: 58349110975
- D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F.L. Lewis. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2):477-484, 2009.

20
- EID: 34249833101
- C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279-292, 1992.

21
- EID: 84927748655
- H. Yu and D. P. Bertsekas. Q-learning algorithms for optimal stopping based on least squares. In Proc. European Control Conference (ECC), July 2007.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.