2. V. S. Borkar and S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447-469, 2000. (Also presented at the IEEE CDC, December 1998.)
3. S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Mach. Learn., 22(1-3):33-57, 1996.
4. S. J. Bradtke, B. E. Ydstie, and A. G. Barto. Adaptive linear quadratic control using policy iteration. In Proceedings of the 1994 American Control Conference, volume 3, pages 3475-3479, 1994.
5. R. Cogill, M. Rotkowitz, B. Van Roy, and S. Lall. An approximate dynamic programming approach to decentralized control of stochastic systems. In Control of Uncertain Systems: Modelling, Approximation, and Design, pages 243-256. Springer, 2006.
6. D. P. Pucci de Farias and B. Van Roy. A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees. Math. Oper. Res., 31(3):597-620, 2006.
8. J. Han and B. Van Roy. Control of diffusions via linear programming. To appear in a volume on stochastic programming in honor of George Dantzig, edited by Gerd Infanger. Preprint available at http://www.stanford.edu/~bvr/, 2009.
9. M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Automat. Control, 52(9):1560-1571, 2007.
11. F. S. Melo, S. Meyn, and M. Isabel Ribeiro. An analysis of reinforcement learning with function approximation. In Proceedings of ICML, pages 664-671, 2008.
13. S. P. Meyn and G. Mathew. Shannon meets Bellman: Feature based Markovian models for detection and optimization. In Proc. 47th IEEE CDC, pages 5558-5564, 2008.
14. S. P. Meyn and R. L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library. 1993 edition online: http://black.csl.uiuc.edu/~meyn/pages/book.html.
16. Y. Tassa and T. Erez. Least squares solutions of the HJB equation with neural network value-function approximators. IEEE Transactions on Neural Networks, 18(4):1031-1041, 2007.
17. J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Control, 42(5):674-690, 1997.
18. D. Vrabie, M. Abu-Khalaf, F. L. Lewis, and Y. Wang. Continuous-time ADP for linear systems with partially unknown dynamics. In Proc. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages 247-253, April 2007.
19. D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2):477-484, 2009.