-
1
-
-
0004030716
-
-
preprint LIDS-P-2438, Lab for Info. and Decision Sciences, MIT, Cambridge, MA
-
J. ABOUNADI, D. P. BERTSEKAS, AND V. S. BORKAR, Stochastic Approximation for Non-expansive Maps: Applications to Q-Learning Algorithms, preprint LIDS-P-2438, Lab for Info. and Decision Sciences, MIT, Cambridge, MA, 1998.
-
(1998)
Stochastic Approximation for Non-expansive Maps: Applications to Q-Learning Algorithms
-
-
Abounadi, J.1
Bertsekas, D.P.2
Borkar, V.S.3
-
2
-
-
0003874616
-
-
preprint LIDS-P-2434, Lab for Info. and Decision Sciences, MIT, Cambridge, MA
-
J. ABOUNADI, D. P. BERTSEKAS, AND V. S. BORKAR, Learning Algorithms for Markov Decision Processes with Average Cost, preprint LIDS-P-2434, Lab for Info. and Decision Sciences, MIT, Cambridge, MA, 1998.
-
(1998)
Learning Algorithms for Markov Decision Processes with Average Cost
-
-
Abounadi, J.1
Bertsekas, D.P.2
Borkar, V.S.3
-
3
-
-
0020970738
-
Neuron-like elements that can solve difficult learning control problems
-
A. BARTO, R. SUTTON, AND C. ANDERSON, Neuron-like elements that can solve difficult learning control problems, IEEE Trans. Systems, Man and Cybernetics, 13 (1983), pp. 835-846.
-
(1983)
IEEE Trans. Systems, Man and Cybernetics
, vol.13
, pp. 835-846
-
-
Barto, A.1
Sutton, R.2
Anderson, C.3
-
4
-
-
0032022988
-
A new value iteration method for the average cost dynamic programming problem
-
D. P. BERTSEKAS, A new value iteration method for the average cost dynamic programming problem, SIAM J. Control Optim., 36 (1998), pp. 742-759.
-
(1998)
SIAM J. Control Optim.
, vol.36
, pp. 742-759
-
-
Bertsekas, D.P.1
-
5
-
-
0003161907
-
An analysis of stochastic shortest path problems
-
D. P. BERTSEKAS AND J. N. TSITSIKLIS, An analysis of stochastic shortest path problems, Math. Oper. Res., 16 (1991), pp. 580-595.
-
(1991)
Math. Oper. Res.
, vol.16
, pp. 580-595
-
-
Bertsekas, D.P.1
Tsitsiklis, J.N.2
-
7
-
-
0009636221
-
Recursive self-tuning control of finite Markov chains
-
V. S. BORKAR, Recursive self-tuning control of finite Markov chains, Applicationes Mathematicae, 24 (1996), pp. 169-188.
-
(1996)
Applicationes Mathematicae
, vol.24
, pp. 169-188
-
-
Borkar, V.S.1
-
8
-
-
0031076413
-
Stochastic approximation with two time scales
-
V. S. BORKAR, Stochastic approximation with two time scales, Systems Control Lett., 29 (1996), pp. 291-294.
-
(1996)
Systems Control Lett.
, vol.29
, pp. 291-294
-
-
Borkar, V.S.1
-
9
-
-
0032075427
-
Asynchronous stochastic approximations
-
V. S. BORKAR, Asynchronous stochastic approximations, SIAM J. Control Optim., 36 (1998), pp. 840-851.
-
(1998)
SIAM J. Control Optim.
, vol.36
, pp. 840-851
-
-
Borkar, V.S.1
-
10
-
-
0031198797
-
Actor-critic algorithm as multi-time scale stochastic approximation
-
V. S. BORKAR AND V. R. KONDA, Actor-critic algorithm as multi-time scale stochastic approximation, Sādhanā, 22 (1997), pp. 525-543.
-
(1997)
Sādhanā
, vol.22
, pp. 525-543
-
-
Borkar, V.S.1
Konda, V.R.2
-
11
-
-
85037961073
-
Stability and convergence of stochastic approximation using the ODE method
-
to appear
-
V. S. BORKAR AND S. P. MEYN, Stability and convergence of stochastic approximation using the ODE method, SIAM J. Control Optim., to appear.
-
SIAM J. Control Optim.
-
-
Borkar, V.S.1
Meyn, S.P.2
-
13
-
-
0024909476
-
Convergent activation dynamics in continuous time networks
-
M. W. HIRSCH, Convergent activation dynamics in continuous time networks, Neural Networks, 2 (1989), pp. 331-349.
-
(1989)
Neural Networks
, vol.2
, pp. 331-349
-
-
Hirsch, M.W.1
-
14
-
-
51249164115
-
A tutorial survey of reinforcement learning
-
S. S. KEERTHI AND B. RAVINDRAN, A tutorial survey of reinforcement learning, Sādhanā, 19 (1994), pp. 851-889.
-
(1994)
Sādhanā
, vol.19
, pp. 851-889
-
-
Keerthi, S.S.1
Ravindran, B.2
-
15
-
-
0021501125
-
Applications of singular perturbation techniques to control problems
-
P. V. KOKOTOVIC, Applications of singular perturbation techniques to control problems, SIAM Rev., 26 (1984), pp. 501-550.
-
(1984)
SIAM Rev.
, vol.26
, pp. 501-550
-
-
Kokotovic, P.V.1
-
16
-
-
85037958376
-
-
M.S. thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
-
V. R. KONDA, Learning Algorithms for Markov Decision Processes, M.S. thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 1997.
-
(1997)
Learning Algorithms for Markov Decision Processes
-
-
Konda, V.R.1
-
19
-
-
0001000786
-
Non-convergence to unstable points in urn models and stochastic approximations
-
R. PEMANTLE, Non-convergence to unstable points in urn models and stochastic approximations, Ann. Probab., 18 (1990), pp. 698-712.
-
(1990)
Ann. Probab.
, vol.18
, pp. 698-712
-
-
Pemantle, R.1
-
20
-
-
84904774796
-
New method of stochastic approximation type
-
B. T. POLYAK, New method of stochastic approximation type, Automat. Remote Control, 51 (1990), pp. 937-946.
-
(1990)
Automat. Remote Control
, vol.51
, pp. 937-946
-
-
Polyak, B.T.1
-
22
-
-
0031235784
-
A reinforcement learning neural network for adaptive control of Markov chains
-
G. SANTHARAM AND P. S. SASTRY, A reinforcement learning neural network for adaptive control of Markov chains, IEEE Trans. Systems, Man and Cybernetics, 27 (1997), pp. 588-600.
-
(1997)
IEEE Trans. Systems, Man and Cybernetics
, vol.27
, pp. 588-600
-
-
Santharam, G.1
Sastry, P.S.2
-
23
-
-
0001821168
-
Estimation and control in discounted dynamic programming
-
M. SCHÄL, Estimation and control in discounted dynamic programming, Stochastics, 20 (1987), pp. 51-71.
-
(1987)
Stochastics
, vol.20
, pp. 51-71
-
-
Schäl, M.1
-
24
-
-
0028497630
-
Asynchronous stochastic approximation and Q-learning
-
J. N. TSITSIKLIS, Asynchronous stochastic approximation and Q-learning, Mach. Learning, 16 (1994), pp. 185-202.
-
(1994)
Mach. Learning
, vol.16
, pp. 185-202
-
-
Tsitsiklis, J.N.1
-
25
-
-
0029752470
-
Feature-based methods for large scale dynamic programming
-
J. N. TSITSIKLIS AND B. VAN ROY, Feature-based methods for large scale dynamic programming, Mach. Learning, 22 (1996), pp. 59-94.
-
(1996)
Mach. Learning
, vol.22
, pp. 59-94
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
27
-
-
0342455390
-
A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming
-
New Haven, CT
-
R. WILLIAMS AND L. BAIRD, A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming, in Sixth Yale Workshop on Adaptive and Learning Systems, New Haven, CT, 1990, pp. 96-101.
-
(1990)
Sixth Yale Workshop on Adaptive and Learning Systems
, pp. 96-101
-
-
Williams, R.1
Baird, L.2
-
28
-
-
84968514083
-
Smoothing derivatives of functions and applications
-
F. W. WILSON, Smoothing derivatives of functions and applications, Trans. Amer. Math. Soc., 139 (1969), pp. 413-428.
-
(1969)
Trans. Amer. Math. Soc.
, vol.139
, pp. 413-428
-
-
Wilson, F.W.1