1. K. B. ATHREYA AND P. NEY, A new approach to the limit theory of recurrent Markov chains, Trans. Amer. Math. Soc., 245 (1978), pp. 493-501.
2. A. BARTO, R. SUTTON, AND C. ANDERSON, Neuron-like elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, 13 (1983), pp. 835-846.
3. A. BENVENISTE, M. METIVIER, AND P. PRIOURET, Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, Berlin, Heidelberg, 1990.
6. V. S. BORKAR, Stochastic approximation with two time scales, Systems Control Lett., 29 (1997), pp. 291-294.
7. X. R. CAO AND H. F. CHEN, Perturbation realization, potentials, and sensitivity analysis of Markov processes, IEEE Trans. Automat. Control, 42 (1997), pp. 1382-1393.
8. P. W. GLYNN, Stochastic approximation for Monte Carlo optimization, in Proceedings of the 1986 Winter Simulation Conference, Washington, DC, 1986, pp. 285-289.
9. P. W. GLYNN AND P. L'ECUYER, Likelihood ratio gradient estimation for stochastic recursions, Adv. Appl. Probab., 27 (1995), pp. 1019-1053.
10. T. JAAKKOLA, S. P. SINGH, AND M. I. JORDAN, Reinforcement learning algorithms for partially observable Markov decision problems, in Advances in Neural Information Processing Systems 7, G. Tesauro and D. Touretzky, eds., Morgan Kaufmann, San Francisco, CA, 1995, pp. 345-352.
11. V. R. KONDA, Actor-Critic Algorithms, Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2002.
12. V. R. KONDA AND V. S. BORKAR, Actor-critic-type learning algorithms for Markov decision processes, SIAM J. Control Optim., 38 (1999), pp. 94-123.
13. V. R. KONDA AND J. N. TSITSIKLIS, Actor-critic algorithms, in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, eds., MIT Press, Cambridge, MA, 2000, pp. 1008-1014.
16. P. MARBACH AND J. N. TSITSIKLIS, Simulation-based optimization of Markov reward processes, IEEE Trans. Automat. Control, 46 (2001), pp. 191-209.
20. R. S. SUTTON, D. MCALLESTER, S. SINGH, AND Y. MANSOUR, Policy gradient methods for reinforcement learning with function approximation, in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, eds., MIT Press, Cambridge, MA, 2000, pp. 1057-1063.
21. J. N. TSITSIKLIS AND B. VAN ROY, An analysis of temporal-difference learning with function approximation, IEEE Trans. Automat. Control, 42 (1997), pp. 674-690.
22. J. N. TSITSIKLIS AND B. VAN ROY, Average cost temporal-difference learning, Automatica J. IFAC, 35 (1999), pp. 1799-1808.
23. R. WILLIAMS, Simple statistical gradient following algorithms for connectionist reinforcement learning, Machine Learning, 8 (1992), pp. 229-256.
24. B. T. POLYAK, Pseudogradient adaptation and training algorithms, Autom. Remote Control, 34 (1973), pp. 377-397.