2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. of the Twelfth Int. Conf. on Machine Learning, pp. 30-37. Morgan Kaufmann.
3. Baird, L. C., Moore, A. W. (1999). Gradient descent for general reinforcement learning. NIPS 11. MIT Press.
7. Cao, X.-R., Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Trans. on Automatic Control 42(10):1382-1393.
8. Dayan, P. (1991). Reinforcement comparison. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (eds.), Connectionist Models: Proceedings of the 1990 Summer School, pp. 45-51. Morgan Kaufmann.
11. Jaakkola, T., Singh, S. P., Jordan, M. I. (1995). Reinforcement learning algorithms for partially observable Markov decision problems. NIPS 7, pp. 345-352. Morgan Kaufmann.
12. Kimura, H., Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. Proc. ICML-98, pp. 278-286.
14. Marbach, P., Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Technical Report LIDS-P-2411, Massachusetts Institute of Technology.
15. Singh, S. P., Jaakkola, T., Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision problems. Proc. ICML-94, pp. 284-292.
18. Tsitsiklis, J. N., Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning 22:59-94.
19. Williams, R. J. (1988). Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science.
20. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8:229-256.