[1] Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. 12th Int. Conf. Machine Learning (pp. 30-37).
[2] Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximations, vol. 22. Springer-Verlag.
[4] Borkar, V. (1997). Stochastic approximation with two time scales. Systems & Control Letters, 29, 291-294.
[6] de Farias, D., & Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105, 589-608.
[7] Diaconis, P., & Saloff-Coste, L. (1996). Logarithmic Sobolev inequalities for finite Markov chains. Annals of Applied Probability, 6, 695-750.
[8] Gordon, G. (1995). Stable function approximation in dynamic programming (Technical Report CMU-CS-95-103). School of Computer Science, Carnegie Mellon University.
[9] Gordon, G. (1996). Chattering in SARSA(λ) (Technical Report). CMU Learning Lab Internal Report.
[11] Meyn, S., & Tweedie, R. (1994). Computable bounds for geometric convergence rates of Markov chains. Annals of Applied Probability, 4, 981-1011.
[12] Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.
[13] Perkins, T., & Pendrith, M. (2002). On the existence of fixed-points for Q-learning and SARSA in partially observable domains. Proc. 19th Int. Conf. Machine Learning (pp. 490-497).
[16] Ribeiro, C., & Szepesvári, C. (1996). Q-learning combined with spreading: Convergence and results. Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems (pp. 32-36).
[17] Rosenthal, J. (2002). Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability, 7, 123-128.
[19] Sutton, R. (1999). Open theoretical questions in reinforcement learning. Lecture Notes in Computer Science, 1572, 11-17.
[21] Tadić, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42, 241-267.
[22] Tsitsiklis, J., & Van Roy, B. (1996a). An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control, 42, 674-690.
[23] Tsitsiklis, J., & Van Roy, B. (1996b). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59-94.
[24] Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, University of Cambridge.