Abounadi, J., Bertsekas, D. P., and Borkar, V., 2002. "Stochastic Approximation for Non-Expansive Maps: Application to Q-Learning Algorithms," SIAM J. on Control and Optimization, Vol. 41, pp. 1-22.
Bertsekas, D. P., and Ioffe, S., 1996. "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Lab. for Info. and Decision Systems Report LIDS-P-2349, MIT, Cambridge, MA.
Bertsekas, D. P., and Tsitsiklis, J. N., 1989. Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, N.J.; republished by Athena Scientific, Belmont, MA, 1997.
Bertsekas, D. P., and Tsitsiklis, J. N., 1996. Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
Bertsekas, D. P., 1982. "Distributed Dynamic Programming," IEEE Trans. Automatic Control, Vol. AC-27, pp. 610-616.
Bertsekas, D. P., 1983. "Asynchronous Distributed Computation of Fixed Points," Math. Programming, Vol. 27, pp. 107-120.
Bertsekas, D. P., 2005. Dynamic Programming and Optimal Control, 3rd Edition, Vol. I, Athena Scientific, Belmont, MA.
Bertsekas, D. P., 2007. Dynamic Programming and Optimal Control, 3rd Edition, Vol. II, Athena Scientific, Belmont, MA.
Bhatnagar, S., and Babu, K. M., 2008. "New Algorithms of the Q-Learning Type," Automatica, Vol. 44, pp. 1111-1119.
Borkar, V. S., 1998. "Asynchronous Stochastic Approximations," SIAM J. on Control and Optimization, Vol. 36, pp. 840-851; correction note in ibid., Vol. 38, pp. 662-663.
Boyan, J. A., 2002. "Technical Update: Least-Squares Temporal Difference Learning," Machine Learning, Vol. 49, pp. 1-15.
Bradtke, S. J., and Barto, A. G., 1996. "Linear Least-Squares Algorithms for Temporal Difference Learning," Machine Learning, Vol. 22, pp. 33-57.
Chang, H. S., Fu, M. C., Hu, J., and Marcus, S. I., 2007. Simulation-Based Algorithms for Markov Decision Processes, Springer, N.Y.
Choi, D. S., and Van Roy, B., 2006. "A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning," Discrete Event Dynamic Systems, Vol. 16, pp. 207-239.
Gordon, G. J., 1995. "Stable Function Approximation in Dynamic Programming," Proc. ICML.
Jaakkola, T., Jordan, M. I., and Singh, S. P., 1994. "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, Vol. 6, pp. 1185-1201.
Jaakkola, T., Singh, S. P., and Jordan, M. I., 1995. "Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems," Proc. NIPS.
Menache, I., Mannor, S., and Shimkin, N., 2005. "Basis Function Adaptation in Temporal Difference Reinforcement Learning," Ann. Oper. Res., Vol. 134, pp. 215-238.
Maei, H. R., Szepesvari, C., Bhatnagar, S., Silver, D., Precup, D., and Sutton, R. S., 2009. "Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation," Proc. NIPS.
Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, C., and Wiewiora, E., 2009. "Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation," Proc. ICML.
Sutton, R. S., Szepesvari, C., and Maei, H. R., 2008. "A Convergent O(n) Algorithm for Off-Policy Temporal-Difference Learning with Linear Function Approximation," Proc. NIPS.
Sutton, R. S., and Barto, A. G., 1998. Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
Sutton, R. S., 1988. "Learning to Predict by the Methods of Temporal Differences," Machine Learning, Vol. 3, pp. 9-44.
Tsitsiklis, J. N., Bertsekas, D. P., and Athans, M., 1986. "Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms," IEEE Trans. on Aut. Control, Vol. AC-31, pp. 803-812.
Tsitsiklis, J. N., and Van Roy, B., 1996. "Feature-Based Methods for Large-Scale Dynamic Programming," Machine Learning, Vol. 22, pp. 59-94.
Tsitsiklis, J. N., and Van Roy, B., 1999. "Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing Financial Derivatives," IEEE Trans. on Aut. Control, Vol. 44, pp. 1840-1851.
Tsitsiklis, J. N., 1994. "Asynchronous Stochastic Approximation and Q-Learning," Machine Learning, Vol. 16, pp. 185-202.
Tsitsiklis, J. N., 2002. "On the Convergence of Optimistic Policy Iteration," J. of Machine Learning Research, Vol. 3, pp. 59-72.
Williams, R. J., and Baird, L. C., 1993. "Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems," Report NU-CCS-93-11, College of Computer Science, Northeastern University, Boston, MA.
Yu, H., and Bertsekas, D. P., 2009. "Basis Function Adaptation Methods for Cost Approximation in MDP," Proc. IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, Tenn.