[1] A. Basu, Bhattacharyya, V. Borkar, A learning algorithm for risk-sensitive cost, Tech. Report No. 2006/25, Dept. of Math., Indian Institute of Science, Bangalore, India, 2006.
[2] D.P. Bertsekas, V. Borkar, A. Nedić, Improved temporal difference methods with linear function approximation, in: J. Si, A. Barto, W. Powell (Eds.), Learning and Approximate Dynamic Programming, IEEE Press, NY, 2004.
[4] D.P. Bertsekas, S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Lab. for Info. and Dec. Sys. Report LIDS-P-2349, MIT, Cambridge, MA, 1996.
[5] D.P. Bertsekas, H. Yu, Solution of large systems of equations using approximate dynamic programming methods, Lab. for Info. and Dec. Sys. Report LIDS-2754, MIT, Cambridge, MA, 2007.
[8] J.A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning 49 (2002) 1-15.
[9] S.J. Bradtke, A.G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning 22 (1996) 33-57.
[10] D.S. Choi, B. Van Roy, A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning, Discrete Event Dynamic Systems: Theory and Applications 16 (2006) 207-239.
[11] J.H. Curtiss, Monte Carlo methods for the iteration of linear operators, Journal of Mathematics and Physics 32 (1953) 209-232.
[12] J.H. Curtiss, A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations, in: H.A. Meyer (Ed.), Symposium on Monte Carlo Methods, Wiley, New York, NY, 1954, pp. 191-233.
[14] J.H. Halton, A retrospective and prospective survey of the Monte Carlo method, SIAM Review 12 (1970) 1-63.
[17] I. Menache, S. Mannor, N. Shimkin, Basis function adaptation in temporal difference reinforcement learning, Annals of Operations Research 134 (2005).
[19] R. Parr, C. Painter-Wakefield, L. Li, M. Littman, Analyzing feature generation for value-function approximation, in: Proc. of the 24th International Conference on Machine Learning, Corvallis, OR, 2007.
[23] R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3 (1988) 9-44.
[24] J.N. Tsitsiklis, B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control 42 (1997) 674-690.
[25] J.N. Tsitsiklis, B. Van Roy, Average cost temporal-difference learning, Automatica 35 (1999) 1799-1808.
[26] J.N. Tsitsiklis, B. Van Roy, Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives, IEEE Transactions on Automatic Control 44 (1999) 1840-1851.
[27] M.J. Valenti, Approximate dynamic programming with applications in multi-agent systems, Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, 2007.
[29] H. Yu, D.P. Bertsekas, Convergence results for some temporal difference methods based on least squares, Lab. for Info. and Dec. Sys. Report 2697, MIT, 2006.
[30] H. Yu, D.P. Bertsekas, A least squares Q-learning algorithm for optimal stopping problems, Lab. for Info. and Dec. Sys. Report 2731, MIT, 2007.