1. J. S. Albus. A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 97:220-227, September 1975.
2. S.-i. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.
3. A. Antos, C. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1):89-129, 2008.
4. F. Bach and E. Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Advances in Neural Information Processing Systems 24, 2011.
6. P. Balakrishna, R. Ganesan, and L. Sherry. Accuracy of reinforcement learning algorithms for predicting aircraft taxi-out times: A case-study of Tampa Bay departures. Transportation Research Part C: Emerging Technologies, 18(6):950-962, 2010.
8. D. P. Bertsekas and H. Yu. Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227(1):27-50, 2009.
9. J. A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49(2-3):233-246, 2002.
10. S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57, 1996.
11. E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35(6):2313-2351, 2007.
12. D. Choi and B. Van Roy. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16(2):207-239, 2006.
14. R. H. Crites and A. G. Barto. Elevator group control using multiple reinforcement learning agents. Machine Learning, 33(2-3):235-262, 1998.
16. P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1):19-67, 2005.
19. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407-499, 2004.
23. A.-m. Farahmand and C. Szepesvári. Model selection in reinforcement learning. Machine Learning, 85(3):299-332, 2011.
33. A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. P. How. Online discovery of feature dependencies. In Proceedings of the 28th International Conference on Machine Learning, 2011.
37. P. W. Glynn and D. L. Iglehart. Importance sampling for stochastic simulations. Management Science, 35(11):1367-1392, 1989.
47. R. M. Kretchmar and C. W. Anderson. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning. In International Conference on Neural Networks, 1997.
50. L. Li. A worst-case comparison between temporal difference and residual gradient with linear function approximation. In Proceedings of the 25th International Conference on Machine Learning, 2008.
54. S. Mahadevan and M. Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8:2169-2231, 2007.
55. A. R. Mahmood, R. S. Sutton, T. Degris, and P. M. Pilarski. Tuning-free step-size adaptation. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
56. I. Menache, S. Mannor, and N. Shimkin. Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134(1):215-238, 2005.
58. A. Nedic and D. P. Bertsekas. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13(1-2):79-110, 2003.
62. R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning, 2008.
71. A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210-229, 1959.
72. B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In Proceedings of the 27th International Conference on Machine Learning, 2010.
74. R. Schoknecht. Optimality of reinforcement learning algorithms with linear function approximation. In Advances in Neural Information Processing Systems 15, 2002.
78. R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
81. R. S. Sutton, C. Szepesvári, and H. R. Maei. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in Neural Information Processing Systems 21, 2008.
82. R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
84. G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
85. J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
87. X. Xu, T. Xie, D. Hu, and X. Lu. Kernel least-squares temporal difference learning. International Journal of Information Technology, 11(9):54-63, 2005.
89. P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. The Annals of Statistics, 37(6A):3468-3497, 2009.