1. Antos, A., Szepesvári, Cs., & Munos, R. (2007). Value-iteration based fitted policy iteration: learning with a single trajectory. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007) (pp. 330-337). Honolulu: IEEE Press. doi:10.1109/ADPRL.2007.368207
2. Antos, A., Munos, R., & Szepesvári, Cs. (2008a). Fitted Q-iteration in continuous action-space MDPs. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (NIPS-20) (pp. 9-16). Cambridge: MIT Press.
3. Antos, A., Szepesvári, Cs., & Munos, R. (2008b). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71, 89-129.
4. Arlot, S., & Celisse, A. (2009). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40-79.
5. Barron, A. R. (1991). Complexity regularization with application to artificial neural networks. In G. Roussas (Ed.), Nonparametric function estimation and related topics (pp. 561-576). Norwell: Kluwer Academic.
6. Barron, A. R., Huang, C., Li, J. Q., & Luo, X. (2008). The MDL principle, maximum likelihoods, and statistical risk. In P. Grünwald, P. Myllymäki, I. Tabus, M. Weinberger, & B. Yu (Eds.), TICSP series: Vol. 38. Festschrift in honor of Jorma Rissanen on the occasion of his 75th birthday. Tampere: Tampere International Center for Signal Processing.
7. Bartlett, P. L., Boucheron, S., & Lugosi, G. (2002). Model selection and error estimation. Machine Learning, 48(1-3), 85-113. doi:10.1023/A:1013999503812
8. Bartlett, P. L., Bousquet, O., & Mendelson, S. (2005). Local Rademacher complexities. Annals of Statistics, 33(4), 1497-1537. doi:10.1214/009053605000000282
12. Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In ICML'05: Proceedings of the 22nd international conference on machine learning (pp. 201-208). New York: ACM. doi:10.1145/1102351.1102377
13. Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503-556.
15. Farahmand, A.-m., Ghavamzadeh, M., Szepesvári, Cs., & Mannor, S. (2009a). Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In Proceedings of the American control conference (ACC) (pp. 725-730).
16. Farahmand, A.-m., Ghavamzadeh, M., Szepesvári, Cs., & Mannor, S. (2009b). Regularized policy iteration. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (NIPS-21) (pp. 441-448). Cambridge: MIT Press.
17. Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). A distribution-free theory of nonparametric regression. New York: Springer.
18. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.
23. Loth, M., Davy, M., & Preux, P. (2007). Sparse temporal difference learning using LASSO. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007) (pp. 352-359). doi:10.1109/ADPRL.2007.368210
24. Lugosi, G., & Wegkamp, M. (2004). Complexity regularization via localized random penalties. Annals of Statistics, 32(4), 1679-1697. doi:10.1214/009053604000000463
26. Meir, R. (2000). Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1), 5-34. doi:10.1023/A:1007602715810
27. Menache, I., Mannor, S., & Shimkin, N. (2005). Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134(1), 215-238. doi:10.1007/s10479-005-5732-z
29. Modha, D. S., & Masry, E. (1998). Memory-universal prediction of stationary random processes. IEEE Transactions on Information Theory, 44(1), 117-133.
30. Parr, R., Painter-Wakefield, C., Li, L., & Littman, M. (2007). Analyzing feature generation for value-function approximation. In ICML'07: Proceedings of the 24th international conference on machine learning (pp. 737-744). New York: ACM.
32. Riedmiller, M. (2005). Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European conference on machine learning (pp. 317-328).
33. Samson, P.-M. (2000). Concentration of measure inequalities for Markov chains and φ-mixing processes. Annals of Probability, 28(1), 416-461.
38. van der Vaart, A. W., Dudoit, S., & van der Laan, M. J. (2006). Oracle inequalities for multi-fold cross validation. Statistics & Decisions, 24, 351-372.
40. Wegkamp, M. (2003). Model selection in nonparametric regression. Annals of Statistics, 31(1), 252-273.
41. Xu, X., Hu, D., & Lu, X. (2007). Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4), 973-992. doi:10.1109/TNN.2007.899161