Volume 85, Issue 3, 2011, Pages 299-332

Model selection in reinforcement learning

Author keywords

Adaptivity; Complexity regularization; Finite sample bounds; Model selection; Off policy learning; Offline learning; Reinforcement learning

Indexed keywords

ADAPTIVITY; COMPLEXITY REGULARIZATIONS; FINITE-SAMPLE BOUNDS; MODEL SELECTION; OFF-LINE LEARNING; OFF-POLICY LEARNING

EID: 83155175393     PISSN: 08856125     EISSN: 15730565     Source Type: Journal    
DOI: 10.1007/s10994-011-5254-7     Document Type: Article
Times cited: 61

References (41)
  • 2
    • 85161978146
    • Fitted Q-iteration in continuous action-space MDPs
    • J. Platt, D. Koller, Y. Singer, & S. Roweis Eds., Cambridge: MIT Press
    • Antos, A., Munos, R., & Szepesvári, Cs. (2008a). Fitted Q-iteration in continuous action-space MDPs. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (NIPS-20) (pp. 9-16). Cambridge: MIT Press.
    • (2008) Advances in Neural Information Processing Systems (NIPS-20) , pp. 9-16
    • Antos, A.1    Munos, R.2    Szepesvári, Cs.3
  • 3
    • 40849145988
    • Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    • Antos, A., Szepesvári, Cs., & Munos, R. (2008b). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71, 89-129.
    • (2008) Machine Learning , vol.71 , pp. 89-129
    • Antos, A.1    Szepesvári, Cs.2    Munos, R.3
  • 4
    • 77956649096
    • A survey of cross-validation procedures for model selection
    • Arlot, S., & Celisse, A. (2009). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40-79.
    • (2009) Statistics Surveys , vol.4 , pp. 40-79
    • Arlot, S.1    Celisse, A.2
  • 5
    • 0001347323
    • Complexity regularization with application to artificial neural networks
    • G. Roussas Ed., Norwell: Kluwer Academic
    • Barron, A. R. (1991). Complexity regularization with application to artificial neural networks. In G. Roussas (Ed.), Nonparametric function estimation and related topics (pp. 561-576). Norwell: Kluwer Academic.
    • (1991) Nonparametric Function Estimation and Related Topics , pp. 561-576
    • Barron, A.R.1
  • 6
    • 84996142818
    • The MDL principle, maximum likelihoods, and statistical risk
    • P. Grünwald, P. Myllymäki, I. Tabus, M. Weinberger, & B. Yu Eds., Tampere international center for signal processing
    • Barron, A. R., Huang, C., Li, J. Q., & Luo, X. (2008). The MDL principle, maximum likelihoods, and statistical risk. In P. Grünwald, P. Myllymäki, I. Tabus, M. Weinberger, & B. Yu (Eds.), TICSP series: Vol. 38. Festschrift in honor of Jorma Rissanen on the occasion of his 75th birthday. Tampere international center for signal processing.
    • (2008) TICSP Series: Vol. 38. Festschrift in Honor of Jorma Rissanen on the Occasion of His 75th Birthday
    • Barron, A.R.1    Huang, C.2    Li, J.Q.3    Luo, X.4
  • 7
    • 0036643049
    • Model selection and error estimation
    • DOI 10.1023/A:1013999503812
    • Bartlett, P. L., Boucheron, S., & Lugosi, G. (2002). Model selection and error estimation. Machine Learning, 48 (1-3), 85-113. (Pubitemid 34247574)
    • (2002) Machine Learning , vol.48 , Issue.1-3 , pp. 85-113
    • Bartlett, P.L.1    Boucheron, S.2    Lugosi, G.3
  • 8
    • 26444592981
    • Local Rademacher complexities
    • DOI 10.1214/009053605000000282
    • Bartlett, P. L., Bousquet, O., & Mendelson, S. (2005). Local Rademacher complexities. Annals of Statistics, 33(4), 1497-1537. (Pubitemid 41423979)
    • (2005) Annals of Statistics , vol.33 , Issue.4 , pp. 1497-1537
    • Bartlett, P.L.1    Bousquet, O.2    Mendelson, S.3
  • 24
    • 23744490659
    • Complexity regularization via localized random penalties
    • DOI 10.1214/009053604000000463
    • Lugosi, G., & Wegkamp, M. (2004). Complexity regularization via localized random penalties. Annals of Statistics, 32, 1679-1697. (Pubitemid 41250281)
    • (2004) Annals of Statistics , vol.32 , Issue.4 , pp. 1679-1697
    • Lugosi, G.1    Wegkamp, M.2
  • 26
    • 0033904367
    • Nonparametric time series prediction through adaptive model selection
    • DOI 10.1023/A:1007602715810
    • Meir, R. (2000). Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1), 5-34. (Pubitemid 30588208)
    • (2000) Machine Learning , vol.39 , Issue.1 , pp. 5-34
    • Meir, R.1
  • 27
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • DOI 10.1007/s10479-005-5732-z
    • Menache, I., Mannor, S., & Shimkin, N. (2005). Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134(1), 215-238. (Pubitemid 40550047)
    • (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
    • Menache, I.1    Mannor, S.2    Shimkin, N.3
  • 29
    • 0031647695
    • Memory-universal prediction of stationary random processes
    • PII S0018944898000017
    • Modha, D. S., & Masry, E. (1998). Memory-universal prediction of stationary random processes. IEEE Transactions on Information Theory, 44(1), 117-133. (Pubitemid 128737883)
    • (1998) IEEE Transactions on Information Theory , vol.44 , Issue.1 , pp. 117-133
    • Modha, D.S.1    Masry, E.2
  • 32
    • 33646398129
    • Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
    • Riedmiller, M. (2005). Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In 16th European conference on machine learning (pp. 317-328).
    • (2005) 16th European Conference on Machine Learning , pp. 317-328
    • Riedmiller, M.1
  • 33
    • 0034336998
    • Concentration of measure inequalities for Markov chains and φ-mixing processes
    • Samson, P.-M. (2000). Concentration of measure inequalities for Markov chains and φ-mixing processes. Annals of Probability, 28(1), 416-461.
    • (2000) Annals of Probability , vol.28 , Issue.1 , pp. 416-461
    • Samson, P.-M.1
  • 40
    • 21144432799
    • Model selection in nonparametric regression
    • Wegkamp, M. (2003). Model selection in nonparametric regression. Annals of Statistics, 31(1), 252-273.
    • (2003) Annals of Statistics , vol.31 , Issue.1 , pp. 252-273
    • Wegkamp, M.1
  • 41
    • 34547098844
    • Kernel-based least squares policy iteration for reinforcement learning
    • DOI 10.1109/TNN.2007.899161, Neural Networks for Feedback Control Systems
    • Xu, X., Hu, D., & Lu, X. (2007). Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18, 973-992. (Pubitemid 47098876)
    • (2007) IEEE Transactions on Neural Networks , vol.18 , Issue.4 , pp. 973-992
    • Xu, X.1    Hu, D.2    Lu, X.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.