



2012, Pages 564-573

Sparse Q-learning with mirror descent

Author keywords

[No Author keywords available]

Indexed keywords

GENERATING FUNCTIONS; HIGH DIMENSIONAL SPACES; LEGENDRE TRANSFORMS; MAHALANOBIS DISTANCES; MARKOV DECISION PROCESSES; ONLINE CONVEX OPTIMIZATIONS; REINFORCEMENT LEARNING METHOD; TEMPORAL DIFFERENCES;

EID: 84886008156     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 25

References (32)
  • 3
    • L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200-217, 1967.
  • 5
    • A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 2003.
  • 6
    • A. Ben-Tal, T. Margalit, and A. Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal on Optimization, January 2001.
  • 8
    • C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53:265-299, December 2003.
  • 14
    • J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132, 1995.
  • 15
    • N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, pages 285-318, 1988.
  • 17
    • S. Mahadevan. Learning Representation and Control in Markov Decision Processes: New Frontiers. Foundations and Trends in Machine Learning, 1(4):403-565, 2009.
  • 18
    • S. Mahadevan and M. Maggioni. Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes. Journal of Machine Learning Research, 8:2169-2231, 2007.
  • 19
    • Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, January 2009.
  • 20
    • A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 14(4):1574-1609, 2009.
  • 23
    • M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. In ICML, pages 871-878, 2010.
  • 28
    • R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
  • 29
    • R. Schapire and M. K. Warmuth. On the worst-case analysis of temporal-difference learning algorithms. In Machine Learning, pages 266-274. Morgan Kaufmann, 1994.
  • 30
    • J. Si and Y. T. Wang. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 12(2):264-276, 2001.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.