
Volume 11, 2010, Pages 2543-2596

Dual averaging methods for regularized stochastic learning and online optimization

Author keywords

ℓ1 regularization; Accelerated gradient methods; Dual averaging methods; Online optimization; Stochastic learning; Structural convex optimization

Indexed keywords

ACCELERATED GRADIENT METHODS; AVERAGING METHOD; ONLINE OPTIMIZATION; STOCHASTIC LEARNING; STRUCTURAL CONVEX OPTIMIZATION;

EID: 78649396336     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 648

References (68)
  • 2
    • A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization, 16:697-725, 2006.
  • 3
    • K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19:357-367, 1967.
  • 4
    • S. Balakrishnan and D. Madigan. Algorithms for sparse linear classifiers in the massive data setting. Journal of Machine Learning Research, 9:313-337, 2008.
  • 5
    • P. Bartlett, E. Hazan, and A. Rakhlin. Adaptive online gradient descent. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 65-72. MIT Press, Cambridge, MA, 2008.
  • 6
    • A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
  • 7
    • L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 161-168. MIT Press, Cambridge, MA, 2008.
  • 8
    • L. Bottou and Y. LeCun. Large scale online learning. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 217-224. MIT Press, Cambridge, MA, 2004.
  • 10
    • D. M. Bradley and J. A. Bagnell. Differentiable sparse coding. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 113-120. MIT Press, Cambridge, MA, USA, 2009.
  • 11
    • K. Bredies and D. A. Lorenz. Iterated hard shrinkage for minimization problems with sparsity constraints. SIAM Journal on Scientific Computing, 30(2):657-683, 2008.
  • 12
    • P. Carbonetto, M. Schmidt, and N. De Freitas. An interior-point stochastic approximation method and an l1-regularized delta rule. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 233-240. MIT Press, 2009.
  • 14
    • G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538-543, August 1993.
  • 15
    • G. H.-G. Chen and R. T. Rockafellar. Convergence rates in forward-backward splitting. SIAM Journal on Optimization, 7(2):421-444, 1997.
  • 17
    • J. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 10:2873-2898, 2009.
  • 19
    • J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. To appear in Journal of Machine Learning Research, 2010.
  • 20
    • M. C. Ferris and T. S. Munson. Interior-point methods for massive support vector machines. SIAM Journal on Optimization, 13(3):783-804, 2003.
  • 21
    • M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing, 1(4):586-597, 2007.
  • 22
    • D. A. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100-118, 1975.
  • 23
    • C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53:265-299, 2003.
  • 24
    • R. Goebel and R. T. Rockafellar. Local strong convexity and local Lipschitz continuity of the gradient of convex functions. Journal of Convex Analysis, 15(2):263-270, 2008.
  • 27
    • C. Hu, J. T. Kwok, and W. Pan. Accelerated gradient methods for stochastic optimization and online learning. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 781-789, 2009.
  • 28
    • B. Johansson, M. Rabi, and M. Johansson. A randomized incremental subgradient method for distributed optimization in networked systems. SIAM Journal on Optimization, 20(3):1157-1170, 2009.
  • 29
    • A. Juditsky and A. Nemirovski. Large deviations of vector-valued martingales in 2-smooth normed spaces. Manuscript submitted to The Annals of Probability, 2008. arXiv:0809.0813v1.
  • 30
    • A. Juditsky, A. Nazin, A. Tsybakov, and N. Vayatis. Recursive aggregation of estimators by mirror descent algorithm with averaging. Problems of Information Transmission, 41(4):368-384, 2005.
  • 31
    • S. M. Kakade and A. Tewari. On the generalization ability of online strongly convex programming algorithms. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 801-808. MIT Press, Cambridge, MA, USA, 2009.
  • 32
    • J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23:462-466, 1952.
  • 33
    • J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, 1997.
  • 35
    • G. Lan. An optimal method for stochastic composite optimization. To appear in Mathematical Programming, 2010.
  • 36
    • G. Lan, A. Nemirovski, and A. Shapiro. Validation analysis of robust stochastic approximation methods. Submitted to Mathematical Programming, 2008.
  • 37
    • G. Lan, Z. Lu, and R. D. C. Monteiro. Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Mathematical Programming, February 2009. Published online, DOI 10.1007/s10107-008-0261-6.
  • 39
    • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998. Dataset available at http://yann.lecun.com/exdb/mnist.
  • 40
    • P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16:964-979, 1979.
  • 42
    • A. Nedic and D. P. Bertsekas. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 12(1):109-138, 2001.
  • 43
    • A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574-1609, 2009.
  • 45
    • Yu. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Math. Doklady, 27(2):372-376, 1983. Translated from Russian by A. Rosa.
  • 47
    • Yu. Nesterov. Smooth minimization of nonsmooth functions. Mathematical Programming, 103:127-152, 2005.
  • 48
    • Yu. Nesterov. Gradient methods for minimizing composite objective function. Technical Report 2007/76, Catholic University of Louvain, Center for Operations Research and Econometrics, 2007.
  • 50
    • Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221-259, 2009. Appeared earlier as CORE discussion paper 2005/67, Catholic University of Louvain, Center for Operations Research and Econometrics.
  • 51
    • Yu. Nesterov and J.-Ph. Vial. Confidence level solutions for stochastic programming. Automatica, 44(6):1559-1568, 2008.
  • 53
    • S. Sundhar Ram, A. Nedic, and V. V. Veeravalli. Incremental stochastic subgradient algorithms for convex optimization. SIAM Journal on Optimization, 20(2):691-717, 2009.
  • 57
    • S. Shalev-Shwartz and S. M. Kakade. Mind the duality gap: Logarithmic regret algorithms for online optimization. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1457-1464. MIT Press, 2009.
  • 58
    • S. Shalev-Shwartz and Y. Singer. Convex repeated games and Fenchel duality. In B. Schölkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information Processing Systems 19, pages 1265-1272. MIT Press, 2006.
  • 62
    • P. Tseng. An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization, 8(2):506-531, 1998.
  • 63
    • P. Tseng. A modified forward-backward splitting method for maximal monotone mappings. SIAM Journal on Control and Optimization, 38(2):431-446, 2000.
  • 64
    • P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript submitted to SIAM Journal on Optimization, 2008.
  • 65
    • P. Tseng and D. P. Bertsekas. On the convergence of the exponential multiplier method for convex programming. Mathematical Programming, 60:1-19, 1993.
  • 67
    • T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the 21st International Conference on Machine Learning (ICML), pages 116-123, Banff, Alberta, Canada, 2004.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.