[1] A. Agarwal, O. Chapelle, M. Dudík, and J. Langford. A reliable effective terascale linear learning system. The Journal of Machine Learning Research, 15(1):1111–1133, 2014.
[7] A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems 27, pages 1646–1654, 2014.
[8] R. S. Dembo and T. Steihaug. Truncated-Newton algorithms for large-scale unconstrained optimization. Mathematical Programming, 26(2):190–212, 1983.
[9] R. S. Dembo, S. C. Eisenstat, and T. Steihaug. Inexact Newton methods. SIAM Journal on Numerical Analysis, 19(2):400–408, 1982.
[10] J. E. Dennis, Jr. and J. J. Moré. Quasi-Newton methods, motivation and theory. SIAM Review, 19(1):46–89, 1977.
[11] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159, 2011.
[13] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315–323, 2013.
[15] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5:361–397, 2004.
[16] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.
[20] Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
[22] B. A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994.
[23] B. Recht and C. Ré. Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation, 5(2):201–226, 2013.
[25] N. L. Roux, M. Schmidt, and F. R. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems, pages 2663–2671, 2012.
[27] S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss. The Journal of Machine Learning Research, 14(1):567–599, 2013.
[29] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning, pages 1139–1147, 2013.
[30] C. Wang, X. Chen, A. J. Smola, and E. P. Xing. Variance reduction for stochastic gradient optimization. In Advances in Neural Information Processing Systems, pages 181–189, 2013.