[1] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, 2012, pp. 1223-1231.
[2] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su, "Scaling distributed machine learning with the parameter server," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pp. 583-598.
[3] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, "Project Adam: Building an efficient and scalable deep learning training system," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pp. 571-582.
[4] F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, "On parallelizability of stochastic gradient descent for speech DNNs," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 235-239.
[5] W. Chan and I. Lane, "Distributed asynchronous optimization of convolutional neural networks," in INTERSPEECH, 2014, pp. 1073-1077.
[6] X. Lian, Y. Huang, Y. Li, and J. Liu, "Asynchronous parallel stochastic gradient for nonconvex optimization," in Advances in Neural Information Processing Systems, 2015, pp. 2737-2745.
[7] S. Ghadimi and G. Lan, "Stochastic first- and zeroth-order methods for nonconvex stochastic programming," SIAM Journal on Optimization, vol. 23, no. 4, pp. 2341-2368, 2013.
[8] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, "Optimal distributed online prediction using mini-batches," Journal of Machine Learning Research, vol. 13, no. Jan, pp. 165-202, 2012.
[9] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, "Robust stochastic approximation approach to stochastic programming," SIAM Journal on Optimization, vol. 19, no. 4, pp. 1574-1609, 2009.
[10] R. Johnson and T. Zhang, "Accelerating stochastic gradient descent using predictive variance reduction," in Advances in Neural Information Processing Systems, 2013, pp. 315-323.
[11] L. Xiao and T. Zhang, "A proximal stochastic gradient method with progressive variance reduction," SIAM Journal on Optimization, vol. 24, no. 4, pp. 2057-2075, 2014.
[12] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 408-415.
[14] O. Fercoq, Z. Qu, P. Richtárik, and M. Takáč, "Fast distributed coordinate descent for non-strongly convex losses," in 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2014, pp. 1-6.
[17] J. Liu, S. J. Wright, C. Re, V. Bittorf, and S. Sridhar, "An asynchronous parallel stochastic coordinate descent algorithm," Journal of Machine Learning Research, vol. 16, pp. 285-322, 2015.
[18] M. Jaggi, V. Smith, M. Takac, J. Terhorst, S. Krishnan, T. Hofmann, and M. I. Jordan, "Communication-efficient distributed dual coordinate ascent," in Advances in Neural Information Processing Systems, 2014, pp. 3068-3076.
[19] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," SIAM Journal on Optimization, vol. 22, no. 2, pp. 341-362, 2012.
[20] J. Liu and S. J. Wright, "Asynchronous stochastic coordinate descent: Parallelism and convergence properties," SIAM Journal on Optimization, vol. 25, no. 1, pp. 351-376, 2015.
[21] Q. Meng, W. Chen, J. Yu, T. Wang, Z.-M. Ma, and T.-Y. Liu, "Asynchronous accelerated stochastic gradient descent," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, 2016.
[22] P. Richtárik and M. Takáč, "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function," Mathematical Programming, vol. 144, no. 1-2, pp. 1-38, 2014.
[24] "Cifar10 dataset," https://www.cs.toronto.edu/~kriz/cifar.html, accessed January 11, 2016.
[25] "Cifar10 model," https://github.com/eladhoffer/ConvNet-torch/blob/master/Models/Model.lua, accessed January 11, 2016.
[26] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[28] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 9-48.