[1] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, 2012, pp. 1223-1231.
[2] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su, "Scaling distributed machine learning with the parameter server," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pp. 583-598.
[3] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, "Project Adam: Building an efficient and scalable deep learning training system," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pp. 571-582.
[4] F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, "On parallelizability of stochastic gradient descent for speech DNNs," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 235-239.
[5] W. Chan and I. Lane, "Distributed asynchronous optimization of convolutional neural networks," in INTERSPEECH, 2014, pp. 1073-1077.
[6] X. Lian, Y. Huang, Y. Li, and J. Liu, "Asynchronous parallel stochastic gradient for nonconvex optimization," in Advances in Neural Information Processing Systems, 2015, pp. 2737-2745.
[7] S. Ghadimi and G. Lan, "Stochastic first- and zeroth-order methods for nonconvex stochastic programming," SIAM Journal on Optimization, vol. 23, no. 4, pp. 2341-2368, 2013.
[8] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, "Optimal distributed online prediction using mini-batches," Journal of Machine Learning Research, vol. 13, no. Jan, pp. 165-202, 2012.
[9] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, "Robust stochastic approximation approach to stochastic programming," SIAM Journal on Optimization, vol. 19, no. 4, pp. 1574-1609, 2009.
[10] R. Johnson and T. Zhang, "Accelerating stochastic gradient descent using predictive variance reduction," in Advances in Neural Information Processing Systems, 2013, pp. 315-323.
[11] L. Xiao and T. Zhang, "A proximal stochastic gradient method with progressive variance reduction," SIAM Journal on Optimization, vol. 24, no. 4, pp. 2057-2075, 2014.
[12] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 408-415.
[14] O. Fercoq, Z. Qu, P. Richtárik, and M. Takáč, "Fast distributed coordinate descent for non-strongly convex losses," in 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2014, pp. 1-6.
[17] J. Liu, S. J. Wright, C. Re, V. Bittorf, and S. Sridhar, "An asynchronous parallel stochastic coordinate descent algorithm," Journal of Machine Learning Research, vol. 16, pp. 285-322, 2015.
[18] M. Jaggi, V. Smith, M. Takac, J. Terhorst, S. Krishnan, T. Hofmann, and M. I. Jordan, "Communication-efficient distributed dual coordinate ascent," in Advances in Neural Information Processing Systems, 2014, pp. 3068-3076.
[19] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," SIAM Journal on Optimization, vol. 22, no. 2, pp. 341-362, 2012.
[20] J. Liu and S. J. Wright, "Asynchronous stochastic coordinate descent: Parallelism and convergence properties," SIAM Journal on Optimization, vol. 25, no. 1, pp. 351-376, 2015.
[21] Q. Meng, W. Chen, J. Yu, T. Wang, Z.-M. Ma, and T.-Y. Liu, "Asynchronous accelerated stochastic gradient descent," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, 2016.
[22] P. Richtárik and M. Takáč, "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function," Mathematical Programming, vol. 144, no. 1-2, pp. 1-38, 2014.
[24] "Cifar10 dataset," https://www.cs.toronto.edu/~kriz/cifar.html, accessed January 11, 2016.
[25] "Cifar10 model," https://github.com/eladhoffer/ConvNet-torch/blob/master/Models/Model.lua, accessed January 11, 2016.
[26] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[28] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 9-48.