[1] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[3] Amit Agarwal, Eldar Akchurin, Chris Basoglu, Guoguo Chen, Scott Cyphers, Jasha Droppo, Adam Eversole, Brian Guenter, Mark Hillebrand, Ryan Hoens, et al. An introduction to computational networks and the Computational Network Toolkit. Technical Report MSR-TR-2014-112, August 2014.
[4] Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD: Communication-efficient SGD via gradient quantization and encoding. arXiv preprint arXiv:1610.02132, 2016.
[5] Yossi Arjevani and Ohad Shamir. Communication complexity of distributed convex learning and optimization. In NIPS, 2015.
[7] Sébastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231-357, 2015.
[8] Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In OSDI, October 2014.
[9] CNTK BrainScript file for AlexNet. https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Classification/AlexNet/BrainScript. Accessed: 2017-02-24.
[10] Christopher M. De Sa, Ce Zhang, Kunle Olukotun, and Christopher Ré. Taming the wild: A unified analysis of Hogwild-style algorithms. In NIPS, 2015.
[11] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. Large scale distributed deep networks. In NIPS, 2012.
[12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248-255. IEEE, 2009.
[13] John C. Duchi, Sorathan Chaturapruek, and Christopher Ré. Asynchronous stochastic convex optimization. In NIPS, 2015.
[14] Peter Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194-203, 1975.
[15] Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341-2368, 2013.
[16] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In ICML, pages 1737-1746, 2015.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[20] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in Neural Information Processing Systems, pages 4107-4115, 2016.
[21] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, and Kurt Keutzer. FireCaffe: Near-linear acceleration of deep neural network training on compute clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2592-2600, 2016.
[23] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS, 2013.
[28] Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. Scaling distributed machine learning with the parameter server. In OSDI, 2014.
[29] Xiangru Lian, Yijun Huang, Yuncheng Li, and Ji Liu. Asynchronous parallel stochastic gradient for nonconvex optimization. In NIPS, 2015.
[30] Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807, 2015.
[31] CNTK implementation of QSGD. https://gitlab.com/demjangrubic/QSGD. Accessed: 2017-11-04.
[32] Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.
[35] Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs. In INTERSPEECH, 2014.
[37] Nikko Strom. Scalable distributed DNN training using commodity GPU cloud computing. In INTERSPEECH, 2015.
[40] John N. Tsitsiklis and Zhi-Quan Luo. Communication complexity of convex optimization. Journal of Complexity, 3(3), 1987.
[41] Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. TernGrad: Ternary gradients to reduce communication in distributed deep learning. arXiv preprint arXiv:1705.07878, 2017.
[42] Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, and Ce Zhang. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In International Conference on Machine Learning, pages 4035-4043, 2017.
[44] Yuchen Zhang, John Duchi, Michael I. Jordan, and Martin J. Wainwright. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS, 2013.
[45] Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.