[1] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[3] Amit Agarwal, Eldar Akchurin, Chris Basoglu, Guoguo Chen, Scott Cyphers, Jasha Droppo, Adam Eversole, Brian Guenter, Mark Hillebrand, Ryan Hoens, et al. An introduction to computational networks and the Computational Network Toolkit. Technical Report MSR-TR-2014-112, August 2014.
[4] Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD: Communication-efficient SGD via gradient quantization and encoding. arXiv preprint arXiv:1610.02132, 2016.
[5] Yossi Arjevani and Ohad Shamir. Communication complexity of distributed convex learning and optimization. In NIPS, 2015.
[7] Sébastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231-357, 2015.
[8] Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In OSDI, October 2014.
[9] CNTK BrainScript file for AlexNet. https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Classification/AlexNet/BrainScript. Accessed: 2017-02-24.
[10] Christopher M. De Sa, Ce Zhang, Kunle Olukotun, and Christopher Ré. Taming the wild: A unified analysis of Hogwild-style algorithms. In NIPS, 2015.
[11] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. Large scale distributed deep networks. In NIPS, 2012.
[12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248-255. IEEE, 2009.
[13] John C. Duchi, Sorathan Chaturapruek, and Christopher Ré. Asynchronous stochastic convex optimization. In NIPS, 2015.
[14] Peter Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194-203, 1975.
[15] Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341-2368, 2013.
[16] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In ICML, pages 1737-1746, 2015.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[20] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in Neural Information Processing Systems, pages 4107-4115, 2016.
[21] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, and Kurt Keutzer. FireCaffe: Near-linear acceleration of deep neural network training on compute clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2592-2600, 2016.
[23] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS, 2013.
[28] Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. Scaling distributed machine learning with the parameter server. In OSDI, 2014.
[29] Xiangru Lian, Yijun Huang, Yuncheng Li, and Ji Liu. Asynchronous parallel stochastic gradient for nonconvex optimization. In NIPS, 2015.
[30] Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807, 2015.
[31] CNTK implementation of QSGD. https://gitlab.com/demjangrubic/QSGD. Accessed: 2017-11-04.
[32] Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.
[35] Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs. In INTERSPEECH, 2014.
[37] Nikko Strom. Scalable distributed DNN training using commodity GPU cloud computing. In INTERSPEECH, 2015.
[40] John N. Tsitsiklis and Zhi-Quan Luo. Communication complexity of convex optimization. Journal of Complexity, 3(3), 1987.
[41] Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. TernGrad: Ternary gradients to reduce communication in distributed deep learning. arXiv preprint arXiv:1705.07878, 2017.
[42] Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, and Ce Zhang. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In International Conference on Machine Learning, pages 4035-4043, 2017.
[44] Yuchen Zhang, John Duchi, Michael I. Jordan, and Martin J. Wainwright. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS, 2013.
[45] Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.