SCOPUS Information Retrieval Platform

Proceedings of the 11th European Conference on Computer Systems, EuroSys 2016

Volume , Issue , 2016, Pages

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server

(5) Cui, Henggang ᵃ; Zhang, Hao ᵃ; Ganger, Gregory R. ᵃ; Gibbons, Phillip B. ᵃ; Xing, Eric P. ᵃ

ᵃ Carnegie Mellon University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

NETWORK LAYERS; PROGRAM PROCESSORS;

COMPUTATIONAL RESOURCES; DATA MOVEMENTS; GPU IMPLEMENTATION; MULTIPLE MACHINE; NEW PARAMETERS; STATE OF THE ART; TRAINING IMAGE; TRAINING THROUGHPUTS;

DISTRIBUTED COMPUTER SYSTEMS;

EID: 84971575164 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2901318.2901323 Document Type: Conference Paper

Times cited : (310)

References (36)

1
- 84954514573
- NVIDIA cuBLAS https://developer.nvidia.com/cublas.
- NVIDIA cuBLAS

2
- 84971553276
- NVIDIA cuDNN https://developer.nvidia.com/cudnn.
- NVIDIA cuDNN

3
- 84858012279
- Scalable inference in latent variable models
- A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. In WSDM, 2012.
- (2012) WSDM
- Ahmed, A.¹ Aly, M.² Gonzalez, J.³ Narayanamurthy, S.⁴ Smola, A.J.⁵

4
- 84919919193
- Distributed stochastic gradient MCMC
- S. Ahn, B. Shahbaba, and M. Welling. Distributed stochastic gradient MCMC. In ICML, 2014.
- (2014) ICML
- Ahn, S.¹ Shahbaba, B.² Welling, M.³

5
- 34547975052
- Scaling learning algorithms towards AI
- Y. Bengio, Y. LeCun, et al. Scaling learning algorithms towards AI. Large-scale kernel machines, 34(5), 2007.
- (2007) Large-scale Kernel Machines , vol.34 , Issue.5
- Bengio, Y.¹ LeCun, Y.²

6
- 84990032982
- MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems
- T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015.
- (2015) arXiv preprint arXiv:1512.01274
- Chen, T.¹ Li, M.² Li, Y.³ Lin, M.⁴ Wang, N.⁵ Wang, M.⁶ Xiao, T.⁷ Xu, B.⁸ Zhang, C.⁹ Zhang, Z.¹⁰

7
- 85069497682
- Project Adam: Building an efficient and scalable deep learning training system
- T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In OSDI, 2014.
- (2014) OSDI
- Chilimbi, T.¹ Suzue, Y.² Apacible, J.³ Kalyanaraman, K.⁴

8
- 84866714584
- Multi-column deep neural networks for image classification
- D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In CVPR, 2012.
- (2012) CVPR
- Ciresan, D.¹ Meier, U.² Schmidhuber, J.³

9
- 84897484337
- Deep learning with COTS HPC systems
- A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and A. Ng. Deep learning with COTS HPC systems. In ICML, 2013.
- (2013) ICML
- Coates, A.¹ Huval, B.² Wang, T.³ Wu, D.⁴ Catanzaro, B.⁵ Ng, A.⁶

10
- 85077475089
- Exploiting bounded staleness to speed up big data analytics
- H. Cui, J. Cipar, Q. Ho, J. K. Kim, S. Lee, A. Kumar, J. Wei, W. Dai, G. R. Ganger, P. B. Gibbons, G. A. Gibson, and E. P. Xing. Exploiting bounded staleness to speed up big data analytics. In USENIX ATC, 2014.
- (2014) USENIX ATC
- Cui, H.¹ Cipar, J.² Ho, Q.³ Kim, J.K.⁴ Lee, S.⁵ Kumar, A.⁶ Wei, J.⁷ Dai, W.⁸ Ganger, G.R.⁹ Gibbons, P.B.¹⁰ Gibson, G.A.¹¹ Xing, E.P.¹²

11
- 85118315826
- Exploiting iterative-ness for parallel ML computations
- H. Cui, A. Tumanov, J. Wei, L. Xu, W. Dai, J. Haber-Kucharsky, Q. Ho, G. R. Ganger, P. B. Gibbons, G. A. Gibson, and E. P. Xing. Exploiting iterative-ness for parallel ML computations. In SoCC, 2014.
- (2014) SoCC
- Cui, H.¹ Tumanov, A.² Wei, J.³ Xu, L.⁴ Dai, W.⁵ Haber-Kucharsky, J.⁶ Ho, Q.⁷ Ganger, G.R.⁸ Gibbons, P.B.⁹ Gibson, G.A.¹⁰ Xing, E.P.¹¹

12
- 84971509545
- Scalable deep learning on distributed GPUs with a GPU-specialized parameter server
- H. Cui, G. R. Ganger, and P. B. Gibbons. Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. CMU PDL Technical Report (CMU-PDL-15-107), 2015.
- (2015) CMU PDL Technical Report
- Cui, H.¹ Ganger, G.R.² Gibbons, P.B.³

13
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.1
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

14
- 84877760312
- Large scale distributed deep networks
- J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. Large scale distributed deep networks. In NIPS, 2012.
- (2012) NIPS
- Dean, J.¹ Corrado, G.² Monga, R.³ Chen, K.⁴ Devin, M.⁵ Mao, M.⁶ Senior, A.⁷ Tucker, P.⁸ Yang, K.⁹ Le, Q.V.¹⁰

15
- 85198028989
- ImageNet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
- (2009) CVPR
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

16
- 84944046597
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014.
- (2014) arXiv preprint arXiv:1411.4389
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

17
- 84891720231
- PRObE: A thousand-node experimental cluster for computer systems research
- G. Gibson, G. Grider, A. Jacobson, and W. Lloyd. PRObE: A thousand-node experimental cluster for computer systems research. USENIX ;login:, 2013.
- (2013) USENIX ;login:
- Gibson, G.¹ Grider, G.² Jacobson, A.³ Lloyd, W.⁴

18
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

19
- 84898988368
- More effective distributed ML via a Stale Synchronous Parallel parameter server
- Q. Ho, J. Cipar, H. Cui, S. Lee, J. K. Kim, P. B. Gibbons, G. A. Gibson, G. R. Ganger, and E. P. Xing. More effective distributed ML via a Stale Synchronous Parallel parameter server. In NIPS, 2013.
- (2013) NIPS
- Ho, Q.¹ Cipar, J.² Cui, H.³ Lee, S.⁴ Kim, J.K.⁵ Gibbons, P.B.⁶ Gibson, G.A.⁷ Ganger, G.R.⁸ Xing, E.P.⁹

20
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8), 1997.
- (1997) Neural Computation , vol.9 , Issue.8
- Hochreiter, S.¹ Schmidhuber, J.²

21
- 84913555165
- Caffe: Convolutional architecture for fast feature embedding
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
- (2014) arXiv preprint arXiv:1408.5093
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

22
- 84932095919
- One weird trick for parallelizing convolutional neural networks
- A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014.
- (2014) arXiv preprint arXiv:1404.5997
- Krizhevsky, A.¹

23
- 84876231242
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

24
- 84937912100
- Scaling distributed machine learning with the parameter server
- M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In OSDI, 2014.
- (2014) OSDI
- Li, M.¹ Andersen, D.G.² Park, J.W.³ Smola, A.J.⁴ Ahmed, A.⁵ Josifovski, V.⁶ Long, J.⁷ Shekita, E.J.⁸ Su, B.-Y.⁹

25
- 82155188108
- Piccolo: Building fast, distributed programs with partitioned tables
- R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, 2010.
- (2010) OSDI
- Power, R.¹ Li, J.²

26
- 84947041871
- ImageNet large scale visual recognition challenge
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015.
- (2015) International Journal of Computer Vision
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

27
- 84884955228
- UCF101: A dataset of 101 human actions classes from videos in the wild
- K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
- (2012) arXiv preprint arXiv:1212.0402
- Soomro, K.¹ Zamir, A.R.² Shah, M.³

28
- 84964983441
- Going deeper with convolutions
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
- (2014) arXiv preprint arXiv:1409.4842
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

29
- 84939821075
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555, 2014.
- (2014) arXiv preprint arXiv:1411.4555
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

30
- 84906870319
- Minerva: A scalable and highly efficient training platform for deep learning
- M. Wang, T. Xiao, J. Li, J. Zhang, C. Hong, and Z. Zhang. Minerva: A scalable and highly efficient training platform for deep learning. NIPS 2014 Workshop of Distributed Matrix Computations, 2014.
- (2014) NIPS 2014 Workshop of Distributed Matrix Computations
- Wang, M.¹ Xiao, T.² Li, J.³ Zhang, J.⁴ Hong, C.⁵ Zhang, Z.⁶

31
- 84912132796
- Towards topic modeling for big data
- Y. Wang, X. Zhao, Z. Sun, H. Yan, L. Wang, Z. Jin, L. Wang, Y. Gao, J. Zeng, Q. Yang, et al. Towards topic modeling for big data. arXiv preprint arXiv:1405.4402, 2014.
- (2014) arXiv preprint arXiv:1405.4402
- Wang, Y.¹ Zhao, X.² Sun, Z.³ Yan, H.⁴ Wang, L.⁵ Jin, Z.⁶ Wang, L.⁷ Gao, Y.⁸ Zeng, J.⁹ Yang, Q.¹⁰

32
- 84959036260
- Managed communication and consistency for fast data-parallel iterative analytics
- J. Wei, W. Dai, A. Qiao, Q. Ho, H. Cui, G. R. Ganger, P. B. Gibbons, G. A. Gibson, and E. P. Xing. Managed communication and consistency for fast data-parallel iterative analytics. In SoCC, 2015.
- (2015) SoCC
- Wei, J.¹ Dai, W.² Qiao, A.³ Ho, Q.⁴ Cui, H.⁵ Ganger, G.R.⁶ Gibbons, P.B.⁷ Gibson, G.A.⁸ Xing, E.P.⁹

33
- 84930572185
- Deep image: Scaling up image recognition
- R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun. Deep image: Scaling up image recognition. arXiv preprint arXiv:1501.02876, 2015.
- (2015) arXiv preprint arXiv:1501.02876
- Wu, R.¹ Yan, S.² Shan, Y.³ Dang, Q.⁴ Sun, G.⁵

34
- 84959228762
- Beyond short snippets: Deep networks for video classification
- J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, 2015.
- (2015) CVPR
- Yue-Hei Ng, J.¹ Hausknecht, M.² Vijayanarasimhan, S.³ Vinyals, O.⁴ Monga, R.⁵ Toderici, G.⁶

35
- 84971554516
- Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines
- H. Zhang, Z. Hu, J. Wei, P. Xie, G. Kim, Q. Ho, and E. Xing. Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines. arXiv preprint arXiv:1512.06216, 2015.
- (2015) arXiv preprint arXiv:1512.06216
- Zhang, H.¹ Hu, Z.² Wei, J.³ Xie, P.⁴ Kim, G.⁵ Ho, Q.⁶ Xing, E.⁷

36
- 84912111128
- Asynchronous distributed ADMM algorithm for global variable consensus optimization
- R. Zhang and J. Kwok. Asynchronous distributed ADMM algorithm for global variable consensus optimization. In ICML, 2014.
- (2014) ICML
- Zhang, R.¹ Kwok, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.