SCOPUS Information Retrieval Platform

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

2017, Pages 193-205

S-caffe: Co-designing MPI runtimes and caffe for scalable deep learning on modern GPU clusters

Authors (4): Awan, Ammar Ahmad (a); Hamidouche, Khaled (a); Hashmi, Jahanzeb Maqbool (a); Panda, Dhabaleswar K (a)

(a) Ohio State University (United States)

Author keywords

Caffe; CUDA aware MPI; Deep learning; Distributed training; MPI reduce

Indexed keywords

COBALT; COBALT COMPOUNDS; DESIGN; GRAPHICS PROCESSING UNIT; PARALLEL PROGRAMMING; PROGRAM PROCESSORS;

AGGREGATION SCHEMES; CAFFE; CO-DESIGN METHODOLOGY; CUDA-AWARE MPI; HIERARCHICAL REDUCTION; IN-DEPTH ANALYSIS; MASSIVELY PARALLELS; MPI REDUCE;

DEEP LEARNING;

EID: 85014452127 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/3018743.3018769 Document Type: Conference Paper

Times cited: 124

References (49)

1
- 85014491906
- Caffe: Multi-GPU Usage and Performance. https://github.com/yahoo/caffe/blob/master/docs/multigpu.md.

2
- 85014502106
- KESCH: Cray CS-Storm System. http://www.cscs.ch/computers/kesch_escha/index.html.

3
- 85014442860
- Intel Caffe. https://github.com/intelcaffe.

4
- 85014428233
- A Unified Runtime System for Heterogeneous Multicore Architectures. http://starpu.gforge.inria.fr.

5
- 85014443136
- ILSVRC2012 Dataset. http://image-net.org/challenges/LSVRC/2012/index, 2012. [Online; accessed Dec-2016].

6
- 85014480125
- Caffe Website. http://caffe.berkeleyvision.org/, 2015. [Online; accessed Dec-2016].

7
- 85014476564
- CaffeNet. http://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012, 2015. [Online; accessed Dec-2016].

8
- 85014434114
- GPU Direct RDMA. http://docs.nvidia.com/cuda/gpudirect-rdma/, 2015. [Online; accessed Dec-2016].

9
- 85014443969
- HPC: Powering Deep Learning. http://computing.ornl.gov/workshops/SMC15/docs/bcatanzaro_smcc.pdf, 2015. [Online; accessed Dec-2016].

10
- 85014500526
- LMDB. http://symas.com/mdb/, 2015. [Online; accessed Dec-2016].

11
- 85014503814
- Nvidia Development Platform for Autonomous Cars. http://www.nvidia.com/object/drive-px.html, 2016. [Online; accessed Dec-2016].

12
- 85014461247
- CNTK. http://www.cntk.ai/, 2016. [Online; accessed Dec-2016].

13
- 85014461431
- Nvidia GPUs Comparison. http://www.extremetech.com/computing/194391-nvidias-new-tesla-k80-doubles-up-on-gpu-horsepower, 2016. [Online; accessed Dec-2016].

14
- 84958264664
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015. Software available from tensorflow.org.

15
- 41249087856
- J. A. Anderson, C. D. Lorenz, and A. Travesset. General Purpose Molecular Dynamics Simulations Fully Implemented on Graphics Processing Units. Journal of Computational Physics, 227(10):5342-5359, 2008.

16
- 85017408696
- S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah. Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning. CoRR, abs/1511.06435, 2016.

17
- 84937942087
- F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio. Theano: New Features and Speed Improvements. arXiv preprint arXiv:1211.5590, 2012.

18
- 84936957551
- D. Case, J. Berryman, R. Betz, D. Cerutti, T. Cheatham III, T. Darden, R. Duke, T. Giese, H. Gohlke, A. Goetz, et al. AMBER 2015. University of California, San Francisco, 2015.

19
- 85069497682
- T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an Efficient and Scalable Deep Learning Training System. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14, pages 571-582, Berkeley, CA, USA, 2014. USENIX Association. ISBN 978-1-931971-16-4. URL http://dl.acm.org/citation.cfm?id=2685048.2685094.

20
- 84897484337
- A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and A. Ng. Deep Learning with COTS HPC Systems. In Proceedings of the 30th International Conference on Machine Learning, pages 1337-1345, 2013.

21
- 5044234815
- R. Collobert, S. Bengio, and J. Mariéthoz. Torch: A Modular Machine Learning Software Library. Technical report, IDIAP, 2002.

22
- 85014448741
- Cray. http://docs.cray.com/books/004-3689-001/html-004-3689-001/004-3689-001-toc.html, 2016. [Online; accessed Dec-2016].

23
- 84971575164
- H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing. GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server. In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys'16, pages 4:1-4:16, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4240-7. doi: 10.1145/2901318.2901323. URL http://doi.acm.org/10.1145/2901318.2901323.

24
- 84877760312
- J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. Large Scale Distributed Deep Networks. In Advances in Neural Information Processing Systems, pages 1223-1231, 2012.

25
- 85198028989
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A Large-Scale Hierarchical Image Database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248-255. IEEE, 2009.

26
- 85014475551
- Google. Google's Remote Procedure Call Library (gRPC). http://www.grpc.io.

27
- 85014480443
- Google. Distributed TensorFlow: Github Issues. https://github.com/tensorflow/models/issues/698.

28
- 77954066718
- R. L. Graham, S. Poole, P. Shamis, G. Bloch, N. Bloch, H. Chapman, M. Kagan, A. Shahar, I. Rabinovitz, and G. Shainer. Overlapping Computation and Communication: Barrier Algorithms and ConnectX-2 CORE-Direct Capabilities. In Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, pages 1-8. IEEE, 2010.

29
- 56749151145
- T. Hoefler, A. Lumsdaine, and W. Rehm. Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI. In Supercomputing, 2007. SC'07. Proceedings of the 2007 ACM/IEEE Conference on, pages 1-10. IEEE, 2007.

30
- 85015228288
- F. N. Iandola, K. Ashraf, M. W. Moskewicz, and K. Keutzer. FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters. arXiv preprint arXiv:1511.00175, 2015.

31
- 85014424561
- Inspur. https://github.com/Caffe-MPI/Caffe-MPI.github.io, 2016.

32
- 85014466865
- J. Dean. Keynote: Large Scale Deep Learning.

33
- 84913555165
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093, 2014.

34
- 84946590547
- A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. CoRR, abs/1404.5997, 2014.

35
- 77956002520
- A. Krizhevsky and G. Hinton. Learning Multiple Layers of Features from Tiny Images, 2009.

36
- 84876231242
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

37
- 85012994753
- S. Lee, S. Purushwalkam, M. Cogswell, D. J. Crandall, and D. Batra. Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks. arXiv, 2015. URL http://arxiv.org/abs/1511.06314.

38
- 84939241380
- M. Lin, Q. Chen, and S. Yan. Network in Network. arXiv preprint arXiv:1312.4400, 2013.

39
- 74049119615
- Lustre. Parallel File System. http://lustre.org.

40
- 32844469834
- H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. TOP 500 Supercomputer Sites. http://www.top500.org.

41
- 84879810323
- MVAPICH2: MPI over InfiniBand, 10GigE/iWARP and RoCE. https://mvapich.cse.ohio-state.edu/.

42
- 85014428518
- Network Based Computing Laboratory. OSU Micro-Benchmarks. http://mvapich.cse.ohio-state.edu/benchmarks/, 2016.

43
- 58849145268
- C. Nvidia. Programming Guide, 2008.

44
- 84925410541
- K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014.

45
- 84937522268
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9, 2015.

46
- 85014458046
- The HiDL Team. High Performance Deep Learning (HiDL) Project. http://hidl.cse.ohio-state.edu.

47
- 84870598466
- The Open MPI Development Team. Open MPI: Open Source High Performance Computing. http://www.open-mpi.org.

48
- 85014437412
- A. Vishnu, C. Siegel, and J. Daily. Distributed TensorFlow with MPI. arXiv preprint arXiv:1603.02339, 2016.

49
- 85014442834
- D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck. Deep Learning for Identifying Metastatic Breast Cancer. ArXiv e-prints, June 2016.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.