SCOPUS Information Retrieval Platform

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Volume , Issue , 2009, Pages

Performance analysis of memory transfers and GEMM subroutines on NVIDIA Tesla GPU cluster

(3) Allada, Veerendra (a); Benjegerdes, Troy (a); Bode, Brett (b)

(a) Iowa State University (United States)

(b) University of Illinois at Urbana-Champaign (United States)

Author keywords

CUBLAS; CUDA; GPU cluster; Math Kernel Library; NetPIPE; Performance; Tesla

Indexed keywords

APPLICATION ACCELERATOR; BASIC LINEAR ALGEBRA SUBROUTINES; COMMODITY CLUSTERS; COMPUTATIONAL CHEMISTRY; DOUBLE PRECISION; EFFICIENT IMPLEMENTATION; GRAPHICAL PROCESSING UNITS; HIGH PERFORMANCE COMPUTING SYSTEMS; KERNEL LIBRARIES; MATRIX; OVERALL EFFICIENCY; PERFORMANCE ANALYSIS; PRICE RATIO; SCIENTIFIC APPLICATIONS;

CLUSTER COMPUTING; COMPUTATION THEORY; COMPUTATIONAL EFFICIENCY; EQUIPMENT TESTING; PROGRAM PROCESSORS; SUBROUTINES; TECHNICAL PRESENTATIONS;

COMPUTER GRAPHICS EQUIPMENT;

EID: 72049102909 PISSN: 15525244 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CLUSTR.2009.5289124 Document Type: Conference Paper

Times cited : (11)

References (22)

1
- 10044281444
- Netpipe: A network protocol independent performance evaluator
- Q. O. Snell, A. R. Mikler, and J. L. Gustafson, "Netpipe: A network protocol independent performance evaluator," in Proceedings of the IASTED International Conference on Intelligent Information Management and Systems, 1996.
- (1996) Proceedings of the IASTED International Conference on Intelligent Information Management and Systems
- Snell, Q.O.¹ Mikler, A.R.² Gustafson, J.L.³

2
- 0004302191
- San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003.
- (2003) Computer Architecture: A Quantitative Approach
- Hennessy, J.L.¹ Patterson, D.A.²

3
- 84877609547
- Brook for gpus: Stream computing on graphics hardware
- New York, NY, USA: ACM
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for gpus: stream computing on graphics hardware," in SIGGRAPH '04: ACM SIGGRAPH 2004 Papers. New York, NY, USA: ACM, 2004, pp. 777-786.
- (2004) SIGGRAPH '04: ACM SIGGRAPH 2004 Papers , pp. 777-786
- Buck, I.¹ Foley, T.² Horn, D.³ Sugerman, J.⁴ Fatahalian, K.⁵ Houston, M.⁶ Hanrahan, P.⁷

4
- 78651550268
- Scalable parallel programming with cuda
- J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable parallel programming with cuda," Queue, vol.6, no.2, pp. 40-53, 2008.
- (2008) Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

5
- 49049088756
- Gpu computing
- May
- J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, and J. Phillips, "Gpu computing," Proceedings of the IEEE, vol.96, no.5, pp. 879-899, May 2008.
- (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 879-899
- Owens, J.¹ Houston, M.² Luebke, D.³ Green, S.⁴ Stone, J.⁵ Phillips, J.⁶

6
- 33947588048
- A survey of general-purpose computation on graphics hardware
- [Online]. Available
- J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell, "A survey of general-purpose computation on graphics hardware," Computer Graphics Forum, vol.26, no.1, pp. 80-113, 2007. [Online]. Available: http://www.blackwellsynergy.com/doi/pdf/10.1111/j.1467-8659.2007.01012.x
- (2007) Computer Graphics Forum , vol.26 , Issue.1 , pp. 80-113
- Owens, J.D.¹ Luebke, D.² Govindaraju, N.³ Harris, M.⁴ Krüger, J.⁵ Lefohn, A.E.⁶ Purcell, T.J.⁷

7
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: a many-core x86 architecture for visual computing," ACM Trans. Graph., vol.27, no.3, pp. 1-15, 2008.
- (2008) ACM Trans. Graph. , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

8
- 72049110101
- http://www.khronos.org/opencl/

9
- 25844503119
- Introduction to the cell multiprocessor
- J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy, "Introduction to the cell multiprocessor," IBM J. Res. Dev., vol. 49, no. 4/5, pp. 589-604, 2005.
- (2005) IBM J. Res. Dev. , vol.49 , Issue.4-5 , pp. 589-604
- Kahle, J.A.¹ Day, M.N.² Hofstee, H.P.³ Johns, C.R.⁴ Maeurer, T.R.⁵ Shippy, D.⁶

10
- 33846818766
- Examining the viability of FPGA supercomputing
- S. Craven and P. Athanas, "Examining the viability of FPGA supercomputing," EURASIP J. Embedded Syst., vol.2007, no.1, pp. 13-13, 2007.
- (2007) EURASIP J. Embedded Syst. , vol.2007 , Issue.1 , pp. 13-13
- Craven, S.¹ Athanas, P.²

11
- 44849137198
- Nvidia tesla: A unified graphics and computing architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "Nvidia tesla: A unified graphics and computing architecture," IEEE Micro, vol.28, no.2, pp. 39-55, 2008.
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

12
- 72049114039
- NVIDIA
- NVIDIA, CUDA Programming Guide 2.2, 2009.
- (2009) CUDA Programming Guide 2.2

13
- 0018515759
- Basic linear algebra subprograms for fortran usage
- C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, "Basic linear algebra subprograms for fortran usage," ACM Trans. Math. Softw., vol.5, no.3, pp. 308-323, 1979.
- (1979) ACM Trans. Math. Softw. , vol.5 , Issue.3 , pp. 308-323
- Lawson, C.L.¹ Hanson, R.J.² Kincaid, D.R.³ Krogh, F.T.⁴

14
- 72049097299
- NVIDIA
- NVIDIA, CUDA CUBLAS Library 2.1, 2008.
- (2008) CUDA CUBLAS Library 2.1

15
- 72049122269
- Intel
- Intel, Intel Math Kernel Library for Linux* OS, 2009.
- (2009) Intel Math Kernel Library for Linux* OS

16
- 85008989421
- Hint: A new way to measure computer performance
- J. Gustafson and Q. Snell, "Hint: A new way to measure computer performance," Hawaii International Conference on System Sciences, vol.0, p. 392, 1995.
- (1995) Hawaii International Conference on System Sciences , pp. 392
- Gustafson, J.¹ Snell, Q.²

17
- 84999370993
- The linpack benchmark: An explanation
- New York, NY, USA: Springer-Verlag New York, Inc.
- J. J. Dongarra, "The linpack benchmark: An explanation," in Proceedings of the 1st International Conference on Supercomputing. New York, NY, USA: Springer-Verlag New York, Inc., 1988, pp. 456-474.
- (1988) Proceedings of the 1st International Conference on Supercomputing , pp. 456-474
- Dongarra, J.J.¹

18
- 72049109841
- Intel. [Online]. Available
- Intel, Thread Affinity Interface. [Online]. Available: http://software.intel.com/en-us/intel-compilers/
- Thread Affinity Interface

19
- 23944462603
- Gpu cluster for high performance computing
- Washington, DC, USA: IEEE Computer Society
- Z. Fan, F. Qiu, A. Kaufman, and S. Yoakum-Stover, "Gpu cluster for high performance computing," in SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing. Washington, DC, USA: IEEE Computer Society, 2004, p. 47.
- (2004) SC '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing , pp. 47
- Fan, Z.¹ Qiu, F.² Kaufman, A.³ Yoakum-Stover, S.⁴

20
- 50949166640
- Evaluation and tuning of the level 3 cublas for graphics processors
- 1-8, April
- S. Barrachina, M. Castillo, F. Igual, R. Mayo, and E. Quintana-Orti, "Evaluation and tuning of the level 3 cublas for graphics processors," Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pp. 1-8, April 2008.
- (2008) Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
- Barrachina, S.¹ Castillo, M.² Igual, F.³ Mayo, R.⁴ Quintana-Orti, E.⁵

21
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded gpu using cuda
- New York, NY, USA: ACM
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded gpu using cuda," in PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2008, pp. 73-82.
- (2008) PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

22
- 70350771131
- Benchmarking gpus to tune dense linear algebra
- V. Volkov and J. Demmel, "Benchmarking gpus to tune dense linear algebra," in SC, 2008, p. 31.
- (2008) SC , pp. 31
- Volkov, V.¹ Demmel, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.