SCOPUS Information Retrieval Platform

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008

Volume , Issue , 2008, Pages

Benchmarking GPUs to tune dense linear algebra

(2) Volkov, Vasily ᵃ; Demmel, James W. ᵃ

ᵃ UNIVERSITY OF CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMIC OPTIMIZATION; CHOLESKY FACTORIZATIONS; DENSE LINEAR ALGEBRA; MATRIX; MEMORY SYSTEMS; MULTI CORE; MULTITHREADED;

ALGEBRA; BENCHMARKING; COMPUTER GRAPHICS EQUIPMENT; COMPUTER SCIENCE;

PROGRAM PROCESSORS;

EID: 70350771131 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SC.2008.5214359 Document Type: Conference Paper

Times cited : (635)

References (23)

1
- 70350758060
- ABTS, D., BATAINEH, A., SCOTT, S., FAANES, G., SCHWARZMEIER, J., LUNDBERG, E., JOHNSON, T., BYE, M., AND SCHWOERER, G. 2007. The Cray BlackWidow: A Highly Scalable Vector Multiprocessor, SC'07. AGARWAL, R. C., AND GUSTAVSON, F. G. 1989. Vector and parallel algorithms for Cholesky factorization on IBM 3090, Supercomputing'89, 225-233.

2
- 70350762187
- ALVERSON, R., CALLAHAN, D., CUMMINGS, D., KOBLENZ, B., PORTERFIELD, A., AND SMITH, B. 1990. The Tera Computer System, ICS'90, 1-6. AMD. 2006. ATI CTM Guide, version 1.01.

3
- 0025536635
- LAPACK: A portable linear algebra library for high-performance computers
- ANDERSON, E., BAI, Z., DONGARRA, J., GREENBAUM, A., MCKENNEY, A., DU CROZ, J., HAMMARLING, S., DEMMEL, J., BISCHOF, C., AND SORENSEN, D. 1990. LAPACK: a portable linear algebra library for high-performance computers, Supercomputing'90, 2-11.
- (1990) Supercomputing , vol.90 , pp. 2-11
- ANDERSON, E.¹ BAI, Z.² DONGARRA, J.³ GREENBAUM, A.⁴ MCKENNEY, A.⁵ DU CROZ, J.⁶ HAMMARLING, S.⁷ DEMMEL, J.⁸ BISCHOF, C.⁹ SORENSEN, D.¹⁰

4
- 70350767597
- LINPACK Benchmark Optimizations on a Virtual Processor Grid
- ANDERSON, E., BRANDT, M., AND YANG, C. 2004. LINPACK Benchmark Optimizations on a Virtual Processor Grid, In Cray User Group 2004 Proceedings.
- (2004) Cray User Group 2004 Proceedings
- ANDERSON, E.¹ BRANDT, M.² YANG, C.³

5
- 70350780783
- BABOULIN, M., DONGARRA, J., AND TOMOV, S. 2008. Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, Technical Report UT-CS-08-200, University of Tennessee, May 6, 2008 (also LAPACK Working Note 200).

6
- 70350625607
- Solving Dense Linear Systems on Graphics Processors
- 02-02-2008, Universidad Jaime I, February
- BARRACHINA, S., CASTILLO, M., IGUAL, F. D., MAYO, R., AND QUINTANA-ORTI, E. S. 2008. Solving Dense Linear Systems on Graphics Processors, Technical Report ICC 02-02-2008, Universidad Jaime I, February 2008.
- (2008) Technical Report ICC
- BARRACHINA, S.¹ CASTILLO, M.² IGUAL, F.D.³ MAYO, R.⁴ QUINTANA-ORTI, E.S.⁵

7
- 57349180412
- A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs
- BASKARAN, M., BONDHUGULA, U., KRISHNAMOORTHY, S., RAMANUJAM, J., ROUNTEV, A., AND SADAYAPPAN, P. 2008. A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs, ISC'08.
- (2008) ISC , vol.8
- BASKARAN, M.¹ BONDHUGULA, U.² KRISHNAMOORTHY, S.³ RAMANUJAM, J.⁴ ROUNTEV, A.⁵ SADAYAPPAN, P.⁶

8
- 0039645052
- An adaptive blocking strategy for matrix factorization
- BISCHOF, C. H., AND LACROUTE, P. G. 1990. An adaptive blocking strategy for matrix factorization, in Proceedings of the Joint International Conference on Vector and Parallel Processing, 210-221.
- (1990) Proceedings of the Joint International Conference on Vector and Parallel Processing , pp. 210-221
- BISCHOF, C.H.¹ LACROUTE, P.G.²

9
- 70350767593
- CASTILLO, M., CHAN, E., IGUAL, F. D., MAYO, R., QUINTANA-ORTI, E. S., QUINTANA-ORTI, G., VAN DE GEIJN, R., AND VAN ZEE, F. G. 2008. Making Programming Synonymous with Programming for Linear Algebra Libraries, FLAME Working Note #31. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-20, April 17, 2008.

10
- 0030244536
- CHOI, J., DONGARRA, J. J., OSTROUCHOV, L. S., PETITET, A. P., WALKER, D. W., AND WHALEY, R. C. 1996. The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, Scientific Programming 5, 3, 173-184 (also LAPACK Working Note 80).

11
- 70350771484
- DONGARRA, J., DUFF, I. S., SORENSEN, D. C., AND VAN DER VORST, H. A. 1998. Numerical Linear Algebra for High-Performance Computers, SIAM.

12
- 0025402476
- A Set of Level 3 Basic Linear Algebra Subprograms
- DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND DUFF, I. 1990. A Set of Level 3 Basic Linear Algebra Subprograms, ACM Transactions on Mathematical Software 16, 1, 1-17.
- (1990) ACM Transactions on Mathematical Software , vol.16 , Issue.1 , pp. 1-17
- DONGARRA, J.J.¹ DU CROZ, J.² HAMMARLING, S.³ DUFF, I.⁴

13
- 70350769644
- DONGARRA, J., AND OSTROUCHOV, S. 1990. LAPACK Block Factorization Algorithms on the Intel iPSC/860, Technical Report CS-90-115, University of Tennessee (also LAPACK Working Note 24).

14
- 33845468997
- LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
- GALOPPO, N., GOVINDARAJU, N. K., HENSON, M., AND MANOCHA, D. 2005. LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware, SC'05.
- (2005) SC , vol.5
- GALOPPO, N.¹ GOVINDARAJU, N.K.² HENSON, M.³ MANOCHA, D.⁴

15
- 34548292052
- A Memory Model for Scientific Algorithms on Graphics Processors
- GOVINDARAJU, N. K., LARSEN, S., GRAY, J., AND MANOCHA, D. 2006. A Memory Model for Scientific Algorithms on Graphics Processors, SC'06.
- (2006) SC , vol.6
- GOVINDARAJU, N.K.¹ LARSEN, S.² GRAY, J.³ MANOCHA, D.⁴

16
- 78651269052
- Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
- FATAHALIAN, K., SUGERMAN, J., AND HANRAHAN, P. 2004. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, In Graphics Hardware 2004, 133-137.
- (2004) Graphics Hardware 2004 , pp. 133-137
- FATAHALIAN, K.¹ SUGERMAN, J.² HANRAHAN, P.³

17
- 56849107345
- Efficient Gather and Scatter Operations on Graphics Processors
- HE, B., GOVINDARAJU, N. K., LUO, Q., AND SMITH, B. 2007. Efficient Gather and Scatter Operations on Graphics Processors, SC'07.
- (2007) SC , vol.7
- HE, B.¹ GOVINDARAJU, N.K.² LUO, Q.³ SMITH, B.⁴

18
- 70350769643
- HWU, W. W., AND KIRK, D. 2007. ECE 498 AL1: Programming Massively Parallel Processors, Lecture Slides, University of Illinois, Urbana-Champaign. NVIDIA. 2006.

19
- 70350777041
- NVIDIA GeForce 8800 GPU Architecture Overview, Technical Brief, November 2006.

20
- 64549155924
- NVIDIA, Version 2.0
- NVIDIA. 2008a. NVIDIA CUDA Compute Unified Device Architecture, Programming Guide, Version 2.0.
- (2008) NVIDIA CUDA Compute Unified Device Architecture, Programming Guide

21
- 70350756074
- NVIDIA, Technical Brief, May
- NVIDIA. 2008b. NVIDIA GeForce GTX 200 GPU Architectural Overview, Technical Brief, May 2008.
- (2008) NVIDIA GeForce GTX 200 GPU Architectural Overview

22
- 70350769642
- QUINTANA-ORTI, G., IGUAL, F. D., QUINTANA-ORTI, E. S., AND VAN DE GEIJN, R. 2008. Solving Dense Linear Systems on Platforms with Multiple Hardware Accelerators, FLAME Working Note #32. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-22. May 9, 2008.

23
- 79959466764
- Optimization Principles and Application Performance Evaluation of a Multithreaded GPU using CUDA
- ACM Press
- RYOO, S., RODRIGUES, C. I., BAGHSORKHI, S. S., STONE, S. S., KIRK, D. B., AND HWU, W. W. 2008. Optimization Principles and Application Performance Evaluation of a Multithreaded GPU using CUDA, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM Press, 2008, 73-82.
- (2008) Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 73-82
- RYOO, S.¹ RODRIGUES, C.I.² BAGHSORKHI, S.S.³ STONE, S.S.⁴ KIRK, D.B.⁵ HWU, W.W.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.