SCOPUS Information Search Platform

Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05

Volume 2005, 2005

LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware

Authors (4): Galoppo, Nicoᵃ; Govindaraju, Naga K.ᵃ; Henson, Michaelᵃ; Manocha, Dineshᵃ

ᵃ University of North Carolina (United States)

Author keywords

[No Author keywords available]

Indexed keywords

FRAGMENT PROCESSORS; GRAPHICS HARDWARE; GRAPHICS PROCESSOR UNITS (GPU); ALGORITHMS; BANDWIDTH; BUFFER STORAGE; COMPUTER HARDWARE; LINEAR SYSTEMS; PARALLEL PROCESSING SYSTEMS; MICROPROCESSOR CHIPS

EID: 33845468997 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SC.2005.42 Document Type: Conference Paper

Times cited: 159

References (37)

1
- 4644295630
- Evaluating the imagine stream architecture
- AHN, J. H., DALLY, W. J., KHAILANY, B., KAPASI, U. J., AND DAS, A. 2004. Evaluating the imagine stream architecture. In Proceedings of the 31st Annual International Symposium on Computer Architecture, Munich, Germany.
- (2004) Proceedings of the 31st Annual International Symposium on Computer Architecture, Munich, Germany
- Ahn, J.H.¹ Dally, W.J.² Khailany, B.³ Kapasi, U.J.⁴ Das, A.⁵

2
- 0242533311
- Sparse matrix solvers on the gpu: Conjugate gradients and multigrid
- BOLZ, J., FARMER, I., GRINSPUN, E., AND SCHRÖDER, P. 2003. Sparse matrix solvers on the gpu: conjugate gradients and multigrid. ACM Trans. Graph. 22, 3, 917-924.
- (2003) ACM Trans. Graph. , vol.22 , Issue.3 , pp. 917-924
- Bolz, J.¹ Farmer, I.² Grinspun, E.³ Schröder, P.⁴

3
- 10644248153
- Brook for gpus: Stream computing on graphics hardware
- BUCK, I., FOLEY, T., HORN, D., SUGERMAN, J., FATAHALIAN, K., HOUSTON, M., AND HANRAHAN, P. 2004. Brook for gpus: stream computing on graphics hardware. ACM Trans. Graph. 23, 3, 777-786.
- (2004) ACM Trans. Graph. , vol.23 , Issue.3 , pp. 777-786
- Buck, I.¹ Foley, T.² Horn, D.³ Sugerman, J.⁴ Fatahalian, K.⁵ Houston, M.⁶ Hanrahan, P.⁷

4
- 0032659795
- Recursive array layouts and fast parallel matrix multiplication
- CHATTERJEE, S., LEBECK, A. R., PATNALA, P. K., AND THOTTETHODI, M. 1999. Recursive array layouts and fast parallel matrix multiplication. In ACM Symposium on Parallel Algorithms and Architectures, 222-231.
- (1999) ACM Symposium on Parallel Algorithms and Architectures , pp. 222-231
- Chatterjee, S.¹ Lebeck, A.R.² Patnala, P.K.³ Thottethodi, M.⁴

5
- 10444232320
- Merrimac: Supercomputing with streams
- DALLY, W. J., HANRAHAN, P., EREZ, M., KNIGHT, T. J., LABONTE, F., AHN, J.-H., JAYASENA, N., KAPASI, U. J., DAS, A., GUMMARAJU, J., AND BUCK, I. 2003. Merrimac: Supercomputing with streams. In SC'03.
- (2003) SC'03
- Dally, W.J.¹ Hanrahan, P.² Erez, M.³ Knight, T.J.⁴ Labonte, F.⁵ Ahn, J.-H.⁶ Jayasena, N.⁷ Kapasi, U.J.⁸ Das, A.⁹ Gummaraju, J.¹⁰ Buck, I.¹¹

6
- 0003424372
- Applied Numerical Linear Algebra
- DEMMEL, J. W. 1997. Applied Numerical Linear Algebra. SIAM Books.
- (1997) Applied Numerical Linear Algebra
- Demmel, J.W.¹

7
- 0003851784
- Numerical Linear Algebra for High-Performance Computers
- DONGARRA, J. J., DUFF, I. S., SORENSEN, D. C., AND VAN DER VORST, H. A. 1998. Numerical Linear Algebra for High-Performance Computers. SIAM.
- (1998) Numerical Linear Algebra for High-performance Computers
- Dongarra, J.J.¹ Duff, I.S.² Sorensen, D.C.³ Van Der Vorst, H.A.⁴

8
- 0042674307
- The LINPACK benchmark: Past, present, and future
- DONGARRA, J. J., LUSZCZEK, P., AND PETITET, A. 2003. The LINPACK benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience 15, 1-18.
- (2003) Concurrency and Computation: Practice and Experience , vol.15 , pp. 1-18
- Dongarra, J.J.¹ Luszczek, P.² Petitet, A.³

9
- 84934343786
- Analysis and performance results of a molecular modeling application on merrimac
- EREZ, M., AHN, J., GARG, A., DALLY, W. J., AND DARVE, E. 2004. Analysis and Performance Results of a Molecular Modeling Application on Merrimac. In SC'04.
- (2004) SC'04
- Erez, M.¹ Ahn, J.² Garg, A.³ Dally, W.J.⁴ Darve, E.⁵

10
- 23944462603
- Gpu cluster for high performance computing
- FAN, Z., QIU, F., KAUFMAN, A., AND YOAKUM-STOVER, S. 2004. Gpu cluster for high performance computing. In ACM/IEEE Supercomputing Conference 2004.
- (2004) ACM/IEEE Supercomputing Conference 2004
- Fan, Z.¹ Qiu, F.² Kaufman, A.³ Yoakum-Stover, S.⁴

11
- 78651269052
- Understanding the efficiency of gpu algorithms for matrix-matrix multiplication
- FATAHALIAN, K., SUGERMAN, J., AND HANRAHAN, P. 2004. Understanding the efficiency of gpu algorithms for matrix-matrix multiplication. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Eurographics Association.
- (2004) Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware
- Fatahalian, K.¹ Sugerman, J.² Hanrahan, P.³

12
- 33845440618
- GPGPU performance tuning
- GÖDDEKE, D. 2005. Gpgpu performance tuning. Tech. rep., University of Dortmund, Germany. http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/.
- (2005) Gpgpu Performance Tuning
- Göddeke, D.¹

13
- 84860038197
- A GPU benchmarking suite
- GPUBENCH, 2004. A GPU benchmarking suite. GP2 Workshop, Los Angeles. Available online: http://graphics.stanford.edu/projects/gpubench/.
- (2004) GP2 Workshop

14
- 2342641297
- Introduction to Parallel Computing (2nd ed.)
- GRAMA, A., GUPTA, A., KARYPIS, G., AND KUMAR, V. 2003. Introduction to Parallel Computing (2nd ed.). Addison Wesley.
- (2003) Introduction to Parallel Computing (2nd Ed.)
- Grama, A.¹ Gupta, A.² Karypis, G.³ Kumar, V.⁴

15
- 0030677581
- The design and analysis of a cache architecture for texture mapping
- HAKURA, Z., AND GUPTA, A. 1997. The design and analysis of a cache architecture for texture mapping. In Proc. of the 24th International Symposium on Computer Architecture, 108-120.
- (1997) Proc. of the 24th International Symposium on Computer Architecture , pp. 108-120
- Hakura, Z.¹ Gupta, A.²

16
- 10644280791
- Cache and bandwidth aware matrix multiplication on the GPU
- HALL, J. D., CARR, N., AND HART, J. 2003. Cache and bandwidth aware matrix multiplication on the gpu. Technical Report UIUCDCS-R-2003-2328, University of Illinois at Urbana-Champaign.
- (2003) Cache and Bandwidth Aware Matrix Multiplication on the Gpu
- Hall, J.D.¹ Carr, N.² Hart, J.³

17
- 84860056799
- HAMMERSTONE, CRAIGHEAD, AND AKELEY, 2003. ARB_vertex_buffer_object OpenGL specification. http://oss.sgi.com/projects/ogl-sample/registry/ARB/vertex_buffer_object.txt.
- (2003) ARB_vertex_buffer_object OpenGL Specification
- Hammerstone¹ Craighead² Akeley³

18
- 78651284090
- Simulation of cloud dynamics on graphics hardware
- HARRIS, M. J., BAXTER, W. V., SCHEUERMANN, T., AND LASTRA, A. 2003. Simulation of cloud dynamics on graphics hardware. In HWWS '03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Eurographics Association, Aire-la-Ville, Switzerland, 92-101.
- (2003) HWWS '03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware , pp. 92-101
- Harris, M.J.¹ Baxter, W.V.² Scheuermann, T.³ Lastra, A.⁴

19
- 85019066865
- Visual simulation of ice crystal growth
- KIM, T., AND LIN, M. 2003. Visual simulation of ice crystal growth. In Proc. of ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
- (2003) Proc. of ACM SIGGRAPH/Eurographics Symposium on Computer Animation
- Kim, T.¹ Lin, M.²

20
- 0242533310
- Linear algebra operators for gpu implementation of numerical algorithms
- KRÜGER, J., AND WESTERMANN, R. 2003. Linear algebra operators for gpu implementation of numerical algorithms. ACM Trans. Graph. 22, 3, 908-916.
- (2003) ACM Trans. Graph. , vol.22 , Issue.3 , pp. 908-916
- Krüger, J.¹ Westermann, R.²

21
- 85059252992
- Fast matrix multiplies using graphics hardware
- LARSEN, E. S., AND MCALLISTER, D. 2001. Fast matrix multiplies using graphics hardware. In Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), ACM Press, 55-55.
- (2001) Proceedings of the 2001 ACM/IEEE Conference on Supercomputing (CDROM) , pp. 55-55
- Larsen, E.S.¹ McAllister, D.²

22
- 33845466554
- LASTRA, A., LIN, M., AND MANOCHA, D. 2004. ACM workshop on general purpose computation on graphics processors.
- (2004) ACM Workshop on General Purpose Computation on Graphics Processors
- Lastra, A.¹ Lin, M.² Manocha, D.³

23
- 10644238428
- Shader algebra
- MCCOOL, M., TOIT, S. D., POPA, T., CHAN, B., AND MOULE, K. 2004. Shader algebra. ACM Trans. Graph. 23, 3, 787-795.
- (2004) ACM Trans. Graph. , vol.23 , Issue.3 , pp. 787-795
- McCool, M.¹ Toit, S.D.² Popa, T.³ Chan, B.⁴ Moule, K.⁵

24
- 33750897917
- Parallel gaussian elimination using openmp and mpi
- MCGINN, S., AND SHAW, R. 2002. Parallel gaussian elimination using openmp and mpi. In Proceedings. 16th Annual International Symposium on High Performance Computing Systems and Applications, 169-173.
- (2002) Proceedings. 16th Annual International Symposium on High Performance Computing Systems and Applications , pp. 169-173
- McGinn, S.¹ Shaw, R.²

25
- 84860039921
- Timing comparisons of Mathematica, Matlab, R, S-plus, C & Fortran
- MCLEOD, I., AND YU, H. 2002. Timing comparisons of mathematica, matlab, r, s-plus, c & fortran. Technical report. Available online: http://fisher.stats.uwo.ca/faculty/aim/epubs/MatrixInverseTiming/default.htm.
- (2002) Timing Comparisons of Mathematica, Matlab, R, S-plus, C & Fortran
- McLeod, I.¹ Yu, H.²

26
- 33845385759
- Closure models for the computation of dilute bubbly flows
- MITRAN, S. 2000. Closure models for the computation of dilute bubbly flows. Wissenschaftliche Berichte FZKA 6357, Forschungszentrum Karlsruhe, April.
- (2000) Wissenschaftliche Berichte FZKA , vol.6357
- Mitran, S.¹

27
- 85022121901
- The GeForce 6 series of GPUs high performance and quality for complex image effects
- NVIDIA CORPORATION. 2004. The geforce 6 series of gpus high performance and quality for complex image effects. Tech. rep. Available online: http://www.nvidia.com/object/IO_12394.html.
- (2004) The Geforce 6 Series of Gpus High Performance and Quality for Complex Image Effects

28
- 84934325826
- Scientific computations on modern parallel vector systems
- OLIKER, L., CANNING, A., CARTER, J., AND SHALF, J. 2004. Scientific computations on modern parallel vector systems. In Supercomputing 2004.
- (2004) Supercomputing 2004
- Oliker, L.¹ Canning, A.² Carter, J.³ Shalf, J.⁴

29
- 0029754520
- Parallel algorithm and architecture for two-step division-free gaussian elimination
- PENG, S., SEDUKHIN, S., AND SEDUKHIN, I. 1996. Parallel algorithm and architecture for two-step division-free gaussian elimination. In 1996 International Conference on Application-Specific Systems, Architectures and Processors (ASAP'96).
- (1996) 1996 International Conference on Application-Specific Systems, Architectures and Processors (ASAP'96)
- Peng, S.¹ Sedukhin, S.² Sedukhin, I.³

30
- 33845461400
- Using graphics cards for quantized FEM computations
- RUMPF, M., AND STRZODKA, R. 2001. Using graphics cards for quantized FEM computations. In Proc. of IASTED Visualization, Imaging and Image Processing Conference (VIIP'01), 193-202.
- (2001) Proc. of IASTED Visualization, Imaging and Image Processing Conference (VIIP'01) , pp. 193-202
- Rumpf, M.¹ Strzodka, R.²

31
- 0030127191
- Kinetic theory for bubbly flow i: Collisionless case
- RUSSO, G., AND SMEREKA, P. 1996. Kinetic theory for bubbly flow i: Collisionless case. SIAM J. Appl. Math. 56, 2, 327-357.
- (1996) SIAM J. Appl. Math. , vol.56 , Issue.2 , pp. 327-357
- Russo, G.¹ Smereka, P.²

32
- 0030127881
- Kinetic theory for bubbly flow II: Fluid dynamic limit
- RUSSO, G., AND SMEREKA, P. 1996. Kinetic theory for bubbly flow II: Fluid dynamic limit. SIAM J. Appl. Math. 56, 2, 358-371.
- (1996) SIAM J. Appl. Math. , vol.56 , Issue.2 , pp. 358-371
- Russo, G.¹ Smereka, P.²

33
- 0038345686
- A performance analysis of pim, stream processing, and tiled processing on memory-intensive signal processing kernels
- SUH, J., KIM, E.-G., CRAGO, S. P., SRINIVASAN, L., AND FRENCH, M. C. 2003. A performance analysis of pim, stream processing, and tiled processing on memory-intensive signal processing kernels. In Proceedings of the International Symposium on Computer Architecture.
- (2003) Proceedings of the International Symposium on Computer Architecture
- Suh, J.¹ Kim, E.-G.² Crago, S.P.³ Srinivasan, L.⁴ French, M.C.⁵

34
- 0036505033
- The raw microprocessor: A computational fabric for software circuits and general purpose programs
- TAYLOR, M. B., KIM, J., MILLER, J., WENTZLAFF, D., GHODRAT, F., GREENWALD, B., HOFFMANN, H., JOHNSON, P., LEE, J.-W., LEE, W., MA, A., SARAF, A., SBNESKI, M., SHNIDMAN, N., FRANK, V. S. M., AMARASINGHE, S., AND AGARWAL, A. 2002. The raw microprocessor: A computational fabric for software circuits and general purpose programs. IEEE Micro.
- (2002) IEEE Micro
- Taylor, M.B.¹ Kim, J.² Miller, J.³ Wentzlaff, D.⁴ Ghodrat, F.⁵ Greenwald, B.⁶ Hoffmann, H.⁷ Johnson, P.⁸ Lee, J.-W.⁹ Lee, W.¹⁰ Ma, A.¹¹ Saraf, A.¹² Sbneski, M.¹³ Shnidman, N.¹⁴ Frank, V.S.M.¹⁵ Amarasinghe, S.¹⁶ Agarwal, A.¹⁷

35
- 0011916941
- Tuning Strassen's matrix multiplication for memory efficiency
- THOTTETHODI, M. S. 1998. Tuning Strassen's matrix multiplication for memory efficiency. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), IEEE Computer Society, 1-14.
- (1998) Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM) , pp. 1-14
- Thottethodi, M.S.¹

36
- 0343462141
- Automated empirical optimization of software and the ATLAS project
- WHALEY, R. C., PETITET, A., AND DONGARRA, J. J. 2001. Automated empirical optimization of software and the ATLAS project. Parallel Computing 27, 1-2, 3-35
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Whaley, R.C.¹ Petitet, A.² Dongarra, J.J.³

37
- 84860039782
- Also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000 (www.netlib.org/lapack/lawns/lawn147.ps).
- (2000) University of Tennessee LAPACK Working Note #147, UT-CS-00-448

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.