SCOPUS Information Search Platform

Parallel Computing

Volume 36, Issue 5-6, 2010, Pages 232-240

Towards dense linear algebra for hybrid GPU accelerated manycore systems

(3) Tomov, Stanimire a; Dongarra, Jack a,b,c; Baboulin, Marc a,d

a University of Tennessee (United States)

b OAK RIDGE NATIONAL LABORATORY (United States)

c UNIVERSITY OF MANCHESTER (United Kingdom)

d UNIVERSITY OF COIMBRA (Portugal)

Author keywords

Dense linear algebra; Graphics processing units; Hybrid computing; Multicore processors; Parallel algorithms

Indexed keywords

DENSE LINEAR ALGEBRA; EFFICIENT ALGORITHM; GRAPHICS PROCESSING UNIT; GRAPHICS PROCESSOR; HIGH PERFORMANCE COMPUTING; HYBRID COMPONENT; HYBRID COMPUTING; LU FACTORIZATION; MANY-CORE; MULTI CORE; MULTI-CORE PROCESSOR;

ALGEBRA; LEARNING ALGORITHMS; NANOTECHNOLOGY; PARALLEL ALGORITHMS; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS;

COMPUTER GRAPHICS EQUIPMENT;

EID: 77953130739 PISSN: 01678191 EISSN: None Source Type: Journal
DOI: 10.1016/j.parco.2009.12.005 Document Type: Article

Times cited : (366)

References (38)

1
- 0003706460
- SIAM, third ed
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen, LAPACK user's guide, SIAM, third ed., 1999.
- (1999) LAPACK user's guide
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, S.⁴ Demmel, J.⁵ Dongarra, J.⁶ Du Croz, J.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ McKenney, A.¹⁰ Sorensen, D.¹¹

2
- 77953137041
- M. Baboulin, J. Dongarra, S. Tomov, Some issues in dense linear algebra for multicore and special purpose architectures, Technical Report UT-CS-08-615, University of Tennessee, 2008, LAPACK Working Note 200.

3
- 77953128856
- Minimizing communication in linear algebra, Technical Report
- May
- G. Ballard, J. Demmel, O. Holtz, O. Schwartz, Minimizing communication in linear algebra, Technical Report, LAPACK Working Note 218, May 2009.
- (2009) LAPACK Working Note , vol.218
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Schwartz, O.⁴

4
- 70350625607
- Solving dense linear systems on graphics processors
- 02-02-2008, Universidad Jaime I, February
- S. Barrachina, M. Castillo, F. Igual, R. Mayo, E. Quintana-Ortí, Solving dense linear systems on graphics processors, Technical Report ICC 02-02-2008, Universidad Jaime I, February, 2008.
- (2008) Technical Report ICC
- Barrachina, S.¹ Castillo, M.² Igual, F.³ Mayo, R.⁴ Quintana-Ortí, E.⁵

5
- 38049058008
- The impact of multicore on math software, PARA, B. Kågström et al, Ed, Springer
- A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek, S. Tomov, The impact of multicore on math software, PARA 2006, in: B. Kågström et al. (Ed.), Lecture Notes in Computer Science, vol. 4699, Springer, 2007, pp. 1-10.
- (2006) Lecture Notes in Computer Science , vol.4699 , pp. 1-10
- Buttari, A.¹ Dongarra, J.² Kurzak, J.³ Langou, J.⁴ Luszczek, P.⁵ Tomov, S.⁶

6
- 35548992612
- Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy
- Buttari A., Dongarra J., Kurzak J., Luszczek P., and Tomov S. Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy. ACM Trans. Math. Software 34 4 (2008)
- (2008) ACM Trans. Math. Software , vol.34 , Issue.4
- Buttari, A.¹ Dongarra, J.² Kurzak, J.³ Luszczek, P.⁴ Tomov, S.⁵

7
- 77953131658
- A. Buttari, J. Langou, J. Kurzak, J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Technical Report UT-CS-07-600, University of Tennessee, 2007, LAPACK Working Note 191.

8
- 77953131513
- J. Demmel, J. Dongarra, B. Parlett, W. Kahan, M. Gu, D. Bindel, Y. Hida, X. Li, O. Marques, E. Riedy, C. Vömel, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Langou, S. Tomov, Prospectus for the next LAPACK and ScaLAPACK libraries, in: PARA'06: State-of-the-Art in Scientific and Parallel Computing (Umeå, Sweden), High Performance Computing Center North (HPC2N) and the Department of Computing Science, Umeå University, Springer, June 2006.

9
- 77953133837
- J. Demmel, L. Grigori, M. Hoemmen, J. Langou, Communication-avoiding parallel and sequential QR factorizations, CoRR abs/0806.2159, 2008.

10
- 0042674307
- The LINPACK benchmark: past, present, and future
- Dongarra J., Luszczek P., and Petitet A. The LINPACK benchmark: past, present, and future. Concurrency and Computation: Practice and Experience 15 (2003) 820
- (2003) Concurrency and Computation: Practice and Experience , vol.15 , pp. 820
- Dongarra, J.¹ Luszczek, P.² Petitet, A.³

11
- 63249085119
- Exploring new architectures in accelerating CFD for air force applications
- July 14-17, 2008
- J. Dongarra, S. Moore, G. Peterson, S. Tomov, J. Allred, V. Natoli, D. Richie, Exploring new architectures in accelerating CFD for air force applications, in: Proceedings of HPCMP Users Group Conference 2008, July 14-17, 2008.
- (2008) Proceedings of HPCMP Users Group Conference
- Dongarra, J.¹ Moore, S.² Peterson, G.³ Tomov, S.⁴ Allred, J.⁵ Natoli, V.⁶ Richie, D.⁷

12
- 78651269052
- Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
- New York, NY, USA, ACM
- K. Fatahalian, J. Sugerman, P. Hanrahan, Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, in: HWWS '04: Proceedings of the ACM Siggraph/Eurographics Conference on Graphics Hardware (New York, NY, USA), ACM, 2004, pp. 133-137.
- (2004) HWWS '04: Proceedings of the ACM Siggraph/Eurographics Conference on Graphics Hardware , pp. 133-137
- Fatahalian, K.¹ Sugerman, J.² Hanrahan, P.³

13
- 67650686517
- Accelerating LINPACK with CUDA on heterogenous clusters
- New York, NY, USA, ACM
- M. Fatica, Accelerating LINPACK with CUDA on heterogenous clusters, in: GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units (New York, NY, USA), ACM, 2009, pp. 46-51.
- (2009) GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 46-51
- Fatica, M.¹

14
- 33845468997
- LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware
- Washington, DC, USA, IEEE Computer Society
- N. Galoppo, N. Govindaraju, M. Henson, D. Manocha, LU-GPU: efficient algorithms for solving dense linear systems on graphics hardware, in: SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing (Washington, DC, USA), IEEE Computer Society, 2005, p. 3.
- (2005) SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing , pp. 3
- Galoppo, N.¹ Govindaraju, N.² Henson, M.³ Manocha, D.⁴

15
- 77953136758
- L. Grigori, J. Demmel, H. Xiang, Communication avoiding Gaussian elimination, Technical Report 6523, INRIA, 2008.

16
- 77953136682
- Wolfgang Gruener, Larrabee, CUDA and the quest for the free lunch, TGDaily.

17
- 0036457301
- SIAM
- Higham N. Accuracy and Stability of Numerical Algorithms. second ed. (2002), SIAM
- (2002) Accuracy and Stability of Numerical Algorithms. second ed.
- Higham, N.¹

18
- 77953127461
- AMD fusion now pushed back to 2011
- Hruska J. AMD fusion now pushed back to 2011. Ars Technica (2008)
- (2008) Ars Technica
- Hruska, J.¹

19
- 0032155271
- GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark
- Kågström B., Ling P., and van Loan C. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark. ACM Trans. Math. Software 24 3 (1998) 268-302
- (1998) ACM Trans. Math. Software , vol.24 , Issue.3 , pp. 268-302
- Kågström, B.¹ Ling, P.² van Loan, C.³

20
- 34548206782
- Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems)
- New York, NY, USA, ACM
- Julie Langou, Julien Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra, Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems), in: SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (New York, NY, USA), ACM, 2006, p. 113.
- (2006) SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 113
- Langou, J.¹ Langou, J.² Luszczek, P.³ Kurzak, J.⁴ Buttari, A.⁵ Dongarra, J.⁶

21
- 77953139684
- A note on auto-tuning GEMM for GPUs, Technical Report
- January
- Y. Li, J. Dongarra, S. Tomov, A note on auto-tuning GEMM for GPUs, Technical Report, LAPACK Working Note 212, January 2009.
- (2009) LAPACK Working Note , vol.212
- Li, Y.¹ Dongarra, J.² Tomov, S.³

22
- 77953137832
- NVIDIA, Nvidia Tesla doubles the performance for CUDA developers, Computer Graphics World (06/30/2008).

23
- 77953139608
- NVIDIA, NVIDIA CUDA Programming Guide, 6/07/2008, Version 2.0.

24
- 49049088756
- GPU computing
- Owens J., Houston M., Luebke D., Green S., Stone J., and Phillips J. GPU computing. Proceedings of the IEEE 96 5 (2008) 879-899
- (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 879-899
- Owens, J.¹ Houston, M.² Luebke, D.³ Green, S.⁴ Stone, J.⁵ Phillips, J.⁶

25
- 33947588048
- A survey of general-purpose computation on graphics hardware
- Owens J., Luebke D., Govindaraju N., Harris M., Krüger J., Lefohn A., and Purcell T. A survey of general-purpose computation on graphics hardware. Comput. Graphics Forum 26 1 (2007) 80-113
- (2007) Comput. Graphics Forum , vol.26 , Issue.1 , pp. 80-113
- Owens, J.¹ Luebke, D.² Govindaraju, N.³ Harris, M.⁴ Krüger, J.⁵ Lefohn, A.⁶ Purcell, T.⁷

26
- 0038961419
- Random butterfly transformations with applications in computational linear algebra
- Technical Report CSD-950023, Computer Science Department, UCLA
- D. Parker, Random butterfly transformations with applications in computational linear algebra, Technical Report CSD-950023, Computer Science Department, UCLA, 1995.
- (1995)
- Parker, D.¹

27
- 0004609383
- The randomizing FFT: An alternative to pivoting in Gaussian elimination
- Technical Report CSD-950037, Computer Science Department, UCLA
- D. Parker, B. Pierce, The randomizing FFT: an alternative to pivoting in Gaussian elimination, Technical Report CSD-950037, Computer Science Department, UCLA, 1995.
- (1995)
- Parker, D.¹ Pierce, B.²

28
- 33645685697
- Addison-Wesley Professional
- Pharr M., and Fernando R. GPU Gems 2: Programming Techniques for High-performance Graphics and General-purpose Computation (gpu gems) (2005), Addison-Wesley Professional
- (2005) GPU Gems 2: Programming Techniques for High-performance Graphics and General-purpose Computation (gpu gems)
- Pharr, M.¹ Fernando, R.²

29
- 67650021816
- Solving dense linear systems on platforms with multiple hardware accelerators
- New York, NY, USA, ACM
- G. Quintana-Ortí, F.Igual, E.Quintana-Ortí, R. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, in: PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, NY, USA), ACM, 2009, pp. 121-130.
- (2009) PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 121-130
- Quintana-Ortí, G.¹ Igual, F.² Quintana-Ortí, E.³ van de Geijn, R.⁴

30
- 63249123336
- Programming algorithms-by-blocks for matrix computations on multithreaded architectures
- Technical Report TR-08-04, University of Texas at Austin, FLAME Working Note 29
- G. Quintana-Orti, E. Quintana-Orti, E. Chan, F. van Zee, R. van de Geijn, Programming algorithms-by-blocks for matrix computations on multithreaded architectures, Technical Report TR-08-04, University of Texas at Austin, 2008, FLAME Working Note 29.
- (2008)
- Quintana-Orti, G.¹ Quintana-Orti, E.² Chan, E.³ van Zee, F.⁴ van de Geijn, R.⁵

31
- 49249086142
- Larrabee: a many-core x86 architecture for visual computing
- Seiler L., Carmean D., Sprangle E., Forsyth T., Abrash M., Dubey P., Junkins S., Lake A., Sugerman J., Cavin R., Espasa R., Grochowski E., Juan T., and Hanrahan P. Larrabee: a many-core x86 architecture for visual computing. ACM Trans. Graph. 27 3 (2008) 1-15
- (2008) ACM Trans. Graph. , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

32
- 77953135476
- Special-purpose hardware and algorithms for accelerating dense linear algebra
- Atlanta, March 12-14, 2008
- S. Tomov, M. Baboulin, J. Dongarra, S. Moore, V. Natoli, G. Peterson, D. Richie, Special-purpose hardware and algorithms for accelerating dense linear algebra, in: Parallel Processing for Scientific Computing, Atlanta, March 12-14, 2008.
- Parallel Processing for Scientific Computing
- Tomov, S.¹ Baboulin, M.² Dongarra, J.³ Moore, S.⁴ Natoli, V.⁵ Peterson, G.⁶ Richie, D.⁷

33
- 77953126652
- Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing
- Technical Report 219, LAPACK Working Note, May
- S. Tomov, J. Dongarra, Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing, Technical Report 219, LAPACK Working Note, May 2009.
- (2009)
- Tomov, S.¹ Dongarra, J.²

34
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- Piscataway, NJ, USA, IEEE Press
- V. Volkov, J. Demmel, Benchmarking GPUs to tune dense linear algebra, in: SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (Piscataway, NJ, USA), IEEE Press, 2008, pp. 1-11.
- (2008) SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing , pp. 1-11
- Volkov, V.¹ Demmel, J.²

35
- 77953134650
- LU, QR and Cholesky factorizations using vector capabilities of GPUs, Technical Report UCB/EECS-2008-49, EECS Department, University of California, Berkeley
- May
- V. Volkov, J. Demmel, LU, QR and Cholesky factorizations using vector capabilities of GPUs, Technical Report UCB/EECS-2008-49, EECS Department, University of California, Berkeley, May 2008.
- (2008)
- Volkov, V.¹ Demmel, J.²

36
- 77953136414
- Using GPUs to accelerate linear algebra routines, Poster at PAR lab winter retreat, January 9, 2008.

37
- 84871131547
- General-purpose computation using graphics hardware

38
- 77953131876
- NVIDIA CUDA Zone, NVIDIA.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.