



Volume 36, Issue 5-6, 2010, Pages 232-240

Towards dense linear algebra for hybrid GPU accelerated manycore systems

Author keywords

Dense linear algebra; Graphics processing units; Hybrid computing; Multicore processors; Parallel algorithms

Indexed keywords

DENSE LINEAR ALGEBRA; EFFICIENT ALGORITHM; GRAPHICS PROCESSING UNIT; GRAPHICS PROCESSOR; HIGH PERFORMANCE COMPUTING; HYBRID COMPONENT; HYBRID COMPUTING; LU FACTORIZATION; MANY-CORE; MULTI CORE; MULTI-CORE PROCESSOR;

EID: 77953130739     PISSN: 01678191     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.parco.2009.12.005     Document Type: Article
Times cited : (355)

References (38)
  • 2
    • 77953137041
    • M. Baboulin, J. Dongarra, S. Tomov, Some issues in dense linear algebra for multicore and special purpose architectures, Technical Report UT-CS-08-615, University of Tennessee, 2008, LAPACK Working Note 200.
  • 6
    • 35548992612
    • Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy
    • Buttari A., Dongarra J., Kurzak J., Luszczek P., and Tomov S. Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy. ACM Trans. Math. Software 34 4 (2008)
    • (2008) ACM Trans. Math. Software , vol.34 , Issue.4
    • Buttari, A.1    Dongarra, J.2    Kurzak, J.3    Luszczek, P.4    Tomov, S.5
  • 7
    • 77953131658
    • A. Buttari, J. Langou, J. Kurzak, J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Technical Report UT-CS-07-600, University of Tennessee, 2007, LAPACK Working Note 191.
  • 8
    • 77953131513
    • J. Demmel, J. Dongarra, B. Parlett, W. Kahan, M. Gu, D. Bindel, Y. Hida, X. Li, O. Marques, E. Riedy, C. Vömel, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Langou, S. Tomov, Prospectus for the next LAPACK and ScaLAPACK libraries, in: PARA'06: State-of-the-Art in Scientific and Parallel Computing (Umeå, Sweden), High Performance Computing Center North (HPC2N) and the Department of Computing Science, Umeå University, Springer, June 2006.
  • 9
    • 77953133837
    • J. Demmel, L. Grigori, M. Hoemmen, J. Langou, Communication-avoiding parallel and sequential QR factorizations, CoRR abs/0806.2159, 2008.
  • 15
    • 77953136758
    • L. Grigori, J. Demmel, H. Xiang, Communication avoiding Gaussian elimination, Technical Report 6523, INRIA, 2008.
  • 16
    • 77953136682
    • Wolfgang Gruener, Larrabee, CUDA and the quest for the free lunch, TGDaily.
  • 18
    • 77953127461
    • AMD fusion now pushed back to 2011
    • Hruska J. AMD fusion now pushed back to 2011. Ars Technica (2008)
    • (2008) Ars Technica
    • Hruska, J.1
  • 19
    • 0032155271
    • GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark
    • Kågström B., Ling P., and van Loan C. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark. ACM Trans. Math. Software 24 3 (1998) 268-302
    • (1998) ACM Trans. Math. Software , vol.24 , Issue.3 , pp. 268-302
    • Kågström, B.1    Ling, P.2    van Loan, C.3
  • 20
    • 34548206782
    • Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems)
    • New York, NY, USA, ACM
    • Julie Langou, Julien Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra, Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems), in: SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (New York, NY, USA), ACM, 2006, p. 113.
    • (2006) SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 113
    • Langou, J.1    Langou, J.2    Luszczek, P.3    Kurzak, J.4    Buttari, A.5    Dongarra, J.6
  • 21
    • 77953139684
    • A note on auto-tuning GEMM for GPUs, Technical Report
    • January
    • Y. Li, J. Dongarra, S. Tomov, A note on auto-tuning GEMM for GPUs, Technical Report, LAPACK Working Note 212, January 2009.
    • (2009) LAPACK Working Note , vol.212
    • Li, Y.1    Dongarra, J.2    Tomov, S.3
  • 22
    • 77953137832
    • NVIDIA, Nvidia Tesla doubles the performance for CUDA developers, Computer Graphics World (06/30/2008).
  • 23
    • 77953139608
    • NVIDIA, NVIDIA CUDA Programming Guide, 6/07/2008, Version 2.0.
  • 26
    • 0038961419
    • Random butterfly transformations with applications in computational linear algebra
    • Technical Report CSD-950023, Computer Science Department, UCLA
    • D. Parker, Random butterfly transformations with applications in computational linear algebra, Technical Report CSD-950023, Computer Science Department, UCLA, 1995.
    • (1995)
    • Parker, D.1
  • 27
    • 0004609383
    • The randomizing FFT: An alternative to pivoting in Gaussian elimination
    • Technical Report CSD-950037, Computer Science Department, UCLA
    • D. Parker, B. Pierce, The randomizing FFT: an alternative to pivoting in Gaussian elimination, Technical Report CSD-950037, Computer Science Department, UCLA, 1995.
    • (1995)
    • Parker, D.1    Pierce, B.2
  • 30
    • 63249123336
    • Programming algorithms-by-blocks for matrix computations on multithreaded architectures
    • Technical Report TR-08-04, University of Texas at Austin, FLAME Working Note 29
    • G. Quintana-Orti, E. Quintana-Orti, E. Chan, F. van Zee, R. van de Geijn, Programming algorithms-by-blocks for matrix computations on multithreaded architectures, Technical Report TR-08-04, University of Texas at Austin, 2008, FLAME Working Note 29.
    • (2008)
    • Quintana-Orti, G.1    Quintana-Orti, E.2    Chan, E.3    van Zee, F.4    van de Geijn, R.5
  • 33
    • 77953126652
    • Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing
    • Technical Report 219, LAPACK Working Note, May
    • S. Tomov, J. Dongarra, Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing, Technical Report 219, LAPACK Working Note, May 2009.
    • (2009)
    • Tomov, S.1    Dongarra, J.2
  • 35
    • 77953134650
    • LU, QR, Cholesky factorizations using vector capabilities of GPUs
    • Technical Report UCB/EECS-2008-49, EECS Department, University of California, Berkeley
    • May
    • LU, QR, Cholesky factorizations using vector capabilities of GPUs, Technical Report UCB/EECS-2008-49, EECS Department, University of California, Berkeley, May 2008.
    • (2008)
  • 36
    • 77953136414
    • Using GPUs to accelerate linear algebra routines, Poster at PAR lab winter retreat, January 9, 2008.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.