SCOPUS Information Retrieval Platform

Annual ACM Symposium on Parallelism in Algorithms and Architectures

2012, Pages 91-100

A scalable framework for heterogeneous GPU-based clusters

Authors (2): Song, Fengguang (a); Dongarra, Jack (a, b, c)

a UNIVERSITY OF TENNESSEE (United States)

b OAK RIDGE NATIONAL LABORATORY (United States)

c UNIVERSITY OF MANCHESTER (United Kingdom)

Author keywords

Distributed runtime; Heterogeneous clusters; Hybrid CPU GPU architectures; Linear algebra; Manycore scheduling

Indexed keywords

CHOLESKY FACTORIZATIONS; COMPUTATIONAL PERFORMANCE; CPU CORES; DATA DEPENDENCIES; DATAFLOW PROGRAMMING; DISTRIBUTED DYNAMICS; DISTRIBUTED MEMORY; DISTRIBUTED MEMORY CLUSTERS; DYNAMIC SCHEDULING; ENTIRE SYSTEM; FASTER RATES; GPU CLUSTERS; HETEROGENEOUS CLUSTERS; HETEROGENEOUS SYSTEMS; HIGH ENERGY EFFICIENCY; MANY-CORE; MULTI-LEVEL PARTITIONING; PARALLEL SOFTWARE; PCI EXPRESS; PROCESSING UNITS; RUNTIME SYSTEMS; RUNTIMES;

COMMUNICATION; COMPUTER PERIPHERAL EQUIPMENT; COMPUTER PROGRAMMING; DATA FLOW ANALYSIS; ENERGY EFFICIENCY; LINEAR ALGEBRA; MATLAB; PROGRAM PROCESSORS; SCHEDULING;

CLUSTERING ALGORITHMS;

EID: 84864149777 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2312005.2312025 Document Type: Conference Paper

Times cited: 36

References (25)

1
- 84864149419
- E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, and S. Tomov. LU factorization for accelerator-based systems. ICL Technical Report ICL-UT-10-05, Innovative Computing Laboratory, University of Tennessee, 2010.

2
- 80053251324
- E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, and S. Tomov. QR factorization on a multicore node enhanced with multiple GPU accelerators. In IPDPS 2011, Alaska, USA, 2011.

3
- 84655172176
- E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, J. Roman, S. Thibault, and S. Tomov. Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators. In Symposium on Application Accelerators in High Performance Computing (SAAHPC), Knoxville, USA, 2010.

4
- 77953999902
- E. Agullo, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, J. Langou, H. Ltaief, P. Luszczek, and A. YarKhan. PLASMA Users' Guide. Technical report, ICL, UTK, 2011.

5
- 0003706460
- E. Anderson, Z. Bai, C. Bischof, L. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, 1992.

6
- 78651103346
- C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier. StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exper., Special Issue: Euro-Par 2009, 23:187-198, Feb. 2011.

7
- 70350635626
- E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo, and E. S. Quintana-Ortí. An extension of the StarSs programming model for platforms with multiple GPUs. In Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pages 851-862. Springer-Verlag, 2009.

8
- 70449623419
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Communication-optimal parallel and sequential Cholesky decomposition. In Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '09, pages 245-252. ACM, 2009.

9
- 0035481895
- O. Beaumont, V. Boudet, A. Petitet, F. Rastello, and Y. Robert. A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers). IEEE Transactions on Computers, 50(10):1052-1070, 2001. DOI 10.1109/12.956091.

10
- 0003615167
- L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. Whaley. ScaLAPACK Users' Guide. SIAM, 1997.

11
- 0032648736
- P. Boulet, J. Dongarra, Y. Robert, and F. Vivien. Static tiling for heterogeneous computing platforms. Parallel Computing, 25(5):547-568, 1999.

12
- 77953980008
- J. W. Demmel, L. Grigori, M. F. Hoemmen, and J. Langou. Communication-optimal parallel and sequential QR and LU factorizations. LAPACK Working Note 204, UTK, August 2008.

13
- 0042674307
- J. J. Dongarra, P. Luszczek, and A. Petitet. The LINPACK Benchmark: past, present, and future. Concurrency and Computation: Practice and Experience, 15:803-820, 2003.

14
- 67650686517
- M. Fatica. Accelerating Linpack with CUDA on heterogenous clusters. In Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, pages 46-51. ACM, 2009.

15
- 84864040962
- M. Fogué, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn. Retargeting PLAPACK to clusters with hardware accelerators. FLAME Working Note 42, 2010.

16
- 77953483096
- J. R. Humphrey, D. K. Price, K. E. Spagnoli, A. L. Paolini, and E. J. Kelmelis. CULA: Hybrid GPU accelerated linear algebra routines. In SPIE Defense and Security Symposium (DSS), April 2010.

17
- 36248980362
- A. Lastovetsky and R. Reddy. Data distribution for dense factorization on computers with memory heterogeneity. Parallel Computing, 33(12):757-779, December 2007. DOI 10.1016/j.parco.2007.06.001.

18
- 77954725202
- V. Marjanović, J. Labarta, E. Ayguadé, and M. Valero. Overlapping communication and computation by using a hybrid MPI/SMPSs approach. In Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, pages 5-16. ACM, 2010.

19
- 84864153899
- NVIDIA. CUDA Toolkit 4.0 CUBLAS Library, 2011.

20
- 67650021816
- G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn. Solving dense linear systems on platforms with multiple hardware accelerators. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09, pages 121-130. ACM, 2009.

21
- 82655162782
- C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 233-248. ACM, 2011.

22
- 33847103243
- S. Sharma, C.-H. Hsu, and W.-C. Feng. Making a case for a Green500 list. In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006) / Workshop on High Performance - Power Aware Computing, 2006.

23
- 84863925917
- F. Song, S. Tomov, and J. Dongarra. Efficient support for matrix computations on heterogeneous multi-core and multi-GPU architectures. LAPACK Working Note 250, UTK, June 2011.

24
- 84863667764
- S. Tomov, R. Nath, P. Du, and J. Dongarra. MAGMA Users' Guide. Technical report, ICL, UTK, 2011.

25
- 80052312080
- J. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili. Keeneland: Bringing heterogeneous GPU computing to the computational science community. Computing in Science & Engineering, 13(5):90-95, Sept.-Oct. 2011.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.