SCOPUS Information Retrieval Platform

Proceedings of the International Conference on Parallel Processing

Volume , Issue , 2010, Pages 40-50

Toward harnessing DOACROSS parallelism for multi-GPGPUs

(5) Di, Peng (a); Wan, Qing (a); Zhang, Xuemeng (a); Wu, Hui (a); Xue, Jingling (a)

(a) UNIVERSITY OF NEW SOUTH WALES (Australia)

Author keywords

DOACROSS parallelism; GPGPU; Loop tiling; SOR

Indexed keywords

COMPILER OPTIMIZATIONS; CONVERGENCE RATES; DATA DEPENDENCE; DATA PARALLEL; DATA REUSE; DOACROSS PARALLELISM; DOMAIN DECOMPOSITION TECHNIQUES; DOMAIN EXPERTS; GENERAL-PURPOSE COMPUTING; GPGPU; LOOP TILING; OPTIMIZATION PRINCIPLE; PARALLELIZATIONS; PARALLELIZING; PDE SOLVERS; PERFORMANCE TUNING; RED-BLACK SOR; SCIENTIFIC AND ENGINEERING APPLICATIONS; SOR; WORK FOCUS;

CACHE MEMORY; DOMAIN DECOMPOSITION METHODS; OPTIMIZATION; TUNING;

PROGRAM COMPILERS;

EID: 78649538491 PISSN: 01903918 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICPP.2010.13 Document Type: Conference Paper

Times cited : (9)

References (26)

1
- 0343486319
- A multi-color SOR method for parallel computation
- L. Adams and J. Ortega. A multi-color SOR method for parallel computation. In 1982 International Conference on Parallel Processing (ICPP'82), pages 53-56, 1982.
- (1982) 1982 International Conference on Parallel Processing (ICPP'82) , pp. 53-56
- Adams, L.¹ Ortega, J.²

2
- 57349180412
- A compiler framework for optimization of affine loop nests for GPGPUs
- Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, pages 225-234, 2008.
- (2008) ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing , pp. 225-234
- Baskaran, M.M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

3
- 78649547021
- A block red-black SOR method for a two-dimensional parabolic equation using Hermite collocation
- Stephen H. Brill and George F. Pinder. A block red-black SOR method for a two-dimensional parabolic equation using Hermite collocation. The Mathematics of Finite Elements and Applications, 1997.
- (1997) The Mathematics of Finite Elements and Applications
- Brill, S.H.¹ Pinder, G.F.²

4
- 34548747985
- Coarse-grain parallel execution for 2-dimensional PDE problems
- Georgios Goumas, Nikolaos Drosinos, Vasileios Karakasis, and Nectarios Koziris. Coarse-grain parallel execution for 2-dimensional PDE problems. International Parallel and Distributed Processing Symposium, page 381, 2007.
- (2007) International Parallel and Distributed Processing Symposium , pp. 381
- Goumas, G.¹ Drosinos, N.² Karakasis, V.³ Koziris, N.⁴

5
- 78649549591
- The Portland Group
- The Portland Group. PGI accelerate compiler, 2009.
- (2009) PGI Accelerate Compiler

6
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- Sunpyo Hong and Hyesoon Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput. Archit. News, 37(3):152-163, 2009.
- (2009) SIGARCH Comput. Archit. News , vol.37 , Issue.3 , pp. 152-163
- Hong, S.¹ Kim, H.²

7
- 84944744196
- Code tiling for improving the cache performance of PDE solvers
- Q. Huang, J. Xue, and X. Vera. Code tiling for improving the cache performance of PDE solvers. In 2003 International Conference on Parallel Processing (ICPP'03), pages 615-625, 2003.
- (2003) 2003 International Conference on Parallel Processing (ICPP'03) , pp. 615-625
- Huang, Q.¹ Xue, J.² Vera, X.³

8
- 33746706203
- Automatic tuning matrix multiplication performance on graphics hardware
- Changhao Jiang and Marc Snir. Automatic tuning matrix multiplication performance on graphics hardware. In PACT '05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 185-196, 2005.
- (2005) PACT '05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques , pp. 185-196
- Jiang, C.¹ Snir, M.²

9
- 0036396915
- The Imagine stream processor
- Ujval Kapasi, William J. Dally, Scott Rixner, John D. Owens, and Brucek Khailany. The Imagine stream processor. In ICCD '02: Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors, page 282, 2002.
- (2002) ICCD '02: Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors , pp. 282
- Kapasi, U.¹ Dally, W.J.² Rixner, S.³ Owens, J.D.⁴ Khailany, B.⁵

10
- 67650081010
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- Seyong Lee, Seung-Jai Min, and Rudolf Eigenmann. OpenMP to GPGPU: a compiler framework for automatic translation and optimization. In PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 101-110, 2009.
- (2009) PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 101-110
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

11
- 70450103746
- A cross-input adaptive framework for GPU program optimizations
- Yixun Liu, Eddy Z. Zhang, and Xipeng Shen. A cross-input adaptive framework for GPU program optimizations. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel and Distributed Processing, pages 1-10, 2009.
- (2009) IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel and Distributed Processing , pp. 1-10
- Liu, Y.¹ Zhang, E.Z.² Shen, X.³

12
- 0024030170
- Multicolor reordering of sparse matrices resulting from irregular grids
- Rami G. Melhem and K. V. S. Ramarao. Multicolor reordering of sparse matrices resulting from irregular grids. ACM Trans. Math. Softw., 14(2):117-138, 1988.
- (1988) ACM Trans. Math. Softw. , vol.14 , Issue.2 , pp. 117-138
- Melhem, R.G.¹ Ramarao, K.V.S.²

13
- 67650671606
- 3D finite difference computation on GPUs using CUDA
- Paulius Micikevicius. 3D finite difference computation on GPUs using CUDA. In GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, pages 79-84, 2009.
- (2009) GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 79-84
- Micikevicius, P.¹

14
- 70449128610
- NVIDIA.
- NVIDIA. NVIDIA CUDA programming guide 2.2, 2009.
- (2009) NVIDIA CUDA Programming Guide 2.2

15
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-mei W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pages 73-82, 2008.
- (2008) PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.-M.W.⁶

16
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, Sara S. Baghsorkhi, Sain-Zee Ueng, John A. Stratton, and Wen-mei W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO '08: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, pages 195-204, 2008.
- (2008) CGO '08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.W.⁷

17
- 0003656337
- Pergamon Press
- V. K. Saul'yev. Integration of Equations of Parabolic Type by the Method of Nets. Pergamon Press, 1964.
- (1964) Integration of Equations of Parabolic Type by the Method of Nets
- Saul'yev, V.K.¹

18
- 1542710739
- Sparse tiling for stationary iterative methods
- Michelle Mills Strout, Larry Carter, Jeanne Ferrante, and Barbara Kreaseck. Sparse tiling for stationary iterative methods. Int. J. High Perform. Comput. Appl., 18(1):95-113, 2004.
- (2004) Int. J. High Perform. Comput. Appl. , vol.18 , Issue.1 , pp. 95-113
- Strout, M.M.¹ Carter, L.² Ferrante, J.³ Kreaseck, B.⁴

19
- 0000778059
- Generating efficient tiled code for distributed memory machines
- P. Tang and J. Xue. Generating efficient tiled code for distributed memory machines. Parallel Computing, 26(11):1369-1410, 2000.
- (2000) Parallel Computing , vol.26 , Issue.11 , pp. 1369-1410
- Tang, P.¹ Xue, J.²

20
- 33750456975
- New stable group explicit finite difference method for solution of diffusion equation
- Rohallah Tavakoli and Parviz Davami. New stable group explicit finite difference method for solution of diffusion equation. Applied Mathematics and Computation, 181(2):1379-1386, 2006.
- (2006) Applied Mathematics and Computation , vol.181 , Issue.2 , pp. 1379-1386
- Tavakoli, R.¹ Davami, P.²

21
- 34547433110
- Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors
- Dan Wallin, Henrik Löf, Erik Hagersten, and Sverker Holmgren. Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors. In ICS '06: Proceedings of the 20th annual international conference on Supercomputing, pages 145-155, 2006.
- (2006) ICS '06: Proceedings of the 20th Annual International Conference on Supercomputing , pp. 145-155
- Wallin, D.¹ Löf, H.² Hagersten, E.³ Holmgren, S.⁴

22
- 33748798219
- A new block parallel SOR method and its analysis
- Dexuan Xie. A new block parallel SOR method and its analysis. SIAM J. Sci. Comput., 27(5):1513-1533, 2006.
- (2006) SIAM J. Sci. Comput. , vol.27 , Issue.5 , pp. 1513-1533
- Xie, D.¹

23
- 0000703719
- On tiling as a loop transformation
- Jingling Xue. On tiling as a loop transformation. Parallel Processing Letters, 7(4):409-424, 1997.
- (1997) Parallel Processing Letters , vol.7 , Issue.4 , pp. 409-424
- Xue, J.¹

24
- 0442303278
- Kluwer Academic Publishers
- Jingling Xue. Loop Tiling for Parallelism. Kluwer Academic Publishers, 2000.
- (2000) Loop Tiling for Parallelism
- Xue, J.¹

25
- 70350678845
- JCUDA: A programmer-friendly interface for accelerating Java programs with CUDA
- Yonghong Yan, Max Grossman, and Vivek Sarkar. JCUDA: A programmer-friendly interface for accelerating Java programs with CUDA. In Euro-Par, pages 887-899, 2009.
- (2009) Euro-Par , pp. 887-899
- Yan, Y.¹ Grossman, M.² Sarkar, V.³

26
- 14944383149
- A fast sweeping method for Eikonal equations
- Hongkai Zhao. A fast sweeping method for Eikonal equations. Math. Comp., 74:603-627, 2005.
- (2005) Math. Comp. , vol.74 , pp. 603-627
- Zhao, H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.