SCOPUS Information Retrieval Platform

International Journal of Parallel Programming

Volume 39, Issue 1, 2011, Pages 115-142

A performance study for iterative stencil loops on GPUs with ghost zone optimizations

(2) Meng, Jiayuan a; Skadron, Kevin a

a University of Virginia (United States)

Author keywords

Ghost zone; GPU; Halo; Iterative stencil loops; Performance model; Tiling

Indexed keywords

GHOST ZONE; GPU; HALO; ITERATIVE STENCIL LOOPS; PERFORMANCE MODEL; TILING;

ARCHITECTURE; MESSAGE PASSING; OPTIMIZATION; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS;

SYNCHRONIZATION;

EID: 79551491518 PISSN: 08857458 EISSN: None Source Type: Journal
DOI: 10.1007/s10766-010-0142-5 Document Type: Article

Times cited: (45)

References (37)

1
- 30344463648
- Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus
- Allen, G., Dramlitsch, T., Foster, I., Karonis, N.T., Ripeanu, M., Seidel, E., Toonen, B.: Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus. In: SC'01, pp. 52-52 (2001)
- (2001) SC'01 , pp. 52-52
- Allen, G.¹ Dramlitsch, T.² Foster, I.³ Karonis, N.T.⁴ Ripeanu, M.⁵ Seidel, E.⁶ Toonen, B.⁷

2
- 70449722961
- April
- Alpert, M.: Not just fun and games. April (1999)
- (1999) Not Just Fun and Games
- Alpert, M.¹

3
- 84976690230
- Fortran at ten gigaflops: The connection machine convolution compiler
- 10.1145/113445.113458
- M. Bromley S. Heller T. McNerney G.L. Steele Jr 1991 Fortran at ten gigaflops: the connection machine convolution compiler PLDI '91 26 6 145 156 10.1145/113445.113458
- (1991) PLDI '91 , vol.26 , Issue.6 , pp. 145-156
- Bromley, M.¹ Heller, S.² McNerney, T.³ Steele Jr., G.L.⁴

4
- 0027763868
- Mobile and replicated alignment of arrays in data-parallel programs
- November
- Chatterjee, S., Gilbert, J.R., Schreiber, R.: Mobile and replicated alignment of arrays in data-parallel programs. In: SC'93, pp. 420-429 November (1993)
- (1993) SC'93 , pp. 420-429
- Chatterjee, S.¹ Gilbert, J.R.² Schreiber, R.³

5
- 70449701626
- June
- Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A performance study of general purpose applications on graphics processors using CUDA, June (2008)
- (2008) A Performance Study of General Purpose Applications on Graphics Processors Using CUDA
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Skadron, K.⁶

6
- 79551499286
- NVIDIA Corporation. GeForce GTX 280 specifications. (2008)
- NVIDIA Corporation. GeForce GTX 280 specifications. (2008)

7
- 79551493358
- NVIDIA Corporation. NVIDIA CUDA visual profiler. June (2008)
- NVIDIA Corporation. NVIDIA CUDA visual profiler. June (2008)

8
- 70349937457
- October
- Dagum, L.: OpenMP: a proposed industry standard API for shared memory programming, October (1997)
- (1997) OpenMP: A Proposed Industry Standard API for Shared Memory Programming
- Dagum, L.¹

9
- 70350771127
- Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., Yelick, K.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: SC '08, pp. 1-12 (2008)
- (2008) SC '08 , pp. 1-12
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

10
- 0034818853
- Eliminating redundancies in sum-of-product array computations
- Deitz, S.J., Chamberlain, B.L., Snyder, L.: Eliminating redundancies in sum-of-product array computations. In: ICS '01, pp. 65-77 (2001)
- (2001) ICS '01 , pp. 65-77
- Deitz, S.J.¹ Chamberlain, B.L.² Snyder, L.³

11
- 0003338620
- Partial differential equations
- Evans, L.C.: Partial differential equations. Am. Math. Soc. (1998)
- (1998) Am. Math. Soc.
- Evans, L.C.¹

12
- 84964699790
- Redundant computation partition on distributed-memory systems
- Chen, L., Zhang, Z.-Q., Feng, X.-B.: Redundant computation partition on distributed-memory systems. In: ICA3PP '02, p. 252 (2002)
- (2002) ICA3PP '02 , p. 252
- Chen, L.¹ Zhang, Z.-Q.² Feng, X.-B.³

13
- 32844463802
- Cache oblivious stencil computations
- Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: ICS'05, pp. 361-366 (2005)
- (2005) ICS'05 , pp. 361-366
- Frigo, M.¹ Strumpen, V.²

14
- 52049096671
- April
- Goodnight, N.: CUDA/OpenGL fluid simulation. April (2007)
- (2007) CUDA/OpenGL Fluid Simulation
- Goodnight, N.¹

15
- 34247376580
- Chip multiprocessing and the Cell Broadband Engine
- Gschwind, M.: Chip multiprocessing and the Cell Broadband Engine. In: CF'06 (2006)
- (2006) CF'06
- Gschwind, M.¹

16
- 38249000489
- Communication-free hyperplane partitioning of nested loops
- 0783.68027 10.1006/jpdc.1993.1094
- C.-H. Huang P. Sadayappan 1993 Communication-free hyperplane partitioning of nested loops J. Parallel Distrib. Comput. 19 2 90 102 0783.68027 10.1006/jpdc.1993.1094
- (1993) J. Parallel Distrib. Comput. , vol.19 , Issue.2 , pp. 90-102
- Huang, C.-H.¹ Sadayappan, P.²

17
- 4444374512
- Compact thermal modeling for temperature-aware design
- Huang, W., Stan, M.R., Skadron, K., Ghosh, S., Sankaranarayanan, K., Velusamy, S.: Compact thermal modeling for temperature-aware design. In: DAC'04. (2004)
- (2004) DAC'04
- Huang, W.¹ Stan, M.R.² Skadron, K.³ Ghosh, S.⁴ Sankaranarayanan, K.⁵ Velusamy, S.⁶

18
- 79551504950
- Electronic Educational Devices Inc. Watts up? electricity meter operator's manual. (2002)
- (2002) Electronic Educational Devices Inc. Watts Up? Electricity Meter Operator's Manual

19
- 0022901352
- Jalby, W., Meier, U.: Optimizing matrix operations on a parallel multiprocessor with a hierarchical memory system, pp. 429-432 (1986)
- (1986) Optimizing Matrix Operations on A Parallel Multiprocessor with A Hierarchical Memory System , pp. 429-432
- Jalby, W.¹ Meier, U.²

20
- 84958661690
- Impact of modern memory subsystems on cache optimizations for stencil computations
- Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: MSP'05, pp. 36-43 (2005)
- (2005) MSP'05 , pp. 36-43
- Kamil, S.¹ Husbands, P.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

21
- 0010828753
- Cache-aware multigrid methods for solving Poisson's equation in two dimensions
- 0957.68002 10.1007/s006070070032 1783469
- M. Kowarschik C. Weiß W. Karl U. Rüde 2000 Cache-aware multigrid methods for solving Poisson's equation in two dimensions Computing 64 4 381 399 0957.68002 10.1007/s006070070032 1783469
- (2000) Computing , vol.64 , Issue.4 , pp. 381-399
- Kowarschik, M.¹ Weiß, C.² Karl, W.³ Rüde, U.⁴

22
- 35448944792
- Effective automatic parallelization of stencil computations
- DOI 10.1145/1250734.1250761, PLDI'07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation
- S. Krishnamoorthy M. Baskaran U. Bondhugula J. Ramanujam A. Rountev P. Sadayappan 2007 Effective automatic parallelization of stencil computations PLDI '07 42 6 235 244 10.1145/1250734.1250761 (Pubitemid 47630691)
- (2007) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) , pp. 235-244
- Krishnamoorthy, S.¹ Baskaran, M.² Bondhugula, U.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

23
- 0029490313
- Techniques for compiling programs on distributed memory multicomputers
- 10.1016/0167-8191(95)00052-6 1369230
- P. Lee 1995 Techniques for compiling programs on distributed memory multicomputers Parallel Comput. 21 1895 1923 10.1016/0167-8191(95)00052-6 1369230
- (1995) Parallel Comput. , vol.21 , pp. 1895-1923
- Lee, P.¹

24
- 24644456455
- Automatic tiling of iterative stencil loops
- DOI 10.1145/1034774.1034777
- Z. Li Y. Song 2004 Automatic tiling of iterative stencil loops ACM Trans. Program. Lang. Syst. 26 6 975 1028 10.1145/1034774.1034777 (Pubitemid 41270296)
- (2004) ACM Transactions on Programming Languages and Systems , vol.26 , Issue.6 , pp. 975-1028
- Li, Z.¹ Song, Y.²

25
- 0031075726
- Fusion of loops for parallelism and locality
- N. Manjikian T.S. Abdelrahman 1997 Fusion of loops for parallelism and locality Parallel Distrib. Syst. 8 19 28
- (1997) Parallel Distrib. Syst. , vol.8 , pp. 19-28
- Manjikian, N.¹ Abdelrahman, T.S.²

26
- 70449723385
- Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
- Meng, J., Skadron, K.: Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In: ICS '09, pp. 256-265 (2009)
- (2009) ICS '09 , pp. 256-265
- Meng, J.¹ Skadron, K.²

27
- 78651550268
- Scalable parallel programming with CUDA
- 10.1145/1365490.1365500
- J. Nickolls I. Buck M. Garland K. Skadron 2008 Scalable parallel programming with CUDA Queue 6 2 40 53 10.1145/1365490.1365500
- (2008) Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

28
- 34248593308
- Three-dimensional multi-relaxation time (MRT) lattice-Boltzmann models for multiphase flow
- 1116.76066 10.1016/j.jcp.2006.10.023 2330283
- K.N. Premnath J. Abraham 2007 Three-dimensional multi-relaxation time (MRT) lattice-Boltzmann models for multiphase flow J. Comput. Phys. 224 2 539 559 1116.76066 10.1016/j.jcp.2006.10.023 2330283
- (2007) J. Comput. Phys. , vol.224 , Issue.2 , pp. 539-559
- Premnath, K.N.¹ Abraham, J.²

29
- 0006106643
- Tiling of iteration spaces for multicomputers
- Ramanujam, J.: Tiling of iteration spaces for multicomputers. In: Proceedings International Conference Parallel Processing, pp. 179-186. (1990)
- (1990) Proceedings International Conference Parallel Processing , pp. 179-186
- Ramanujam, J.¹

30
- 34548752231
- Towards optimal multi-level tiling for stencil computations
- March
- Renganarayana, L., Harthikote-Matha, M., Dewri, R., Rajopadhye, S.: Towards optimal multi-level tiling for stencil computations. IPDPS'07, pp. 1-10, March (2007)
- (2007) IPDPS'07 , pp. 1-10
- Renganarayana, L.¹ Harthikote-Matha, M.² Dewri, R.³ Rajopadhye, S.⁴

31
- 70350786558
- Positivity, posynomials and tile size selection
- Renganarayana, L., Rajopadhye, S.: Positivity, posynomials and tile size selection. In: SC '08, pp. 1-12 (2008)
- (2008) SC '08 , pp. 1-12
- Renganarayana, L.¹ Rajopadhye, S.²

32
- 70449722959
- Cactus application: Performance predictions in a grid environment
- Ripeanu, M., Iamnitchi, A., Foster, I.: Cactus application: Performance predictions in a grid environment. In: EuroPar'01. (2001)
- (2001) EuroPar'01
- Ripeanu, M.¹ Iamnitchi, A.² Foster, I.³

33
- 33845574641
- Tiling optimizations for 3D scientific computations
- Rivera, G., Tseng, C.-W.: Tiling optimizations for 3D scientific computations. In: SC '00, p. 32 (2000)
- (2000) SC '00 , p. 32
- Rivera, G.¹ Tseng, C.-W.²

34
- 70449706048
- CUDA-lite: Reducing GPU programming complexity
- Ueng, S.-Z., Baghsorkhi, S., Lathara, M., Hwu, W.M.: CUDA-lite: Reducing GPU programming complexity. In: LCPC'08. (2008)
- (2008) LCPC'08
- Ueng, S.-Z.¹ Baghsorkhi, S.² Lathara, M.³ Hwu, W.M.⁴

35
- 84863436006
- Time skewing for parallel computers
- Wonnacott, D.: Time skewing for parallel computers. In: WLCPC'99, pp. 477-480 (1999)
- (1999) WLCPC'99 , pp. 477-480
- Wonnacott, D.¹

36
- 1542392248
- Achieving scalable locality with time skewing
- 1019.68024 10.1023/A:1015460304860
- D. Wonnacott 2002 Achieving scalable locality with time skewing Int. J. Parallel Program. 30 3 181 221 1019.68024 10.1023/A:1015460304860
- (2002) Int. J. Parallel Program. , vol.30 , Issue.3 , pp. 181-221
- Wonnacott, D.¹

37
- 79951471318
- Parallel image processing based on CUDA
- 10.1109/CSSE.2008.1448
- Z. Yang Y. Zhu Y. Pu 2008 Parallel image processing based on CUDA Int. Conf. Comput. Sci. Software Eng. 3 198 201 10.1109/CSSE.2008.1448
- (2008) Int. Conf. Comput. Sci. Software Eng. , vol.3 , pp. 198-201
- Yang, Z.¹ Zhu, Y.² Pu, Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.