



Volume , Issue , 2011, Pages 2-11

An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs

Author keywords

CUDA; GPU; irregular reduction; partitioning

Indexed keywords

CACHE PERFORMANCE; CUDA; DATA ACCESS; DISTRIBUTED MEMORY MACHINES; DISTRIBUTED SHARED MEMORY; ENGINEERING CODES; EXECUTION STRATEGIES; GPU; HIGH PERFORMANCE COMPUTING; IRREGULAR REDUCTIONS; NUMBER OF THREADS; PARALLELIZATIONS; PARALLELIZING; PARTITIONING; PARTITIONING METHODS; RUNTIME MODULES; RUNTIME SUPPORT; RUNTIMES; SHARED MEMORIES; SHARED MEMORY MACHINES; SYSTEMATIC STUDY; UNIPROCESSORS; UNSTRUCTURED GRID

EID: 79959575872     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1995896.1995900     Document Type: Conference Paper
Times cited: 12

References (40)
  • 1
    • 0031139728
    • Interprocedural data flow based optimizations for distributed memory compilation
    • G. Agrawal and J. Saltz. Interprocedural data flow based optimizations for distributed memory compilation. Software Practice and Experience, 27(5):519-546, May 1997.
    • (1997) Software Practice and Experience , vol.27 , Issue.5 , pp. 519-546
    • Agrawal, G.1    Saltz, J.2
  • 3
    • 63549135938
    • Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
    • M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In PPoPP, pages 1-10, NY, USA, 2008. ACM.
    • (2008) PPoPP , pp. 1-10
    • Baskaran, M.M.1    Bondhugula, U.2    Krishnamoorthy, S.3    Ramanujam, J.4    Rountev, A.5    Sadayappan, P.6
  • 5
    • 77749340082
    • Model-driven Autotuning of Sparse Matrix-vector Multiply on GPUs
    • J. W. Choi, A. Singh, and R. W. Vuduc. Model-driven Autotuning of Sparse Matrix-vector Multiply on GPUs. In PPoPP, Feb. 2010.
    • (2010) PPoPP
    • Choi, J.W.1    Singh, A.2    Vuduc, R.W.3
  • 7
    • 0029430697
    • Index array flattening through program transformation
    • R. Das, P. Havlak, J. Saltz, and K. Kennedy. Index array flattening through program transformation. In SC95. IEEE Computer Society Press, Dec. 1995.
    • (1995) SC95
    • Das, R.1    Havlak, P.2    Saltz, J.3    Kennedy, K.4
  • 8
    • 0028386843
    • The design and implementation of a parallel unstructured Euler solver using software primitives
    • R. Das, D. J. Mavriplis, J. Saltz, S. Gupta, and R. Ponnusamy. The design and implementation of a parallel unstructured Euler solver using software primitives. AIAA Journal, 32(3):489-496, Mar. 1994.
    • (1994) AIAA Journal , vol.32 , Issue.3 , pp. 489-496
    • Das, R.1    Mavriplis, D.J.2    Saltz, J.3    Gupta, S.4    Ponnusamy, R.5
  • 9
    • 79954630742
    • Improving cache performance of dynamic applications with computation and data layout transformations
    • C. Ding and K. Kennedy. Improving cache performance of dynamic applications with computation and data layout transformations. In PLDI99, May 1999.
    • (1999) PLDI99
    • Ding, C.1    Kennedy, K.2
  • 11
    • 51549093017
    • Sparse matrix computations on manycore GPUs
    • M. Garland. Sparse matrix computations on manycore GPUs. In DAC, 2008.
    • (2008) DAC
    • Garland, M.1
  • 12
    • 0033707876
    • A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors
    • E. Gutierrez, O. Plata, and E. L. Zapata. A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors. In ICS00, pages 78-87. ACM Press, May 2000.
    • (2000) ICS00 , pp. 78-87
    • Gutierrez, E.1    Plata, O.2    Zapata, E.L.3
  • 13
    • 0030380793
    • Maximizing multiprocessor performance with the SUIF compiler
    • M. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lam. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, (12), Dec. 1996.
    • (1996) IEEE Computer , Issue.12
    • Hall, M.1    Amarasinghe, S.2    Murphy, B.3    Liao, S.4    Lam, M.5
  • 14
    • 79959622062
    • Improving compiler and runtime support for irregular reductions
    • H. Han and C.-W. Tseng. Improving compiler and runtime support for irregular reductions. In LCPC98, Aug. 1998.
    • (1998) LCPC98
    • Han, H.1    Tseng, C.-W.2
  • 16
    • 0342622933
    • Handling irregular problems with Fortran D - A preliminary report
    • R. v. Hanxleden. Handling irregular problems with Fortran D - a preliminary report. In CPC, Delft, The Netherlands, Dec. 1993. Also available as CRPC Technical Report CRPC-TR93339-S.
    • CPC, Delft, the Netherlands, Dec. 1993
    • Hanxleden, R.V.1
  • 17
    • 0029322399
    • Parallelizing molecular dynamics programs for distributed memory machines
    • Y.-S. Hwang, R. Das, J. H. Saltz, M. Hodoscek, and B. R. Brooks. Parallelizing molecular dynamics programs for distributed memory machines. IEEE Computational Science & Engineering, 2(2):18-29, Summer 1995. Also available as University of Maryland Technical Report CS-TR-3374 and UMIACS-TR-94-125.
    • (1995) IEEE Computational Science & Engineering , vol.2 , Issue.2 , pp. 18-29
    • Hwang, Y.-S.1    Das, R.2    Saltz, J.H.3    Hodoscek, M.4    Brooks, B.R.5
  • 18
    • 0029375750
    • Partitioning unstructured computational graphs for nonuniform and adaptive environments
    • M. Kaddoura, C.-W. Ou, and S. Ranka. Partitioning unstructured computational graphs for nonuniform and adaptive environments. IEEE Parallel & Distributed Technology, 3(3):63-69, Fall 1995.
    • (1995) IEEE Parallel & Distributed Technology , vol.3 , Issue.3 , pp. 63-69
    • Kaddoura, M.1    Ou, C.-W.2    Ranka, S.3
  • 19
    • 84990479742
    • An efficient heuristic procedure for partitioning graphs
    • B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291-307, Feb. 1970.
    • (1970) Bell System Technical Journal , vol.49 , Issue.2 , pp. 291-307
    • Kernighan, B.1    Lin, S.2
  • 22
    • 0029229672
    • Exploiting spatial regularity in irregular iterative applications
    • A. Lain and P. Banerjee. Exploiting spatial regularity in irregular iterative applications. In IPPS95, pages 820-826. IEEE Computer Society Press, Apr. 1995.
    • (1995) IPPS95 , pp. 820-826
    • Lain, A.1    Banerjee, P.2
  • 23
    • 78650802947
    • OpenMPC: Extended OpenMP Programming and Tuning for GPUs
    • S. Lee and R. Eigenmann. OpenMPC: Extended OpenMP Programming and Tuning for GPUs. In SC, Nov 2010.
    • SC, Nov 2010
    • Lee, S.1    Eigenmann, R.2
  • 24
    • 67650081010
    • OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization
    • S. Lee, S.-J. Min, and R. Eigenmann. OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization. In PPoPP'09, 2009.
    • (2009) PPoPP'09
    • Lee, S.1    Min, S.-J.2    Eigenmann, R.3
  • 26
    • 38349105400
    • Molecular dynamics simulations on commodity GPUs with CUDA
    • W. Liu, B. Schmidt, G. Voss, and W. Müller-Wittig. Molecular dynamics simulations on commodity GPUs with CUDA. In HiPC, pages 185-196, 2007.
    • (2007) HiPC , pp. 185-196
    • Liu, W.1    Schmidt, B.2    Voss, G.3    Müller-Wittig, W.4
  • 27
    • 70449707774
    • A Translation System for Enabling Data Mining Applications on GPUs
    • W. Ma and G. Agrawal. A Translation System for Enabling Data Mining Applications on GPUs. In ICS, June 2009.
    • (2009) ICS
    • Ma, W.1    Agrawal, G.2
  • 28
    • 79952788812
    • An Integer Programming Framework for Optimizing Shared Memory Use on GPUs
    • W. Ma and G. Agrawal. An Integer Programming Framework for Optimizing Shared Memory Use on GPUs. In HiPC, Dec. 2010.
    • (2010) HiPC
    • Ma, W.1    Agrawal, G.2
  • 29
    • 0032684978
    • Improving memory hierarchy performance of irregular applications
    • J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance of irregular applications. In ICS, June 1999.
    • (1999) ICS
    • Mellor-Crummey, J.1    Whalley, D.2    Kennedy, K.3
  • 30
    • 0033362479
    • Localizing non-affine array references
    • N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In PACT, Oct. 1999.
    • (1999) PACT
    • Mitchell, N.1    Carter, L.2    Ferrante, J.3
  • 32
    • 0029192463
    • Efficient support for irregular applications on distributed-memory machines
    • S. Mukherjee, S. Sharma, M. Hill, J. Larus, A. Rogers, and J. Saltz. Efficient support for irregular applications on distributed-memory machines. In PPOPP, pages 68-79. ACM Press, July 1995.
    • (1995) PPOPP , pp. 68-79
    • Mukherjee, S.1    Sharma, S.2    Hill, M.3    Larus, J.4    Rogers, A.5    Saltz, J.6
  • 34
    • 0029356841
    • Runtime support and compilation methods for user-specified irregular data distributions
    • R. Ponnusamy, J. Saltz, A. Choudhary, Y.-S. Hwang, and G. Fox. Runtime support and compilation methods for user-specified irregular data distributions. TPDS, 6(8):815-831, Aug. 1995.
    • (1995) TPDS , vol.6 , Issue.8 , pp. 815-831
    • Ponnusamy, R.1    Saltz, J.2    Choudhary, A.3    Hwang, Y.-S.4    Fox, G.5
  • 35
    • 0036505103
    • Parallel static and dynamic multi-constraint graph partitioning
    • DOI 10.1002/cpe.605
    • K. Schloegel, G. Karypis, and V. Kumar. Parallel static and dynamic multi-constraint graph partitioning. Concurrency and Computation: Practice and Experience, 14(3):219-240, 2002. (Pubitemid 34460007)
    • (2002) Concurrency Computation Practice and Experience , vol.14 , Issue.3 , pp. 219-240
    • Schloegel, K.1    Karypis, G.2    Kumar, V.3
  • 36
    • 70450029523
    • A framework for efficient and scalable execution of domain-specific templates on GPUs
    • N. Sundaram, A. Raghunathan, and S. Chakradhar. A framework for efficient and scalable execution of domain-specific templates on GPUs. In IPDPS, 2009.
    • (2009) IPDPS
    • Sundaram, N.1    Raghunathan, A.2    Chakradhar, S.3
  • 39
    • 77954691442
    • A GPGPU compiler for memory optimization and parallelism management
    • Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
    • (2010) PLDI
    • Yang, Y.1    Xiang, P.2    Kong, J.3    Zhou, H.4
  • 40
    • 0033703286
    • Adaptive reduction parallelization techniques
    • H. Yu and L. Rauchwerger. Adaptive reduction parallelization techniques. In ICS00, pages 66-75. ACM Press, May 2000.
    • (2000) ICS00 , pp. 66-75
    • Yu, H.1    Rauchwerger, L.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.