Volume , Issue , 2012, Pages 83-94

An accurate GPU performance model for effective control flow divergence optimization

Author keywords

control flow divergence; CUDA; GPGPU; performance estimation; performance metric

Indexed keywords

CONTROL FLOWS; CUDA; GPGPU; PERFORMANCE ESTIMATION; PERFORMANCE METRICS

EID: 84866876242     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2012.18     Document Type: Conference Paper
Times cited: 38

References (26)
  • 1
    • 84866846609
    • NVIDIA. CUDA. Available: http://developer.nvidia.com/object/cuda.html
    • CUDA
  • 4
    • 77954724148
    • Streamlining GPU applications on the fly: Thread divergence elimination through runtime thread-data remapping
    • E. Z. Zhang, Y. Jiang, Z. Guo, and X. Shen, "Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping," ICS, 2010, pp. 115-126
    • (2010) ICS , pp. 115-126
    • Zhang, E.Z.1    Jiang, Y.2    Guo, Z.3    Shen, X.4
  • 5
    • 79953126288
    • On-the-fly elimination of dynamic irregularities for GPU computing
    • E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, "On-the-fly elimination of dynamic irregularities for GPU computing," ASPLOS, 2011, pp. 369-380
    • (2011) ASPLOS , pp. 369-380
    • Zhang, E.Z.1    Jiang, Y.2    Guo, Z.3    Tian, K.4    Shen, X.5
  • 6
    • 47349104432
    • Dynamic warp formation and scheduling for efficient GPU control flow
    • W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," MICRO, 2007, pp. 407-420.
    • (2007) MICRO , pp. 407-420
    • Fung, W.W.L.1    Sham, I.2    Yuan, G.3    Aamodt, T.M.4
  • 7
    • 77954976292
    • Dynamic warp subdivision for integrated branch and memory divergence tolerance
    • J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," ISCA, 2010, pp. 236-246.
    • (2010) ISCA , pp. 236-246
    • Meng, J.1    Tarjan, D.2    Skadron, K.3
  • 8
    • 70450231944
    • An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
    • S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," ISCA, 2009, pp. 152-163.
    • (2009) ISCA , pp. 152-163
    • Hong, S.1    Kim, H.2
  • 9
    • 84862182989
    • NVIDIA. (2011). Compute Visual Profiler User Guide. Available: http://developer.nvidia.com/nvidia-gpu-computing-documentation
    • (2011) Compute Visual Profiler User Guide
  • 10
    • 0035182089
    • Basic block distribution analysis to find periodic behavior and simulation points in applications
    • T. Sherwood, E. Perelman, and B. Calder, "Basic block distribution analysis to find periodic behavior and simulation points in applications," PACT 2001, pp. 3-14.
    • (2001) PACT , pp. 3-14
    • Sherwood, T.1    Perelman, E.2    Calder, B.3
  • 11
    • 70349169075
    • Analyzing CUDA workloads using a detailed GPU simulator
    • A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," ISPASS, 2009, pp. 163-174.
    • (2009) ISPASS , pp. 163-174
    • Bakhoda, A.1    Yuan, G.L.2    Fung, W.W.L.3    Wong, H.4    Aamodt, T.M.5
  • 12
    • 0024700878
    • Determining average program execution times and their variance
    • V. Sarkar, "Determining average program execution times and their variance," PLDI, 1989, pp. 298-312
    • (1989) PLDI , pp. 298-312
    • Sarkar, V.1
  • 14
    • 79551704836
    • NVIDIA. (2011). NVIDIA CUDA C Programming Guide. Available: http://developer.nvidia.com/nvidia-gpu-computing-documentation
    • (2011) NVIDIA CUDA C Programming Guide
  • 15
    • 0036647190
    • An efficient k-means clustering algorithm: Analysis and implementation
    • T. Kanungo, et al., "An efficient k-means clustering algorithm: Analysis and implementation," IEEE TPAMI, pp. 881-892, 2002.
    • (2002) IEEE TPAMI , pp. 881-892
    • Kanungo, T.1
  • 16
    • 0023381475
    • Marching cubes: A high resolution 3D surface construction algorithm
    • W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," ACM Siggraph Computer Graphics, vol. 21, pp. 163-169, 1987.
    • (1987) ACM Siggraph Computer Graphics , vol.21 , pp. 163-169
    • Lorensen, W.E.1    Cline, H.E.2
  • 17
    • 70449844290
    • Sequence alignment with GPU: Performance and design challenges
    • G. M. Striemer and A. Akoglu, "Sequence alignment with GPU: Performance and design challenges," IPDPS, 2009, pp. 1-10
    • (2009) IPDPS , pp. 1-10
    • Striemer, G.M.1    Akoglu, A.2
  • 18
    • 38849131252
    • High-throughput sequence alignment using Graphics Processing Units
    • M. Schatz, C. Trapnell, A. Delcher, and A. Varshney, "High-throughput sequence alignment using Graphics Processing Units," BMC Bioinformatics, vol. 8, p. 474, 2007.
    • (2007) BMC Bioinformatics , vol.8 , p. 474
    • Schatz, M.1    Trapnell, C.2    Delcher, A.3    Varshney, A.4
  • 19
    • 80052657491
    • NVIDIA. Occupancy Calculator. http://developer.nvidia.com/object/cuda-3- 2-tooklit-rc.html.
    • Occupancy Calculator
  • 20
    • 84866849151
    • CUDA-EC
    • CUDA-EC, NVIDIA Tesla Bio Workbench. http://www.nvidia.com/object/ec-on- tesla.html.
  • 21
    • 79952767559
    • Accelerating global sequence alignment using CUDA compatible multi-core GPU
    • T. R. P. Siriwardena and D. N. Ranasinghe, "Accelerating global sequence alignment using CUDA compatible multi-core GPU," ICIAfS, 2010, pp. 201-206.
    • (2010) ICIAfS , pp. 201-206
    • Siriwardena, T.R.P.1    Ranasinghe, D.N.2
  • 22
    • 77952265152
    • Optimizing Matrix Transpose in CUDA
    • G. Ruetsch and P. Micikevicius, "Optimizing Matrix Transpose in CUDA," NVIDIA, 2009.
    • (2009) NVIDIA
    • Ruetsch, G.1    Micikevicius, P.2
  • 24
    • 68549096107
    • Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware
    • W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware," ACM TACO, Vol. 6, No. 2, Article 7, 2009, pp. 1-37.
    • (2009) ACM TACO , vol.6 , Issue.2 , pp. 1-37
    • Fung, W.W.L.1    Sham, I.2    Yuan, G.3    Aamodt, T.M.4
  • 25
    • 84863011842
    • A Revisit to Cost Aggregation in Stereo Matching: How Far Can We Reduce Its Computational Redundancy?
    • D. Min, J. Lu, and M. Do, "A Revisit to Cost Aggregation in Stereo Matching: How Far Can We Reduce Its Computational Redundancy?" ICCV, 2011.
    • (2011) ICCV
    • Min, D.1    Lu, J.2    Do, M.3
  • 26
    • 84862069040
    • Real-time Implementation and Performance Optimization of 3D Sound Localization on GPUs
    • Y. Liang, Z. Cui, S. Zhao, K. Rupnow, Y. Zhang, D. L. Jones, and D. Chen, "Real-time Implementation and Performance Optimization of 3D Sound Localization on GPUs," DATE, 2012.
    • (2012) DATE
    • Liang, Y.1    Cui, Z.2    Zhao, S.3    Rupnow, K.4    Zhang, Y.5    Jones, D.L.6    Chen, D.7


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.