SCOPUS Information Search Platform

Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

2012, Pages 83-94

An accurate GPU performance model for effective control flow divergence optimization

Authors (4): Cui, Zheng (a); Liang, Yun (a); Rupnow, Kyle (a); Chen, Deming (b)

(a) Advanced Digital Sciences Center (Singapore)

(b) University of Illinois at Urbana-Champaign (United States)

Author keywords

control flow divergence; CUDA; GPGPU; performance estimation; performance metric

Indexed keywords

CONTROL FLOWS; CUDA; GPGPU; PERFORMANCE ESTIMATION; PERFORMANCE METRICS;

COMPUTER GRAPHICS; COSINE TRANSFORMS; DISTRIBUTED PARAMETER NETWORKS; ESTIMATION; OPTIMIZATION; PARALLEL ARCHITECTURES;

PROGRAM PROCESSORS;

EID: 84866876242 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2012.18 Document Type: Conference Paper

Times cited : (38)

References (26)

1
- 84866846609
- NVIDIA. CUDA. Available: http://developer.nvidia.com/object/cuda.html
- CUDA

2
- 84866863921
- Khronos OpenCL Working Group. (2008). The OpenCL Specification (version 1.0.29, 8 ed.). Available: http://khronos.org/registry/cl/specs/opencl-1.0.29.pdf
- (2008) The OpenCL Specification (Version 1.0.29, 8 Ed.)

3
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-m. W. Hwu, "An adaptive performance modeling tool for GPU architectures," PPoPP, 2010, pp. 105-114
- (2010) PPoPP , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

4
- 77954724148
- Streamlining GPU applications on the fly: Thread divergence elimination through runtime thread-data remapping
- E. Z. Zhang, Y. Jiang, Z. Guo, and X. Shen, "Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping," ICS, 2010, pp. 115-126
- (2010) ICS , pp. 115-126
- Zhang, E.Z.¹ Jiang, Y.² Guo, Z.³ Shen, X.⁴

5
- 79953126288
- On-the-fly elimination of dynamic irregularities for GPU computing
- E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, "On-the-fly elimination of dynamic irregularities for GPU computing," ASPLOS, 2011, pp. 369-380
- (2011) ASPLOS , pp. 369-380
- Zhang, E.Z.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

6
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," MICRO, 2007, pp. 407-420.
- (2007) MICRO , pp. 407-420
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

7
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," ISCA, 2010, pp. 236-246.
- (2010) ISCA , pp. 236-246
- Meng, J.¹ Tarjan, D.² Skadron, K.³

8
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," ISCA, 2009, pp. 152-163.
- (2009) ISCA , pp. 152-163
- Hong, S.¹ Kim, H.²

9
- 84862182989
- NVIDIA. (2011). Compute Visual Profiler User Guide. Available: http://developer.nvidia.com/nvidia-gpu-computing-documentation
- (2011) Compute Visual Profiler User Guide

10
- 0035182089
- Basic block distribution analysis to find periodic behavior and simulation points in applications
- T. Sherwood, E. Perelman, and B. Calder, "Basic block distribution analysis to find periodic behavior and simulation points in applications," PACT, 2001, pp. 3-14.
- (2001) PACT , pp. 3-14
- Sherwood, T.¹ Perelman, E.² Calder, B.³

11
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," ISPASS, 2009, pp. 163-174.
- (2009) ISPASS , pp. 163-174
- Bakhoda, A.¹ Yuan, G.L.² Fung, W.W.L.³ Wong, H.⁴ Aamodt, T.M.⁵

12
- 0024700878
- Determining average program execution times and their variance
- V. Sarkar, "Determining average program execution times and their variance," PLDI, 1989, pp. 298-312
- (1989) PLDI , pp. 298-312
- Sarkar, V.¹

13
- 70350775495
- Program optimization strategies for data-parallel many-core processors
- S. Ryoo, "Program optimization strategies for data-parallel many-core processors," Ph.D Thesis, University of Illinois at Urbana-Champaign, 2008.
- (2008) Program Optimization Strategies for Data-parallel Many-core Processors
- Ryoo, S.¹

14
- 79551704836
- NVIDIA. (2011). NVIDIA CUDA C Programming Guide. Available: http://developer.nvidia.com/nvidia-gpu-computing-documentation
- (2011) NVIDIA CUDA C Programming Guide

15
- 0036647190
- An efficient k-means clustering algorithm: Analysis and implementation
- T. Kanungo, et al., "An efficient k-means clustering algorithm: Analysis and implementation," IEEE TPAMI, pp. 881-892, 2002.
- (2002) IEEE TPAMI , pp. 881-892
- Kanungo, T.¹

16
- 0023381475
- Marching cubes: A high resolution 3D surface construction algorithm
- W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," ACM Siggraph Computer Graphics, vol. 21, pp. 163-169, 1987.
- (1987) ACM Siggraph Computer Graphics , vol.21 , pp. 163-169
- Lorensen, W.E.¹ Cline, H.E.²

17
- 70449844290
- Sequence alignment with GPU: Performance and design challenges
- G. M. Striemer and A. Akoglu, "Sequence alignment with GPU: Performance and design challenges," IPDPS, 2009, pp. 1-10
- (2009) IPDPS , pp. 1-10
- Striemer, G.M.¹ Akoglu, A.²

18
- 38849131252
- High-throughput sequence alignment using Graphics Processing Units
- M. Schatz, C. Trapnell, A. Delcher, and A. Varshney, "High-throughput sequence alignment using Graphics Processing Units," BMC Bioinformatics, vol. 8, p. 474, 2007.
- (2007) BMC Bioinformatics , vol.8 , p. 474
- Schatz, M.¹ Trapnell, C.² Delcher, A.³ Varshney, A.⁴

19
- 80052657491
- NVIDIA. Occupancy Calculator. http://developer.nvidia.com/object/cuda-3-2-tooklit-rc.html.
- Occupancy Calculator

20
- 84866849151
- CUDA-EC
- CUDA-EC, NVIDIA Tesla Bio Workbench. http://www.nvidia.com/object/ec-on-tesla.html.

21
- 79952767559
- Accelerating global sequence alignment using CUDA compatible multi-core GPU
- T. R. P. Siriwardena and D. N. Ranasinghe, "Accelerating global sequence alignment using CUDA compatible multi-core GPU," ICIAfS, 2010, pp. 201-206.
- (2010) ICIAfS , pp. 201-206
- Siriwardena, T.R.P.¹ Ranasinghe, D.N.²

22
- 77952265152
- Optimizing Matrix Transpose in CUDA
- G. Ruetsch and P. Micikevicius, "Optimizing Matrix Transpose in CUDA," NVIDIA, 2009.
- (2009) NVIDIA
- Ruetsch, G.¹ Micikevicius, P.²

23
- 77952579552
- Demystifying GPU Microarchitecture through Microbenchmarking
- H. Wong, M. M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying GPU Microarchitecture through Microbenchmarking," ISPASS, 2010, pp. 235-246.
- (2010) ISPASS , pp. 235-246
- Wong, H.¹ Papadopoulou, M.M.² Sadooghi-Alvandi, M.³ Moshovos, A.⁴

24
- 68549096107
- Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware," ACM TACO, vol. 6, no. 2, Article 7, 2009, pp. 1-37.
- (2009) ACM TACO , vol.6 , Issue.2 , pp. 1-37
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

25
- 84863011842
- A Revisit to Cost Aggregation in Stereo Matching: How Far Can We Reduce Its Computational Redundancy?
- D. Min, J. Lu, and M. Do, "A Revisit to Cost Aggregation in Stereo Matching: How Far Can We Reduce Its Computational Redundancy?" ICCV, 2011.
- (2011) ICCV
- Min, D.¹ Lu, J.² Do, M.³

26
- 84862069040
- Real-time Implementation and Performance Optimization of 3D Sound Localization on GPUs
- Y. Liang, Z. Cui, S. Zhao, K. Rupnow, Y. Zhang, D. L. Jones, and D. Chen, "Real-time Implementation and Performance Optimization of 3D Sound Localization on GPUs," DATE, 2012.
- (2012) DATE
- Liang, Y.¹ Cui, Z.² Zhao, S.³ Rupnow, K.⁴ Zhang, Y.⁵ Jones, D.L.⁶ Chen, D.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.