SCOPUS Information Retrieval Platform

Proceedings - International Symposium on High-Performance Computer Architecture

Volume , Issue , 2014, Pages 284-295

Warp-level divergence in GPUs: Characterization, impact, and mitigation

Authors (3): Xiang, Ping (a); Yang, Yi (b); Zhou, Huiyang (a)

a North Carolina State University (United States)

b NEC LABORATORIES AMERICA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER ARCHITECTURE; COMPUTER GRAPHICS; HARDWARE; NATURAL RESOURCES MANAGEMENT; PROGRAM PROCESSORS; RESOURCE ALLOCATION; SUPERCOMPUTERS;

ARCHITECTURAL SUPPORT; CRITICAL RESOURCES; GRAPHICS PROCESSING UNITS; HARDWARE OVERHEADS; RESOURCE MANAGEMENT; RESOURCE MANAGEMENT SCHEMES; STREAMING MULTIPROCESSORS; THREAD-LEVEL PARALLELISM;

WEAVING;

EID: 84903999614 PISSN: 15300897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HPCA.2014.6835939 Document Type: Conference Paper

Times cited : (65)

References (26)

1
- 70349169075
- Analyzing CUDA Workloads Using a Detailed GPU Simulator
- A. Bakhoda, et al., "Analyzing CUDA Workloads Using a Detailed GPU Simulator," ISPASS-2009, 2009.
- (2009) ISPASS-2009
- Bakhoda, A.¹

2
- 84864834311
- Simultaneous branch and warp inter-weaving for sustained GPU performance
- N. Brunie, et al., "Simultaneous Branch and Warp Inter-weaving for Sustained GPU Performance," ISCA-39, 2012.
- (2012) ISCA-39
- Brunie, N.¹

3
- 84903953935
- CUDA programming guide
- CUDA programming guide

4
- 21644487687
- Control flow optimization via dynamic reconvergence prediction
- J. D. Collins, et al., "Control flow optimization via dynamic reconvergence prediction," MICRO-37, 2004.
- (2004) MICRO-37
- Collins, J.D.¹

5
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, et al., "Rodinia: A Benchmark Suite for Heterogeneous Computing," IISWC-2009, 2009.
- (2009) IISWC-2009
- Che, S.¹

6
- 84863351470
- SIMD re-convergence at thread frontiers
- G. Diamos, et al., "SIMD Re-Convergence at Thread Frontiers," MICRO-44, 2011.
- (2011) MICRO-44
- Diamos, G.¹

7
- 79955923056
- Thread block compaction for efficient SIMT control flow
- W. W. Fung, et al., "Thread Block Compaction for Efficient SIMT Control Flow," HPCA-17, 2011.
- (2011) HPCA-17
- Fung, W.W.¹

8
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. Fung, et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO-40, 2007.
- (2007) MICRO-40
- Fung, W.¹

9
- 80052533471
- Energy-efficient mechanisms for managing thread context in throughput processors
- M. Gebhart, et al., "Energy-efficient mechanisms for managing thread context in throughput processors," ISCA-38, 2011.
- (2011) ISCA-38
- Gebhart, M.¹

10
- 84862154605
- Reducing branch divergence in GPU programs
- T. D. Han, et al., "Reducing Branch Divergence in GPU Programs," GPGPU-4, 2011.
- (2011) GPGPU-4
- Han, T.D.¹

11
- 84903953925
- IMPACT Research Group. The Parboil Benchmark Suite
- IMPACT Research Group. The Parboil Benchmark Suite.

12
- 84881151222
- GPUWattch: Enabling energy optimizations in GPGPUs
- J. Leng, et al., "GPUWattch: Enabling Energy Optimizations in GPGPUs," ISCA-40, 2013.
- (2013) ISCA-40
- Leng, J.¹

13
- 77951157944
- Morgan Kaufmann
- D. Kirk, et al., "Programming Massively Parallel Processors: A Hands-on Approach," Morgan Kaufmann, 2012.
- (2012) Programming Massively Parallel Processors: A Hands-on Approach
- Kirk, D.¹

14
- 84903936214
- TLP-aware cache management schemes for a CPU-GPU heterogeneous architecture
- J. Lee, et al., "TLP-Aware Cache Management Schemes for a CPU-GPU Heterogeneous Architecture," HPCA-18, 2012.
- (2012) HPCA-18
- Lee, J.¹

15
- 84880287859
- Warped register file: A power efficient register file for GPGPUs
- A. Mohammad, et al., "Warped Register File: A Power Efficient Register File for GPGPUs," HPCA-19, 2013.
- (2013) HPCA-19
- Mohammad, A.¹

16
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng, et al., "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance," ISCA-37, 2010.
- (2010) ISCA-37
- Meng, J.¹

17
- 84863342255
- Improving GPU performance via large warps and two-level warp scheduling
- V. Narasiman, et al., "Improving GPU Performance via Large Warps and Two-Level Warp Scheduling," MICRO-44, 2011.
- (2011) MICRO-44
- Narasiman, V.¹

18
- 84875636098
- NVIDIA
- NVIDIA. Fermi: NVIDIA's Next Generation CUDA Compute Architecture, 2011.
- (2011) Fermi: NVIDIA's Next Generation CUDA Compute Architecture

19
- 84903953927
- NVIDIA. CUDA C/C++ SDK Code Samples 2011
- NVIDIA. CUDA C/C++ SDK Code Samples, 2011. http://developer.nvidia.com/gpu-computing-sdk
- (2011)

20
- 84880298026
- The dual-path execution model for efficient GPU control flow
- Minsoo Rhu, et al., "The Dual-Path Execution Model for Efficient GPU Control Flow," HPCA-19, 2013.
- (2013) HPCA-19
- Rhu, M.¹

21
- 84864855982
- CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures
- Minsoo Rhu, et al., "CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures," ISCA-39, 2012.
- (2012) ISCA-39
- Rhu, M.¹

22
- 84876590572
- Cache-conscious wavefront scheduling
- T. Rogers, et al., "Cache-Conscious Wavefront Scheduling," MICRO-45, 2012.
- (2012) MICRO-45
- Rogers, T.¹

23
- 84903953928
- US Patent 20110161616 A1
- D. Tarjan, et al., "On demand register allocation and deallocation for a multithreaded processor." US Patent 20110161616 A1. 2009
- (2009) On Demand Register Allocation and Deallocation for A Multithreaded Processor
- Tarjan, D.¹

24
- 84867547570
- RISE: Improving streaming processors reliability against soft errors in GPGPUs
- J. Tan, et al., "RISE: Improving Streaming Processors Reliability against Soft Errors in GPGPUs," PACT-21, 2012.
- (2012) PACT-21
- Tan, J.¹

25
- 84862974517
- Analyzing soft-error vulnerability on GPGPU microarchitecture
- J. Tan, et al., "Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture," IISWC-2011, 2011.
- (2011) IISWC-2011
- Tan, J.¹

26
- 84867509598
- Shared memory multiplexing: A novel way to improve GPGPU throughput
- Y. Yang, et al., "Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput," PACT-21, 2012.
- (2012) PACT-21
- Yang, Y.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.