2014, Pages 284-295

Warp-level divergence in GPUs: Characterization, impact, and mitigation

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER ARCHITECTURE; COMPUTER GRAPHICS; HARDWARE; NATURAL RESOURCES MANAGEMENT; PROGRAM PROCESSORS; RESOURCE ALLOCATION; SUPERCOMPUTERS

EID: 84903999614     PISSN: 15300897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/HPCA.2014.6835939     Document Type: Conference Paper
Times cited: 65

References (26)
  • 1
    • 70349169075
    • Analyzing CUDA Workloads Using a Detailed GPU Simulator
    • A. Bakhoda, et al., "Analyzing CUDA Workloads Using a Detailed GPU Simulator," ISPASS-2009, 2009.
    • (2009) ISPASS-2009
    • Bakhoda, A.1
  • 2
    • 84864834311
    • Simultaneous branch and warp interweaving for sustained GPU performance
    • N. Brunie, et al., "Simultaneous Branch and Warp Interweaving for Sustained GPU Performance," ISCA-39, 2012.
    • (2012) ISCA-39
    • Brunie, N.1
  • 3
    • 84903953935
    • CUDA programming guide
  • 4
    • 21644487687
    • Control flow optimization via dynamic reconvergence prediction
    • J. D. Collins, et al., "Control flow optimization via dynamic reconvergence prediction," MICRO-37, 2004.
    • (2004) MICRO-37
    • Collins, J.D.1
  • 5
    • 70649092154
    • Rodinia: A benchmark suite for heterogeneous computing
    • S. Che, et al., "Rodinia: A Benchmark Suite for Heterogeneous Computing," IISWC-2009, 2009.
    • (2009) IISWC-2009
    • Che, S.1
  • 6
    • 84863351470
    • SIMD re-convergence at thread frontiers
    • G. Diamos, et al., "SIMD Re-Convergence at Thread Frontiers," MICRO-44, 2011.
    • (2011) MICRO-44
    • Diamos, G.1
  • 7
    • 79955923056
    • Thread block compaction for efficient SIMT control flow
    • W. W. Fung, et al., "Thread Block Compaction for Efficient SIMT Control Flow," HPCA-17, 2011.
    • (2011) HPCA-17
    • Fung, W.W.1
  • 8
    • 47349104432
    • Dynamic warp formation and scheduling for efficient GPU control flow
    • W. Fung, et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO-40, 2007.
    • (2007) MICRO-40
    • Fung, W.1
  • 9
    • 80052533471
    • Energy-efficient mechanisms for managing thread context in throughput processors
    • M. Gebhart, et al., "Energy-efficient mechanisms for managing thread context in throughput processors," ISCA-38, 2011.
    • (2011) ISCA-38
    • Gebhart, M.1
  • 10
    • 84862154605
    • Reducing branch divergence in GPU programs
    • T. D. Han, et al., "Reducing Branch Divergence in GPU Programs," GPGPU-4, 2011.
    • (2011) GPGPU-4
    • Han, T.D.1
  • 11
    • 84903953925
    • IMPACT Research Group. The Parboil Benchmark Suite.
  • 12
    • 84881151222
    • GPUWattch: Enabling energy optimizations in GPGPUs
    • J. Leng, et al., "GPUWattch: Enabling Energy Optimizations in GPGPUs," ISCA-40, 2013.
    • (2013) ISCA-40
    • Leng, J.1
  • 14
    • 84903936214
    • TLP-aware cache management schemes for a CPU-GPU heterogeneous architecture
    • J. Lee, et al., "TLP-Aware Cache Management Schemes for a CPU-GPU Heterogeneous Architecture," HPCA-18, 2012.
    • (2012) HPCA-18
    • Lee, J.1
  • 15
    • 84880287859
    • Warped register file: A power efficient register file for GPGPUs
    • A. Mohammad, et al., "Warped Register File: A Power Efficient Register File for GPGPUs," HPCA-19, 2013.
    • (2013) HPCA-19
    • Mohammad, A.1
  • 16
    • 77954976292
    • Dynamic warp subdivision for integrated branch and memory divergence tolerance
    • J. Meng, et al., "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance," ISCA-37, 2010.
    • (2010) ISCA-37
    • Meng, J.1
  • 17
    • 84863342255
    • Improving GPU performance via large warps and two-level warp scheduling
    • V. Narasiman, et al., "Improving GPU Performance via Large Warps and Two-Level Warp Scheduling," MICRO-44, 2011.
    • (2011) MICRO-44
    • Narasiman, V.1
  • 19
    • 84903953927
    • NVIDIA. CUDA C/C++ SDK Code Samples 2011
    • NVIDIA. CUDA C/C++ SDK Code Samples, 2011. http://developer.nvidia.com/gpu-computing-sdk
    • (2011)
  • 20
    • 84880298026
    • The dual-path execution model for efficient GPU control flow
    • Minsoo Rhu, et al., "The Dual-Path Execution Model for Efficient GPU Control Flow," HPCA-19, 2013.
    • (2013) HPCA-19
    • Rhu, M.1
  • 21
    • 84864855982
    • CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures
    • Minsoo Rhu, et al., "CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures," ISCA-39, 2012.
    • (2012) ISCA-39
    • Rhu, M.1
  • 22
    • 84876590572
    • Cache-conscious wavefront scheduling
    • T. Rogers, et al., "Cache-Conscious Wavefront Scheduling," MICRO-45, 2012.
    • (2012) MICRO-45
    • Rogers, T.1
  • 24
    • 84867547570
    • RISE: Improving streaming processors reliability against soft errors in GPGPUs
    • J. Tan, et al., "RISE: Improving Streaming Processors Reliability against Soft Errors in GPGPUs," PACT-21, 2012.
    • (2012) PACT-21
    • Tan, J.1
  • 25
    • 84862974517
    • Analyzing soft-error vulnerability on GPGPU microarchitecture
    • J. Tan, et al., "Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture," IISWC-2011, 2011.
    • (2011) IISWC-2011
    • Tan, J.1
  • 26
    • 84867509598
    • Shared memory multiplexing: A novel way to improve GPGPU throughput
    • Y. Yang, et al., "Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput," PACT-21, 2012.
    • (2012) PACT-21
    • Yang, Y.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.