2011, Pages 25-36

Thread block compaction for efficient SIMT control flow

Author keywords

[No Author keywords available]

Indexed keywords

COMPACTION MECHANISMS; CONTROL FLOWS; GRAPHICS PROCESSOR UNITS; HARDWARE COST; LARGE GROUPS; MANY-CORE; MULTIPLE DATA; PER UNIT; PROCESSING UNITS; PROGRAMMING MODELS; RECONVERGENCE; SCRATCH PAD MEMORY; SIMULATION RESULT;

EID: 79955923056     PISSN: 15300897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/HPCA.2011.5749714     Document Type: Conference Paper
Times cited: 155

References (26)
  • 1
    • T. Aila and S. Laine. Understanding the Efficiency of Ray Traversal on GPUs. In HPG '09, 2009.
  • 4
    • W. Bouknight et al. The Illiac IV System. Proceedings of the IEEE, 60(4):369-388, Apr. 1972.
  • 6
    • Y.-K. Chen et al. Convergence of Recognition, Mining, and Synthesis Workloads and its Implications. Proceedings of the IEEE, 96(5), May 2008.
  • 7
    • B. W. Coon et al. United States Patent #7,353,369: System and Method for Managing Divergent Threads in a SIMD Architecture (Assignee NVIDIA Corp.), April 2008.
  • 8
    • B. W. Coon et al. United States Patent #7,434,032: Tracking Register Usage During Multithreaded Processing Using a Scoreboard having Separate Memory Regions and Storing Sequential Register Size Indicators (Assignee NVIDIA Corp.), October 2008.
  • 10
    • W. Fung et al. Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware. ACM Trans. Archit. Code Optim., 6(2):1-37, 2009.
  • 11
    • A. Gharaibeh and M. Ripeanu. Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance. In IEEE/ACM Supercomputing (SC 2010), 2010.
  • 13
    • J. H. Kelm et al. Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator. In Proc. 36th Int'l Symp. on Computer Arch. (ISCA), pages 140-151, 2009.
  • 16
    • E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28(2):39-55, March-April 2008.
  • 17
    • E. Lindholm et al. United States Patent Application #2010/0122067: Across-Thread Out-of-Order Instruction Dispatch in a Multithreaded Microprocessor (Assignee NVIDIA Corp.), May 2010.
  • 19
    • A. Mahesri et al. Tradeoffs in designing accelerator architectures for visual computing. In Proc. 41st IEEE/ACM Int'l Symp. on Microarchitecture, pages 164-175, 2008.
  • 20
    • J. Meng et al. Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance. In Proc. 37th Int'l Symp. on Computer Architecture (ISCA), pages 235-246, 2010.
  • 21
    • J. Nickolls et al. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, Mar.-Apr. 2008.
  • 24
    • NVIDIA Corporation. NVIDIA CUDA Programming Guide, 3.1 edition, 2010.
  • 26
    • M. Schatz et al. High-Throughput Sequence Alignment Using Graphics Processing Units. BMC Bioinformatics, 8(1):474, 2007.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.