SCOPUS Information Search Platform

Proceedings - International Symposium on High-Performance Computer Architecture

Volume , Issue , 2011, Pages 25-36

Thread block compaction for efficient SIMT control flow

(2) Fung, Wilson W. L.ᵃ; Aamodt, Tor M.ᵃ

ᵃ UNIVERSITY OF BRITISH COLUMBIA (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

COMPACTION MECHANISMS; CONTROL FLOWS; GRAPHICS PROCESSOR UNITS; HARDWARE COST; LARGE GROUPS; MANY-CORE; MULTIPLE DATA; PER UNIT; PROCESSING UNITS; PROGRAMMING MODELS; RECONVERGENCE; SCRATCH PAD MEMORY; SIMULATION RESULT;

COMPACTION; COMPUTER ARCHITECTURE; COMPUTER HARDWARE; COMPUTER PROGRAMMING; PROGRAM PROCESSORS;

WEAVING;

EID: 79955923056 PISSN: 15300897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HPCA.2011.5749714 Document Type: Conference Paper

Times cited: (155)

References (26)

1
- 70450183916
- Understanding the Efficiency of Ray Traversal on GPUs
- T. Aila and S. Laine. Understanding the Efficiency of Ray Traversal on GPUs. In HPG '09, 2009.
- (2009) HPG '09
- Aila, T.¹ Laine, S.²

2
- 74049132971
- R700-Family Instruction Set Architecture
- AMD. R700-Family Instruction Set Architecture, 1.0 edition, March 2009.
- (2009) R700-Family Instruction Set Architecture

3
- 70349169075
- Analyzing CUDA Workloads Using a Detailed GPU Simulator
- A. Bakhoda et al. Analyzing CUDA Workloads Using a Detailed GPU Simulator. In Int'l Symp. on Perf. Analysis of Systems and Software (ISPASS), pages 163-174, April 2009.
- (2009) Int'l. Symp. on Perf. Analysis of Systems and Software (ISPASS) , pp. 163-174
- Bakhoda, A.¹

4
- 0015330108
- The Illiac IV System
- W. Bouknight et al. The Illiac IV System. Proceedings of the IEEE, 60(4):369-388, Apr. 1972.
- (1972) Proceedings of the IEEE , vol.60 , Issue.4 , pp. 369-388
- Bouknight, W.¹

5
- 70649092154
- Rodinia: A Benchmark Suite for Heterogeneous Computing
- S. Che et al. Rodinia: A Benchmark Suite for Heterogeneous Computing. In Int'l Symp. on Workload Characterization (IISWC), pages 44-54, 2009.
- (2009) Int'l. Symp. on Workload Characterization (IISWC) , pp. 44-54
- Che, S.¹

6
- 49249135216
- Convergence of Recognition, Mining, and Synthesis Workloads and its Implications
- Y.-K. Chen et al. Convergence of Recognition, Mining, and Synthesis Workloads and its Implications. Proceedings of the IEEE, 96(5), May 2008.
- (2008) Proceedings of the IEEE , vol.96 , Issue.5
- Chen, Y.-K.¹

7
- 79955888325
- United States Patent #7,353,369: System and Method for Managing Divergent Threads in a SIMD Architecture (Assignee NVIDIA Corp.)
- B. W. Coon et al. United States Patent #7,353,369: System and Method for Managing Divergent Threads in a SIMD Architecture (Assignee NVIDIA Corp.), April 2008.
- (2008)
- Coon, B.W.¹

8
- 79955922544
- United States Patent #7,434,032: Tracking Register Usage During Multithreaded Processing Using a Scoreboard having Separate Memory Regions and Storing Sequential Register Size Indicators (Assignee NVIDIA Corp.)
- B. W. Coon et al. United States Patent #7,434,032: Tracking Register Usage During Multithreaded Processing Using a Scoreboard having Separate Memory Regions and Storing Sequential Register Size Indicators (Assignee NVIDIA Corp.), October 2008.
- (2008)
- Coon, B.W.¹

9
- 47349104432
- Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
- W. Fung et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. In Proc. 40th IEEE/ACM Int'l Symp. on Microarchitecture, 2007.
- (2007) Proc. 40th IEEE/ACM Int'l. Symp. on Microarchitecture
- Fung, W.¹

10
- 68549096107
- Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware
- W. Fung et al. Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware. ACM Trans. Archit. Code Optim., 6(2):1-37, 2009.
- (2009) ACM Trans. Archit. Code Optim. , vol.6 , Issue.2 , pp. 1-37
- Fung, W.¹

11
- 78650817529
- Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance
- A. Gharaibeh and M. Ripeanu. Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance. In IEEE/ACM Supercomputing (SC 2010), 2010.
- (2010) IEEE/ACM Supercomputing (SC 2010)
- Gharaibeh, A.¹ Ripeanu, M.²

12
- 0034459255
- Efficient Conditional Operations for Data-Parallel Architectures
- U. J. Kapasi et al. Efficient Conditional Operations for Data-Parallel Architectures. In Proc. 33rd IEEE/ACM Int'l Symp. on Microarchitecture, pages 159-170, 2000.
- (2000) Proc. 33rd IEEE/ACM Int'l. Symp. on Microarchitecture , pp. 159-170
- Kapasi, U.J.¹

13
- 70450237431
- Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator
- J. H. Kelm et al. Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator. In Proc. 36th Int'l Symp. on Computer Arch. (ISCA), pages 140-151, 2009.
- (2009) Proc. 36th Int'l. Symp. on Computer Arch. (ISCA) , pp. 140-151
- Kelm, J.H.¹

14
- 4644337990
- The Vector-Thread Architecture
- R. Krashinsky et al. The Vector-Thread Architecture. In Proc. 31st Int'l Symp. on Computer Arch. (ISCA), pages 52-63, 2004.
- (2004) Proc. 31st Int'l. Symp. on Computer Arch. (ISCA) , pp. 52-63
- Krashinsky, R.¹

15
- 0021458622
- Chap - A SIMD Graphics Processor
- A. Levinthal and T. Porter. Chap - A SIMD Graphics Processor. In 11th Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH), pages 77-82, 1984.
- (1984) 11th Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH) , pp. 77-82
- Levinthal, A.¹ Porter, T.²

16
- 44849137198
- NVIDIA Tesla: A Unified Graphics and Computing Architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE, 28(2):39-55, March-April 2008.
- (2008) Micro, IEEE , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

17
- 79955893984
- United States Patent Application #2010/0122067: Across-Thread Out-of-Order Instruction Dispatch in a Multithreaded Microprocessor (Assignee NVIDIA Corp.)
- E. Lindholm et al. United States Patent Application #2010/0122067: Across-Thread Out-of-Order Instruction Dispatch in a Multithreaded Microprocessor (Assignee NVIDIA Corp.), May 2010.
- (2010)
- Lindholm, E.¹

18
- 79955897911
- Tradeoffs in Designing Massively Parallel Accelerator Architectures
- A. Mahesri. Tradeoffs in Designing Massively Parallel Accelerator Architectures. PhD thesis, University of Illinois at Urbana-Champaign, 2009.
- (2009) Tradeoffs in Designing Massively Parallel Accelerator Architectures
- Mahesri, A.¹

19
- 66749170578
- Tradeoffs in designing accelerator architectures for visual computing
- A. Mahesri et al. Tradeoffs in designing accelerator architectures for visual computing. In Proc. 41st IEEE/ACM Int'l Symp. on Microarchitecture, pages 164-175, 2008.
- (2008) Proc. 41st IEEE/ACM Int'l. Symp. on Microarchitecture , pp. 164-175
- Mahesri, A.¹

20
- 77954976292
- Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance
- J. Meng et al. Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance. In Proc. 37th Int'l Symp. on Computer Architecture (ISCA), pages 235-246, 2010.
- (2010) Proc. 37th Int'l. Symp. on Computer Architecture (ISCA) , pp. 235-246
- Meng, J.¹

21
- 78651550268
- Scalable Parallel Programming with CUDA
- J. Nickolls et al. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, Mar.-Apr. 2008.
- (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹

22
- 77951900491
- NVIDIA's Next Generation CUDA Compute Architecture: Fermi
- NVIDIA. NVIDIA's Next Generation CUDA Compute Architecture: Fermi, October 2009.
- (2009) NVIDIA's Next Generation CUDA Compute Architecture: Fermi

23
- 77951176023
- NVIDIA Compute PTX: Parallel Thread Execution ISA Version 1.4
- NVIDIA Corporation. NVIDIA Compute PTX: Parallel Thread Execution ISA Version 1.4, CUDA Toolkit 2.3 edition, 2009.
- (2009) NVIDIA Compute PTX: Parallel Thread Execution ISA Version 1.4

24
- 35948991669
- NVIDIA CUDA Programming Guide
- NVIDIA Corporation. NVIDIA CUDA Programming Guide, 3.1 edition, 2010.
- (2010) NVIDIA CUDA Programming Guide

25
- 27344436659
- Scalable molecular dynamics with NAMD
- J. C. Phillips et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 2005.
- (2005) Journal of Computational Chemistry
- Phillips, J.C.¹

26
- 38849131252
- High-Throughput Sequence Alignment Using Graphics Processing Units
- M. Schatz et al. High-Throughput Sequence Alignment Using Graphics Processing Units. BMC Bioinformatics, 8(1):474, 2007.
- (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
- Schatz, M.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.