SCOPUS Information Search Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

Volume , Issue , 2011, Pages 308-317

Improving GPU performance via large warps and two-level warp scheduling

(6) Narasiman, Veynu (a); Shebanow, Michael (b); Lee, Chang Joo (c); Miftakhutdinov, Rustam (a); Mutlu, Onur (d); Patt, Yale N. (a)

a UNIVERSITY OF TEXAS AT AUSTIN (United States)

b NVIDIA (United States)

c INTEL CORPORATION (United States)

d CARNEGIE MELLON UNIVERSITY (United States)

Author keywords

divergence; GPGPU; SIMD; warp scheduling

Indexed keywords

COMPUTATIONAL POWER; COMPUTATIONAL RESOURCES; CONDITIONAL BRANCH; DIVERGENCE; GENERAL PURPOSE; GPGPU; GPU PROGRAMMING; GRAPHICS PROCESSING UNITS; MICRO ARCHITECTURES; PARALLEL APPLICATION; POPULAR PLATFORM; SIMD;

COMPUTER PROGRAMMING; PROGRAM PROCESSORS; SCHEDULING; SCHEDULING ALGORITHMS;

WEAVING;

EID: 84863342255 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2155620.2155656 Document Type: Conference Paper

Times cited : (338)

References (27)

1
- 84882609297
- Advanced Micro Devices, Inc. ATI Stream Technology. http://www.amd.com/stream.
- ATI Stream Technology

2
- 0025431380
- April: A processor architecture for multiprocessing
- A. Agarwal et al. April: a processor architecture for multiprocessing. In ISCA-17, 1990.
- (1990) ISCA-17
- Agarwal, A.¹

3
- 0033895964
- Speed and power scaling of SRAMs
- Feb.
- B. Amrutur and M. Horowitz. Speed and power scaling of SRAMs. IEEE JSCC, 35(2):175-185, Feb. 2000.
- (2000) IEEE JSCC , vol.35 , Issue.2 , pp. 175-185
- Amrutur, B.¹ Horowitz, M.²

4
- 0015330108
- The Illiac IV system
- Apr.
- W. J. Bouknight et al. The Illiac IV system. Proceedings of the IEEE, 60(4):369-388, Apr. 1972.
- (1972) Proceedings of the IEEE , vol.60 , Issue.4 , pp. 369-388
- Bouknight, W.J.¹

5
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, 2009.
- (2009) IISWC
- Che, S.¹

6
- 79955923056
- Thread block compaction for efficient SIMT control flow
- W. W. L. Fung and T. Aamodt. Thread block compaction for efficient SIMT control flow. In HPCA-17, 2011.
- (2011) HPCA-17
- Fung, W.W.L.¹ Aamodt, T.²

7
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. W. L. Fung et al. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO-40, 2007.
- (2007) MICRO-40
- Fung, W.W.L.¹

8
- 68549096107
- Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
- June
- W. W. L. Fung et al. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware. ACM TACO, 6(2):1-37, June 2009.
- (2009) ACM TACO , vol.6 , Issue.2 , pp. 1-37
- Fung, W.W.L.¹

9
- 65349159175
- Compute unified device architecture application suitability
- May-Jun.
- W.-M. Hwu et al. Compute unified device architecture application suitability. Computing in Science Engineering, May-Jun. 2009.
- (2009) Computing in Science Engineering
- Hwu, W.-M.¹

10
- 2342652812
- Stream register files with indexed access
- N. Jayasena et al. Stream register files with indexed access. In HPCA-10, 2004.
- (2004) HPCA-10
- Jayasena, N.¹

11
- 77954999879
- Efficient conditional operations for data-parallel architectures
- U. Kapasi et al. Efficient conditional operations for data-parallel architectures. In MICRO-33, 2000.
- (2000) MICRO-33
- Kapasi, U.¹

12
- 0036398375
- VLSI design and verification of the Imagine processor
- B. Khailany et al. VLSI design and verification of the Imagine processor. In ICCD, 2002.
- (2002) ICCD
- Khailany, B.¹

13
- 84863372818
- Khronos Group. OpenCL. http://www.khronos.org/opencl.
- OpenCL

14
- 77951157944
- Elsevier Science
- D. Kirk and W. W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Elsevier Science, 2010.
- (2010) Programming Massively Parallel Processors: A Hands-on Approach
- Kirk, D.¹ Hwu, W.W.²

15
- 4644337990
- The vector-thread architecture
- R. Krashinsky et al. The vector-thread architecture. In ISCA-31, 2004.
- (2004) ISCA-31
- Krashinsky, R.¹

16
- 84862910894
- Effect of instruction fetch and memory scheduling on GPU performance
- N. B. Lakshminarayana and H. Kim. Effect of instruction fetch and memory scheduling on GPU performance. In Workshop on Language, Compiler, and Architecture Support for GPGPU, 2010.
- Workshop on Language, Compiler, and Architecture Support for GPGPU, 2010
- Lakshminarayana, N.B.¹ Kim, H.²

17
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA-37, 2010.
- (2010) ISCA-37
- Meng, J.¹

18
- 47349098275
- MineBench: A benchmark suite for data mining workloads
- R. Narayanan et al. MineBench: A benchmark suite for data mining workloads. In IISWC, 2006.
- (2006) IISWC
- Narayanan, R.¹

19
- 84863390635
- NVIDIA. CUDA GPU Computing SDK. http://developer.nvidia.com/gpu-computing-sdk.
- CUDA GPU Computing SDK

20
- 84863354507
- NVIDIA
- NVIDIA. CUDA Programming Guide Version 3.0, 2010.
- (2010) CUDA Programming Guide Version 3.0

21
- 84863373131
- NVIDIA
- NVIDIA. PTX ISA Version 2.0, 2010.
- (2010) PTX ISA Version 2.0

22
- 0017922490
- The CRAY-1 computer system
- Jan.
- R. M. Russell. The CRAY-1 computer system. Communications of the ACM, 21(1):63-72, Jan. 1978.
- (1978) Communications of the ACM , vol.21 , Issue.1 , pp. 63-72
- Russell, R.M.¹

23
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, 2008.
- PPoPP, 2008
- Ryoo, S.¹

24
- 0018282603
- A pipelined shared resource MIMD computer
- B. J. Smith. A pipelined shared resource MIMD computer. In ICPP, 1978.
- (1978) ICPP
- Smith, B.J.¹

25
- 0033727057
- Vector instruction set support for conditional operations
- J. E. Smith et al. Vector instruction set support for conditional operations. In ISCA-27, 2000.
- (2000) ISCA-27
- Smith, J.E.¹

26
- 84863352139
- Parallel operation in the Control Data 6600
- J. E. Thornton. Parallel operation in the Control Data 6600. In AFIPS, 1965.
- (1965) AFIPS
- Thornton, J.E.¹

27
- 0035696665
- Handling long-latency loads in a simultaneous multithreading processor
- D. M. Tullsen and J. A. Brown. Handling long-latency loads in a simultaneous multithreading processor. In MICRO-34, 2001.
- (2001) MICRO-34
- Tullsen, D.M.¹ Brown, J.A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.