SCOPUS Information Retrieval Platform

Proceedings - International Symposium on High-Performance Computer Architecture

Volume , Issue , 2014, Pages 260-271

Improving GPGPU resource utilization through alternative thread block scheduling

Authors (7): Lee, Minseok (a); Song, Seokwoo (a); Moon, Joosik (a); Kim, John (a); Seo, Woong (b); Cho, Yeongon (b); Ryu, Soojung (b)

a Korea Advanced Institute of Science and Technology (KAIST) (South Korea)

b SAMSUNG Electronics (South Korea)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER ARCHITECTURE; PROGRAM PROCESSORS; SUPERCOMPUTERS; WEAVING;

BLOCK SCHEDULING; IMPROVE PERFORMANCE; MULTIPLE KERNELS; NUMBER OF THREADS; OPTIMAL NUMBER; RESOURCE UTILIZATIONS;

SCHEDULING;

EID: 84903951085 PISSN: 15300897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HPCA.2014.6835937 Document Type: Conference Paper

Times cited : (156)

References (31)

1
- 84904009584
- US Patent US20130185725
- K. M. Abdalla et al. Scheduling and Execution of Compute Tasks, US Patent US20130185725, 2013.
- (2013) Scheduling and Execution of Compute Tasks
- Abdalla, K.M.¹

2
- 84880287859
- Warped register file: A power efficient register file for gpgpus
- Tel-Aviv, Israel
- M. Abdel-Majeed et al. Warped Register File: A Power Efficient Register File for GPGPUs. In International Symposium on High Performance Computer Architecture (HPCA), pages 344-355, Tel-Aviv, Israel, 2013.
- (2013) International Symposium on High Performance Computer Architecture (HPCA) , pp. 344-355
- Abdel-Majeed, M.¹

3
- 84860351763
- The case for gpgpu spatial multitasking
- New Orleans, LA, USA
- J. Adriaens et al. The Case for GPGPU Spatial Multitasking. In International Symposium on High Performance Computer Architecture (HPCA), pages 1-12, New Orleans, LA, USA, 2012.
- (2012) International Symposium on High Performance Computer Architecture (HPCA) , pp. 1-12
- Adriaens, J.¹

4
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- Boston, Massachusetts, USA
- A. Bakhoda et al. Analyzing CUDA Workloads using a Detailed GPU Simulator. In International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 163-174, Boston, Massachusetts, USA, 2009.
- (2009) International Symposium on Performance Analysis of Systems and Software (ISPASS) , pp. 163-174
- Bakhoda, A.¹

5
- 84904009586
- Programming CUDA
- I. A. Buck. Programming CUDA. In Supercomputing 2007 Tutorial Notes, 2007.
- (2007) Supercomputing 2007 Tutorial Notes
- Buck, I.A.¹

6
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- Austin, TX, USA
- S. Che et al. Rodinia: A Benchmark Suite for Heterogeneous Computing. In International Symposium on Workload Characterization (IISWC), pages 44-54, Austin, TX, USA, 2009.
- (2009) International Symposium on Workload Characterization (IISWC) , pp. 44-54
- Che, S.¹

7
- 79951707102
- Memory latency reduction via thread throttling
- Atlanta, Georgia, USA
- H.-Y. Cheng et al. Memory Latency Reduction via Thread Throttling. In International Symposium on Microarchitecture (MICRO), pages 53-64, Atlanta, Georgia, USA, 2010.
- (2010) International Symposium on Microarchitecture (MICRO) , pp. 53-64
- Cheng, H.-Y.¹

8
- 84904009578
- W. W. L. Fung et al. Thread block compaction for efficient simt control flow.
- Thread Block Compaction for Efficient Simt Control Flow
- Fung, W.W.L.¹

9
- 47349104432
- Dynamic warp formation and scheduling for efficient gpu control flow
- Chicago, Illinois, USA
- W. W. L. Fung et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. In International Symposium on Microarchitecture (MICRO), pages 407-420, Chicago, Illinois, USA, 2007.
- (2007) International Symposium on Microarchitecture (MICRO) , pp. 407-420
- Fung, W.W.L.¹

10
- 53749092570
- Parallel computing experiences with cuda
- M. Garland et al. Parallel Computing Experiences with CUDA. Micro, IEEE, 28(4), 2008.
- (2008) Micro, IEEE , vol.28 , Issue.4
- Garland, M.¹

11
- 80052533471
- Energy-efficient mechanisms for managing thread context in throughput processors
- San Jose, California, USA
- M. Gebhart et al. Energy-Efficient Mechanisms for Managing Thread Context in Throughput Processors. In International Symposium on Computer Architecture (ISCA), pages 235-246, San Jose, California, USA, 2011.
- (2011) International Symposium on Computer Architecture (ISCA) , pp. 235-246
- Gebhart, M.¹

12
- 84894883016
- Fine-grained resource sharing for concurrent gpgpu kernels
- Berkeley, CA, USA
- C. Gregg et al. Fine-Grained Resource Sharing for Concurrent GPGPU Kernels. In Proceedings of the 4th USENIX Conference on Hot Topics in Parallelism (HotPar), pages 10-10, Berkeley, CA, USA, 2012.
- (2012) Proceedings of the 4th USENIX Conference on Hot Topics in Parallelism (HotPar) , pp. 10-10
- Gregg, C.¹

13
- 67650635164
- Many-core vs. many-thread machines: Stay away from the valley
- Z. Guz et al. Many-Core vs. Many-Thread Machines: Stay Away From the Valley. IEEE Computer Architecture Letters, 8(1):25-28, 2009.
- (2009) IEEE Computer Architecture Letters , vol.8 , Issue.1 , pp. 25-28
- Guz, Z.¹

14
- 84881126240
- Orchestrated scheduling and prefetching for gpgpus
- Tel-Aviv, Israel
- A. Jog et al. Orchestrated Scheduling and Prefetching for GPGPUs. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA), pages 332-343, Tel-Aviv, Israel, 2013.
- (2013) Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA) , pp. 332-343
- Jog, A.¹

15
- 84875640178
- OWL: Cooperative thread array aware scheduling techniques for improving gpgpu performance
- Houston, TX, USA
- A. Jog et al. OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 395-406, Houston, TX, USA, 2013.
- (2013) International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pp. 395-406
- Jog, A.¹

16
- 84887477265
- Neither more nor less: Optimizing thread-level parallelism for gpgpus
- Edinburgh, Scotland, UK
- O. Kayiran et al. Neither More Nor Less: Optimizing Thread-Level Parallelism for GPGPUs. In International Conference on Parallel Architecture and Compilation Techniques (PACT), pages 157-166, Edinburgh, Scotland, UK, 2013.
- (2013) International Conference on Parallel Architecture and Compilation Techniques (PACT) , pp. 157-166
- Kayiran, O.¹

17
- 84862910894
- Effect of instruction fetch and memory scheduling on gpu performance
- N. B. Lakshminarayana et al. Effect of Instruction Fetch and Memory Scheduling on GPU Performance. Workshop on Language, Compiler, and Architecture Support for GPGPU, 2010.
- (2010) Workshop on Language, Compiler, and Architecture Support for GPGPU
- Lakshminarayana, N.B.¹

18
- 84881151222
- GPUWattch: Enabling energy optimizations in gpgpus
- Tel-Aviv, Israel
- J. Leng et al. GPUWattch: Enabling Energy Optimizations in GPGPUs. In International Symposium on Computer Architecture (ISCA), pages 487-498, Tel-Aviv, Israel, 2013.
- (2013) International Symposium on Computer Architecture (ISCA) , pp. 487-498
- Leng, J.¹

19
- 70349100958
- A. Munshi. The OpenCL Specification, 2011.
- (2011) The OpenCL Specification
- Munshi, A.¹

20
- 84863342255
- Improving gpu performance via large warps and two-level warp scheduling
- Porto Alegre, Brazil
- V. Narasiman et al. Improving GPU Performance via Large Warps and Two-Level Warp Scheduling. In International Symposium on Microarchitecture (MICRO), pages 308-317, Porto Alegre, Brazil, 2011.
- (2011) International Symposium on Microarchitecture (MICRO) , pp. 308-317
- Narasiman, V.¹

21
- 84879293298
- The GPU computing era
- J. Nickolls et al. The GPU Computing Era. Micro, IEEE, March-April.
- Micro, IEEE, March-April
- Nickolls, J.¹

22
- 84864861336
- NVIDIA
- NVIDIA. CUDA C/C++ SDK Code Samples, 2011.
- (2011) CUDA C/C++ SDK Code Samples

23
- 84904009579
- NVIDIA. Fermi: NVIDIA's Next Generation CUDA Compute Architecture
- NVIDIA. Fermi: NVIDIA's Next Generation CUDA Compute Architecture, 2011.
- (2011)

24
- 82955212653
- NVIDIA
- NVIDIA. CUDA C Programming Guide, 2012.
- (2012) CUDA C Programming Guide

25
- 85019266199
- NVIDIA
- NVIDIA. Kepler: The Fastest, Most Efficient HPC Architecture Ever Built, 2012.
- (2012) Kepler: The Fastest, Most Efficient HPC Architecture Ever Built

26
- 84904009580
- NVIDIA. NVIDIA PerfKit: NVIDIA Performance Toolkit, 2013
- NVIDIA. NVIDIA PerfKit: NVIDIA Performance Toolkit, 2013.

27
- 84902293994
- NVIDIA
- NVIDIA. Profiler User's Guide, 2013.
- (2013) Profiler User's Guide

28
- 84875669496
- Improving GPGPU concurrency with elastic kernels
- Houston, TX, USA
- S. Pai et al. Improving GPGPU Concurrency with Elastic Kernels. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 407-418, Houston, TX, USA, 2013.
- (2013) International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pp. 407-418
- Pai, S.¹

29
- 84876590572
- Cache-conscious wavefront scheduling
- Vancouver, Canada
- T. Rogers et al. Cache-Conscious Wavefront Scheduling. In International Symposium on Microarchitecture (MICRO), pages 78-85, Vancouver, Canada, 2012.
- (2012) International Symposium on Microarchitecture (MICRO) , pp. 78-85
- Rogers, T.¹

30
- 84888866287
- Parboil: A revised benchmark suite for scientific and commercial throughput computing
- J. Stratton et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing. Center for Reliable and High-Performance Computing, 2012.
- (2012) Center for Reliable and High-Performance Computing
- Stratton, J.¹

31
- 77957764904
- Feedback-driven threading: Power-efficient and high-performance execution of multi-threaded workloads on cmps
- Seattle, WA, USA
- M. A. Suleman et al. Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-Threaded Workloads on CMPs. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 277-286, Seattle, WA, USA, 2008.
- (2008) International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pp. 277-286
- Suleman, M.A.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.