Volume , Issue , 2013, Pages 99-110

Divergence-aware warp scheduling

Author keywords

caches; divergence; GPU; scheduling

Indexed keywords

CACHES; DIVERGENCE; GPU; HARDWARE THREAD SCHEDULING; ON-LINE CHARACTERIZATION; PROACTIVE SCHEDULING; RUN-TIME INFORMATION; SPARSE MATRIX-VECTOR MULTIPLY

EID: 84892547586     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2540708.2540718     Document Type: Conference Paper
Times cited: 127

References (37)
  • 3
    • 70349169075
    • Analyzing CUDA Workloads Using a Detailed GPU Simulator
    • A. Bakhoda et al. Analyzing CUDA Workloads Using a Detailed GPU Simulator. In ISPASS 2009, pages 163-174.
    • (2009) ISPASS , pp. 163-174
    • Bakhoda, A.1
  • 4
    • 77954705607
    • Tracing Garbage Collection on Highly Parallel Platforms
    • K. Barabash and E. Petrank. Tracing Garbage Collection on Highly Parallel Platforms. In ISMM 2010, pages 1-10.
    • (2010) ISMM , pp. 1-10
    • Barabash, K.1    Petrank, E.2
  • 6
    • 74049143158
    • Implementing sparse matrix-vector multiplication on throughput-oriented processors
    • N. Bell and M. Garland. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In SC 2009.
    • (2009) SC
    • Bell, N.1    Garland, M.2
  • 7
    • 70649092154
    • Rodinia: A Benchmark Suite for Heterogeneous Computing
    • S. Che et al. Rodinia: A Benchmark Suite for Heterogeneous Computing. In IISWC 2009, pages 44-54.
    • (2009) IISWC , pp. 44-54
    • Che, S.1
  • 8
    • 79951707102
    • Memory Latency Reduction via Thread Throttling
    • H.-Y. Cheng et al. Memory Latency Reduction via Thread Throttling. In MICRO-43, pages 53-64, 2010.
    • (2010) MICRO-43 , pp. 53-64
    • Cheng, H.-Y.1
  • 9
    • 77954719557
    • The Scalable Heterogeneous Computing (SHOC) benchmark suite
    • A. Danalis et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite. In GPGPU 2010.
    • (2010) GPGPU
    • Danalis, A.1
  • 10
    • 80052528714
    • Dark Silicon and the End of Multicore Scaling
    • H. Esmaeilzadeh et al. Dark Silicon and the End of Multicore Scaling. In ISCA 2011, pages 365-376.
    • (2011) ISCA , pp. 365-376
    • Esmaeilzadeh, H.1
  • 11
    • 79955923056
    • Thread Block Compaction for Efficient SIMT Control Flow
    • W. Fung and T. Aamodt. Thread Block Compaction for Efficient SIMT Control Flow. In HPCA 2011, pages 25-36.
    • (2011) HPCA , pp. 25-36
    • Fung, W.1    Aamodt, T.2
  • 12
    • 47349104432
    • Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
    • W. W. L. Fung et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. In MICRO-40.
    • MICRO-40
    • Fung, W.W.L.1
  • 13
    • 80052533471
    • Energy-Efficient Mechanisms for Managing Thread Context in Throughput Processors
    • M. Gebhart and D. R. Johnson et al. Energy-Efficient Mechanisms for Managing Thread Context in Throughput Processors. In ISCA 2011, pages 235-246.
    • (2011) ISCA , pp. 235-246
    • Gebhart, M.1    Johnson, D.R.2
  • 14
    • 67650635164
    • Many-Core vs. Many-Thread Machines: Stay Away From the Valley
    • Z. Guz et al. Many-Core vs. Many-Thread Machines: Stay Away From the Valley. Computer Architecture Letters, pages 25-28, Jan. 2009.
    • (2009) Computer Architecture Letters , pp. 25-28
    • Guz, Z.1
  • 15
    • 84862107632
    • Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems
    • T. H. Hetherington et al. Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems. In ISPASS 2012, pages 88-98.
    • (2012) ISPASS , pp. 88-98
    • Hetherington, T.H.1
  • 16
    • 79952811127
    • Accelerating CUDA Graph Algorithms at Maximum Warp
    • S. Hong et al. Accelerating CUDA Graph Algorithms at Maximum Warp. In PPoPP 2011, pages 267-276.
    • (2011) PPoPP , pp. 267-276
    • Hong, S.1
  • 17
    • 84858767531
    • CRUISE: Cache Replacement and Utility-Aware Scheduling
    • A. Jaleel et al. CRUISE: Cache Replacement and Utility-Aware Scheduling. In ASPLOS 2012, pages 249-260.
    • (2012) ASPLOS , pp. 249-260
    • Jaleel, A.1
  • 18
    • 77954998134
    • High Performance Cache Replacement Using Re-Reference Interval Prediction (RRIP)
    • A. Jaleel et al. High Performance Cache Replacement Using Re-Reference Interval Prediction (RRIP). In ISCA 2010, pages 60-71.
    • (2010) ISCA , pp. 60-71
    • Jaleel, A.1
  • 19
    • 84864068497
    • Characterizing and Improving the use of Demand-Fetched Caches in GPUs
    • W. Jia, K. A. Shaw, and M. Martonosi. Characterizing and Improving the use of Demand-Fetched Caches in GPUs. In ICS 2012, pages 15-24.
    • (2012) ICS , pp. 15-24
    • Jia, W.1    Shaw, K.A.2    Martonosi, M.3
  • 20
    • 84875640178
    • OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance
    • A. Jog et al. OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance. In ASPLOS 2013.
    • (2013) ASPLOS
    • Jog, A.1
  • 21
    • 84881126240
    • Orchestrated Scheduling and Prefetching for GPGPUs
    • A. Jog et al. Orchestrated Scheduling and Prefetching for GPGPUs. In ISCA, 2013.
    • (2013) ISCA
    • Jog, A.1
  • 22
    • 84887477265
    • Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
    • O. Kayiran et al. Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs. In PACT 2013.
    • (2013) PACT
    • Kayiran, O.1
  • 23
    • 84892519366
    • Khronos Group. OpenCL. http://www.khronos.org/opencl/.
    • OpenCL
  • 25
    • 79951719035
    • Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
    • J. Lee, N. B. Lakshminarayana, H. Kim, and R. Vuduc. Many-Thread Aware Prefetching Mechanisms for GPGPU Applications. In MICRO-43, pages 213-224, 2010.
    • (2010) MICRO-43 , pp. 213-224
    • Lee, J.1    Lakshminarayana, N.B.2    Kim, H.3    Vuduc, R.4
  • 26
    • 84881151222
    • GPUWattch: Enabling Energy Optimizations in GPGPUs
    • J. Leng et al. GPUWattch: Enabling Energy Optimizations in GPGPUs. In ISCA 2013.
    • (2013) ISCA
    • Leng, J.1
  • 27
    • 44849137198
    • NVIDIA Tesla: A Unified Graphics and Computing Architecture
    • E. Lindholm et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE, 28(2):39-55, March-April 2008.
    • (2008) Micro, IEEE , vol.28 , Issue.2 , pp. 39-55
    • Lindholm, E.1
  • 28
    • 84881440334
    • How a Single Chip Causes Massive Power Bills GPUSimPow: A GPGPU Power Simulator
    • M. Maas et al. How a Single Chip Causes Massive Power Bills GPUSimPow: A GPGPU Power Simulator. In ISPASS 2013.
    • (2013) ISPASS
    • Maas, M.1
  • 29
    • 77954976292
    • Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance
    • J. Meng, D. Tarjan, and K. Skadron. Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance. In ISCA 2010, pages 235-246.
    • (2010) ISCA , pp. 235-246
    • Meng, J.1    Tarjan, D.2    Skadron, K.3
  • 30
    • 84863342255
    • Improving GPU Performance via Large Warps and Two-Level Warp Scheduling
    • V. Narasiman et al. Improving GPU Performance via Large Warps and Two-Level Warp Scheduling. In MICRO-44, pages 308-317, 2011.
    • (2011) MICRO-44 , pp. 308-317
    • Narasiman, V.1
  • 31
    • 35348920021
    • Adaptive Insertion Policies for High Performance Caching
    • M. K. Qureshi et al. Adaptive Insertion Policies for High Performance Caching. In ISCA 2007, pages 381-391.
    • (2007) ISCA , pp. 381-391
    • Qureshi, M.K.1
  • 37
    • 0030149507
    • CACTI: An Enhanced Cache Access and Cycle Time Model
    • S. Wilton and N. Jouppi. CACTI: An Enhanced Cache Access and Cycle Time Model. Solid-State Circuits, IEEE Journal of, 31(5):677-688, May 1996.
    • (1996) Solid-State Circuits, IEEE Journal of , vol.31 , Issue.5 , pp. 677-688
    • Wilton, S.1    Jouppi, N.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.