



2010, Pages 235-246

Dynamic warp subdivision for integrated branch and memory divergence tolerance

Author keywords

Branch divergence; Latency hiding; Memory divergence; SIMD; Warp

Indexed keywords

AREA OVERHEAD; CACHE HIERARCHIES; DATA PARALLEL; IDLE CYCLES; L2 CACHE; LATENCY HIDING; MAXIMIZE THROUGHPUT; MEMORY ACCESS; MEMORY LEVEL PARALLELISMS; MULTI-THREADING; MULTIPLE PROCESSING; REGISTER FILES;

EID: 77954976292     PISSN: 10636897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1815961.1815992     Document Type: Conference Paper
Times cited: 207

References (32)
  • 1
    • 77955005234
    • NVIDIA's next generation CUDA compute architecture: Fermi
    • NVIDIA's next generation CUDA compute architecture: Fermi. NVIDIA Corporation, 2009.
    • (2009) NVIDIA Corporation
  • 2
    • 77954969653
    • ATI. Radeon 9700 Pro. http://mirror.ati.com/products/pc/radeon9700pro, 2002.
    • (2002) Radeon 9700 Pro
  • 4
    • 51449118065
    • A performance study of general purpose applications on graphics processors using CUDA
    • S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general purpose applications on graphics processors using CUDA. JPDC, 2008.
    • (2008) JPDC
    • Che, S.; Boyer, M.; Meng, J.; Tarjan, D.; Sheaffer, J.W.; Skadron, K.
  • 5
    • 33845901233
    • Learning-based SMT processor resource distribution via hill-climbing
    • S. Choi and D. Yeung. Learning-based SMT processor resource distribution via hill-climbing. In ISCA, 2006.
    • (2006) ISCA
    • Choi, S.; Yeung, D.
  • 10
    • 47349104432
    • Dynamic warp formation and scheduling for efficient GPU control flow
    • W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO, 2007.
    • (2007) MICRO
    • Fung, W.W.L.; Sham, I.; Yuan, G.; Aamodt, T.M.
  • 11
    • 34247376580
    • Chip multiprocessing and the Cell Broadband Engine
    • M. Gschwind. Chip multiprocessing and the Cell Broadband Engine. In CF, 2006.
    • (2006) CF
    • Gschwind, M.
  • 16
    • 77954020709
    • Exploiting inter-thread temporal locality for chip multithreading
    • J. Meng, J. W. Sheaffer, and K. Skadron. Exploiting inter-thread temporal locality for chip multithreading. In IPDPS, 2010.
    • (2010) IPDPS
    • Meng, J.; Sheaffer, J.W.; Skadron, K.
  • 17
    • 77955007736
    • Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
    • J. Meng and K. Skadron. Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling. In ICCD, 2009.
    • (2009) ICCD
    • Meng, J.; Skadron, K.
  • 18
    • 77954994930
    • Dynamic warp subdivision for integrated branch and memory divergence tolerance: Extended results
    • J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance: Extended results. U.Va. Tech. Report CS-2010-5, 2010.
    • (2010) U.Va. Tech. Report CS-2010-5
    • Meng, J.; Tarjan, D.; Skadron, K.
  • 21
    • 0016994364
    • Implementation of permutation functions in Illiac IV-type computers
    • S. E. Orcutt. Implementation of permutation functions in Illiac IV-type computers. IEEE Trans. Comput., 25(9), 1976.
    • (1976) IEEE Trans. Comput., vol. 25, no. 9
    • Orcutt, S.E.
  • 26
    • 0017922490
    • The CRAY-1 computer system
    • R. M. Russell. The CRAY-1 computer system. Commun. ACM, 21(1), 1978.
    • (1978) Commun. ACM, vol. 21, no. 1
    • Russell, R.M.
  • 28
    • 0030644231
    • A mechanism for SIMD execution of SPMD programs
    • Y. Takahashi. A mechanism for SIMD execution of SPMD programs. In HPC-ASIA, 1997.
    • (1997) HPC-ASIA
    • Takahashi, Y.
  • 29
    • 0035178105
    • Cost-effective hardware acceleration of multimedia applications
    • D. Talla and L. K. John. Cost-effective hardware acceleration of multimedia applications. In ICCD, 2001.
    • (2001) ICCD
    • Talla, D.; John, L.K.
  • 30
    • 74049151553
    • Increasing memory miss tolerance for SIMD cores
    • D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
    • (2009) SC
    • Tarjan, D.; Meng, J.; Skadron, K.
  • 31
    • 0035696665
    • Handling long-latency loads in a simultaneous multithreading processor
    • D. M. Tullsen and J. A. Brown. Handling long-latency loads in a simultaneous multithreading processor. In MICRO 34, 2001.
    • (2001) MICRO 34
    • Tullsen, D.M.; Brown, J.A.
  • 32
    • 0029194459
    • The SPLASH-2 programs: Characterization and methodological considerations
    • S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. ISCA, 1995.
    • (1995) ISCA
    • Woo, S.C.; Ohara, M.; Torrie, E.; Singh, J.P.; Gupta, A.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.