2010, Pages 213-224

Many-thread aware prefetching mechanisms for GPGPU applications

Author keywords

GPGPU; Prefetch throttling; Prefetching

Indexed keywords

CPU SYSTEMS; GPGPU; HARDWARE AND SOFTWARE; MEMORY ACCESS; MEMORY LATENCIES; MULTITHREADED; PERFORMANCE BENEFITS; PREFETCH THROTTLING; PREFETCHES; PREFETCHING; THREAD LEVEL PARALLELISM; TRAINING ALGORITHMS;

EID: 79951719035     PISSN: 10724451     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/MICRO.2010.44     Document Type: Conference Paper
Times cited: 116

References (35)
  • 4
    • 0029308368
    • Effective hardware based data prefetching for high-performance processors
    • T.-F. Chen and J.-L. Baer. Effective hardware based data prefetching for high-performance processors. IEEE Trans. Computers, 44(5):609-623, 1995.
    • (1995) IEEE Trans. Computers , vol.44 , Issue.5 , pp. 609-623
    • Chen, T.-F.1    Baer, J.-L.2
  • 6
    • 0032640138
    • Minimizing conflicts between vector streams in interleaved memory systems
    • A. Dal Corral and J. Llaberia. Minimizing conflicts between vector streams in interleaved memory systems. IEEE Transactions on Computers, 48(4):449-456, 1999.
    • (1999) IEEE Transactions on Computers , vol.48 , Issue.4 , pp. 449-456
    • Dal Corral, A.1    Llaberia, J.2
  • 7
    • 78149233155
    • Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems
    • G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems. In PACT-19, 2010.
    • (2010) PACT-19
    • Diamos, G.1    Kerr, A.2    Yalamanchili, S.3    Clark, N.4
  • 8
    • 76749142994
    • Coordinated control of multiple prefetchers in multi-core systems
    • E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt. Coordinated control of multiple prefetchers in multi-core systems. In MICRO-42, 2009.
    • (2009) MICRO-42
    • Ebrahimi, E.1    Mutlu, O.2    Lee, C.J.3    Patt, Y.N.4
  • 9
    • 64949179220
    • Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
    • E. Ebrahimi, O. Mutlu, and Y. N. Patt. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In HPCA-15, 2009.
    • (2009) HPCA-15
    • Ebrahimi, E.1    Mutlu, O.2    Patt, Y.N.3
  • 10
    • 0026157234
    • Data prefetching in multiprocessor vector cache memories
    • J. Fu and J. Patel. Data prefetching in multiprocessor vector cache memories. In ISCA-18, 1991.
    • (1991) ISCA-18
    • Fu, J.1    Patel, J.2
  • 11
    • 77956977035
    • Stride directed prefetching in scalar processors
    • W. C. Fu, J. H. Patel, and B. L. Janssens. Stride directed prefetching in scalar processors. In MICRO-25, 1992.
    • (1992) MICRO-25
    • Fu, W.C.1    Patel, J.H.2    Janssens, B.L.3
  • 12
    • 70450231944
    • An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
    • S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In ISCA, 2009.
    • (2009) ISCA
    • Hong, S.1    Kim, H.2
  • 14
    • 2342644731
    • Data cache prefetching using a global history buffer
    • K. J. Nesbit and J. E. Smith. Data cache prefetching using a global history buffer. In HPCA-10, 2004.
    • (2004) HPCA-10
    • Nesbit, K.J.1    Smith, J.E.2
  • 15
    • 0025429331
    • Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
    • N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA-17, 1990.
    • (1990) ISCA-17
    • Jouppi, N.P.1
  • 17
    • 0023586486
    • Data prefetching in shared memory multiprocessors
    • R. L. Lee, P.-C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In ICPP-16, 1987.
    • (1987) ICPP-16
    • Lee, R.L.1    Yew, P.-C.2    Lawrie, D.H.3
  • 18
    • 68149168035
    • Merge: A programming model for heterogeneous multi-core systems
    • M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: A programming model for heterogeneous multi-core systems. In ASPLOS XIII, 2008.
    • (2008) ASPLOS , vol.13
    • Linderman, M.D.1    Collins, J.D.2    Wang, H.3    Meng, T.H.4
  • 19
    • 77954976292
    • Dynamic warp subdivision for integrated branch and memory divergence tolerance
    • J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA-37, 2010.
    • (2010) ISCA-37
    • Meng, J.1    Tarjan, D.2    Skadron, K.3
  • 20
    • 0002031606
    • Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
    • T. Mowry and A. Gupta. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. J. Parallel Distrib. Comput., 12(2):87-106, 1991.
    • (1991) J. Parallel Distrib. Comput. , vol.12 , Issue.2 , pp. 87-106
    • Mowry, T.1    Gupta, A.2
  • 22
    • 79951716394
    • NVIDIA. CUDA SDK 3.0. http://developer.download.nvidia.com/object/cuda-3- 1-downloads.html.
    • CUDA SDK 3.0.
  • 26
    • 0028294834
    • Evaluating stream buffers as a secondary cache replacement
    • S. Palacharla and R. E. Kessler. Evaluating stream buffers as a secondary cache replacement. In ISCA-21, 1994.
    • (1994) ISCA-21
    • Palacharla, S.1    Kessler, R.E.2
  • 27
    • 0029203824
    • Vector multiprocessors with arbitrated memory access
    • M. Peiron, M. Valero, E. Ayguade, and T. Lang. Vector multiprocessors with arbitrated memory access. In ISCA-22, 1995.
    • (1995) ISCA-22
    • Peiron, M.1    Valero, M.2    Ayguade, E.3    Lang, T.4
  • 30
    • 0027311457
    • High-bandwidth interleaved memories for vector processors-a simulation study
    • G. Sohi. High-bandwidth interleaved memories for vector processors-a simulation study. IEEE Transactions on Computers, 42(1):34-44, Jan. 1993.
    • (1993) IEEE Transactions on Computers , vol.42 , Issue.1 , pp. 34-44
    • Sohi, G.1
  • 31
    • 34547655822
    • Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers
    • S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt. Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers. In HPCA-13, 2007.
    • (2007) HPCA-13
    • Srinath, S.1    Mutlu, O.2    Kim, H.3    Patt, Y.N.4
  • 32
    • 74049151553
    • Increasing memory miss tolerance for SIMD cores
    • D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
    • (2009) SC
    • Tarjan, D.1    Meng, J.2    Skadron, K.3
  • 33
    • 0026867328
    • A novel cache design for vector processing
    • Q. Yang and L. W. Yang. A novel cache design for vector processing. In ISCA-19, 1992.
    • (1992) ISCA-19
    • Yang, Q.1    Yang, L.W.2
  • 34
    • 77954691442
    • A GPGPU compiler for memory optimization and parallelism management
    • Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI-10, 2010.
    • (2010) PLDI-10
    • Yang, Y.1    Xiang, P.2    Kong, J.3    Zhou, H.4
  • 35
    • 84944748972
    • A hardware-based cache pollution filtering mechanism for aggressive prefetches
    • X. Zhuang and H.-H. S. Lee. A hardware-based cache pollution filtering mechanism for aggressive prefetches. In ICPP-32, 2003.
    • (2003) ICPP-32
    • Zhuang, X.1    Lee, H.-H.S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.