SCOPUS Information Search Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

2010, Pages 213-224

Many-thread aware prefetching mechanisms for GPGPU applications

Authors (4): Lee, Jaekyu (a); Lakshminarayana, Nagesh B. (a); Kim, Hyesoon (a); Vuduc, Richard (a)

(a) Georgia Institute of Technology (United States)

Author keywords

GPGPU; Prefetch throttling; Prefetching

Indexed keywords

CPU SYSTEMS; GPGPU; HARDWARE AND SOFTWARE; MEMORY ACCESS; MEMORY LATENCIES; MULTITHREADED; PERFORMANCE BENEFITS; PREFETCH THROTTLING; PREFETCHES; PREFETCHING; THREAD LEVEL PARALLELISM; TRAINING ALGORITHMS;

HARDWARE;

PROGRAM PROCESSORS;

EID: 79951719035 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/MICRO.2010.44 Document Type: Conference Paper

Times cited: 116

References (35)

1
- 67650692011
- Parboil benchmark suite. http://impact.crhc.illinois.edu/parboil.php.
- Parboil Benchmark Suite

2
- 77956435385
- Resource-aware compiler prefetching for many-cores
- G. C. Caragea, A. Tzannes, F. Keceli, R. Barua, and U. Vishkin. Resource-aware compiler prefetching for many-cores. In ISPDC-9, 2010.
- (2010) ISPDC-9
- Caragea, G.C.¹ Tzannes, A.² Keceli, F.³ Barua, R.⁴ Vishkin, U.⁵

3
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC'09, 2009.
- (2009) IISWC'09
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.-H.⁶ Skadron, K.⁷

4
- 0029308368
- Effective hardware-based data prefetching for high-performance processors
- T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high-performance processors. IEEE Trans. Computers, 44(5):609-623, 1995.
- (1995) IEEE Trans. Computers , vol.44 , Issue.5 , pp. 609-623
- Chen, T.-F.¹ Baer, J.-L.²

5
- 0029341212
- Sequential hardware prefetching in shared-memory multiprocessors
- F. Dahlgren, M. Dubois, and P. Stenström. Sequential hardware prefetching in shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 6(7):733-746, 1995.
- (1995) IEEE Transactions on Parallel and Distributed Systems , vol.6 , Issue.7 , pp. 733-746
- Dahlgren, F.¹ Dubois, M.² Stenström, P.³

6
- 0032640138
- Minimizing conflicts between vector streams in interleaved memory systems
- A. Dal Corral and J. Llaberia. Minimizing conflicts between vector streams in interleaved memory systems. IEEE Transactions on Computers, 48(4):449-456, 1999.
- (1999) IEEE Transactions on Computers , vol.48 , Issue.4 , pp. 449-456
- Dal Corral, A.¹ Llaberia, J.²

7
- 78149233155
- Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems
- G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems. In PACT-19, 2010.
- (2010) PACT-19
- Diamos, G.¹ Kerr, A.² Yalamanchili, S.³ Clark, N.⁴

8
- 76749142994
- Coordinated control of multiple prefetchers in multi-core systems
- E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt. Coordinated control of multiple prefetchers in multi-core systems. In MICRO-42, 2009.
- (2009) MICRO-42
- Ebrahimi, E.¹ Mutlu, O.² Lee, C.J.³ Patt, Y.N.⁴

9
- 64949179220
- Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
- E. Ebrahimi, O. Mutlu, and Y. N. Patt. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In HPCA-15, 2009.
- (2009) HPCA-15
- Ebrahimi, E.¹ Mutlu, O.² Patt, Y.N.³

10
- 0026157234
- Data prefetching in multiprocessor vector cache memories
- J. Fu and J. Patel. Data prefetching in multiprocessor vector cache memories. In ISCA-18, 1991.
- (1991) ISCA-18
- Fu, J.¹ Patel, J.²

11
- 77956977035
- Stride directed prefetching in scalar processors
- W. C. Fu, J. H. Patel, and B. L. Janssens. Stride directed prefetching in scalar processors. In MICRO-25, 1992.
- (1992) MICRO-25
- Fu, W.C.¹ Patel, J.H.² Janssens, B.L.³

12
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In ISCA, 2009.
- (2009) ISCA
- Hong, S.¹ Kim, H.²

13
- 8344236686
- Effective stream-based and execution-based data prefetching
- S. Iacobovici, L. Spracklen, S. Kadambi, Y. Chou, and S. G. Abraham. Effective stream-based and execution-based data prefetching. In ICS-18, 2004.
- (2004) ICS-18
- Iacobovici, S.¹ Spracklen, L.² Kadambi, S.³ Chou, Y.⁴ Abraham, S.G.⁵

14
- 2342644731
- Data cache prefetching using a global history buffer
- K. J. Nesbit and J. E. Smith. Data cache prefetching using a global history buffer. In HPCA-10, 2004.
- (2004) HPCA-10
- Nesbit, K.J.¹ Smith, J.E.²

15
- 0025429331
- Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
- N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA-17, 1990.
- (1990) ISCA-17
- Jouppi, N.P.¹

16
- 66749189125
- Prefetch-aware DRAM controllers
- C. J. Lee, O. Mutlu, V. Narasiman, and Y. N. Patt. Prefetch-aware DRAM controllers. In MICRO-41, 2008.
- (2008) MICRO-41
- Lee, C.J.¹ Mutlu, O.² Narasiman, V.³ Patt, Y.N.⁴

17
- 0023586486
- Data prefetching in shared memory multiprocessors
- R. L. Lee, P.-C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In ICPP-16, 1987.
- (1987) ICPP-16
- Lee, R.L.¹ Yew, P.-C.² Lawrie, D.H.³

18
- 68149168035
- Merge: A programming model for heterogeneous multi-core systems
- M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: a programming model for heterogeneous multi-core systems. In ASPLOS XIII, 2008.
- (2008) ASPLOS , vol.13
- Linderman, M.D.¹ Collins, J.D.² Wang, H.³ Meng, T.H.⁴

19
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA-37, 2010.
- (2010) ISCA-37
- Meng, J.¹ Tarjan, D.² Skadron, K.³

20
- 0002031606
- Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
- T. Mowry and A. Gupta. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. J. Parallel Distrib. Comput., 12(2):87-106, 1991.
- (1991) J. Parallel Distrib. Comput. , vol.12 , Issue.2 , pp. 87-106
- Mowry, T.¹ Gupta, A.²

21
- 10444284911
- AC/DC: An adaptive data cache prefetcher
- K. J. Nesbit, A. S. Dhodapkar, and J. E. Smith. AC/DC: An adaptive data cache prefetcher. In PACT-13, 2004.
- (2004) PACT-13
- Nesbit, K.J.¹ Dhodapkar, A.S.² Smith, J.E.³

22
- 79951716394
- NVIDIA. CUDA SDK 3.0. http://developer.download.nvidia.com/object/cuda-3-1-downloads.html.
- CUDA SDK 3.0.

23
- 77955858732
- NVIDIA. Fermi: Nvidia's next generation cuda compute architecture. http://www.nvidia.com/fermi.
- Fermi: Nvidia's Next Generation Cuda Compute Architecture

24
- 84868177142
- NVIDIA. Geforce 8800 graphics processors. http://www.nvidia.com/page/geforce-8800.html.
- Geforce 8800 Graphics Processors

25
- 82955212653
- NVIDIA Corporation
- NVIDIA Corporation. CUDA Programming Guide, V3.0.
- CUDA Programming Guide, V3.0

26
- 0028294834
- Evaluating stream buffers as a secondary cache replacement
- S. Palacharla and R. E. Kessler. Evaluating stream buffers as a secondary cache replacement. In ISCA-21, 1994.
- (1994) ISCA-21
- Palacharla, S.¹ Kessler, R.E.²

27
- 0029203824
- Vector multiprocessors with arbitrated memory access
- M. Peiron, M. Valero, E. Ayguade, and T. Lang. Vector multiprocessors with arbitrated memory access. In ISCA-22, 1995.
- (1995) ISCA-22
- Peiron, M.¹ Valero, M.² Ayguade, E.³ Lang, T.⁴

28
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, J. Stratton, and W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO, 2008.
- (2008) CGO
- Ryoo, S.¹ Rodrigues, C.² Stone, S.³ Baghsorkhi, S.⁴ Ueng, S.⁵ Stratton, J.⁶ Hwu, W.⁷

29
- 25844437046
- Power5 system microarchitecture
- B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner. Power5 system microarchitecture. IBM Journal of Research and Development, 49(4-5):505-522, 2005.
- (2005) IBM Journal of Research and Development , vol.49 , Issue.4-5 , pp. 505-522
- Sinharoy, B.¹ Kalla, R.N.² Tendler, J.M.³ Eickemeyer, R.J.⁴ Joyner, J.B.⁵

30
- 0027311457
- High-bandwidth interleaved memories for vector processors: a simulation study
- G. Sohi. High-bandwidth interleaved memories for vector processors: a simulation study. IEEE Transactions on Computers, 42(1):34-44, Jan. 1993.
- (1993) IEEE Transactions on Computers , vol.42 , Issue.1 , pp. 34-44
- Sohi, G.¹

31
- 34547655822
- Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers
- S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt. Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers. In HPCA-13, 2007.
- (2007) HPCA-13
- Srinath, S.¹ Mutlu, O.² Kim, H.³ Patt, Y.N.⁴

32
- 74049151553
- Increasing memory miss tolerance for SIMD cores
- D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
- (2009) SC
- Tarjan, D.¹ Meng, J.² Skadron, K.³

33
- 0026867328
- A novel cache design for vector processing
- Q. Yang and L.W. Yang. A novel cache design for vector processing. In ISCA-19, 1992.
- (1992) ISCA-19
- Yang, Q.¹ Yang, L.W.²

34
- 77954691442
- A GPGPU compiler for memory optimization and parallelism management
- Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI-10, 2010.
- (2010) PLDI-10
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

35
- 84944748972
- A hardware-based cache pollution filtering mechanism for aggressive prefetches
- X. Zhuang and H.-H. S. Lee. A hardware-based cache pollution filtering mechanism for aggressive prefetches. In ICPP-32, 2003.
- (2003) ICPP-32
- Zhuang, X.¹ Lee, H.-H.S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.