[2] G. C. Caragea, A. Tzannes, F. Keceli, R. Barua, and U. Vishkin. Resource-aware compiler prefetching for many-cores. In ISPDC-9, 2010.
[3] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC'09, 2009.
[4] T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers, 44(5):609-623, 1995.
[5] F. Dahlgren, M. Dubois, and P. Stenström. Sequential hardware prefetching in shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 6(7):733-746, 1995.
[6] A. Dal Corral and J. Llaberia. Minimizing conflicts between vector streams in interleaved memory systems. IEEE Transactions on Computers, 48(4):449-456, 1999.
[7] G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems. In PACT-19, 2010.
[8] E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt. Coordinated control of multiple prefetchers in multi-core systems. In MICRO-42, 2009.
[9] E. Ebrahimi, O. Mutlu, and Y. N. Patt. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In HPCA-15, 2009.
[10] J. Fu and J. Patel. Data prefetching in multiprocessor vector cache memories. In ISCA-18, 1991.
[11] W. C. Fu, J. H. Patel, and B. L. Janssens. Stride directed prefetching in scalar processors. In MICRO-25, 1992.
[12] S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In ISCA, 2009.
[13] S. Iacobovici, L. Spracklen, S. Kadambi, Y. Chou, and S. G. Abraham. Effective stream-based and execution-based data prefetching. In ICS-18, 2004.
[14] K. J. Nesbit and J. E. Smith. Data cache prefetching using a global history buffer. In HPCA-10, 2004.
[15] N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA-17, 1990.
[17] R. L. Lee, P.-C. Yew, and D. H. Lawrie. Data prefetching in shared memory multiprocessors. In ICPP-16, 1987.
[18] M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: A programming model for heterogeneous multi-core systems. In ASPLOS XIII, 2008.
[19] J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA-37, 2010.
[20] T. Mowry and A. Gupta. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. Journal of Parallel and Distributed Computing, 12(2):87-106, 1991.
[22] NVIDIA. CUDA SDK 3.0. http://developer.download.nvidia.com/object/cuda-3-1-downloads.html.
[26] S. Palacharla and R. E. Kessler. Evaluating stream buffers as a secondary cache replacement. In ISCA-21, 1994.
[28] S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, J. Stratton, and W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO, 2008.
[29] B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner. POWER5 system microarchitecture. IBM Journal of Research and Development, 49(4-5):505-522, 2005.
[30] G. Sohi. High-bandwidth interleaved memories for vector processors: A simulation study. IEEE Transactions on Computers, 42(1):34-44, Jan. 1993.
[31] S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt. Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers. In HPCA-13, 2007.
[32] D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
[33] Q. Yang and L. W. Yang. A novel cache design for vector processing. In ISCA-19, 1992.
[34] Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI-10, 2010.
[35] X. Zhuang and H.-H. S. Lee. A hardware-based cache pollution filtering mechanism for aggressive prefetches. In ICPP-32, 2003.