-
6
-
-
0003605996
-
-
NAS parallel benchmarks
-
D. H. Bailey et al. NAS parallel benchmarks. Technical Report RNR-94-1007, NASA Ames Research Center, 1994.
-
(1994)
Technical Report RNR-94-1007, NASA Ames Research Center
-
-
Bailey, D.H.1
-
7
-
-
70450245578
-
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors
-
A. Bhattacharjee and M. Martonosi. Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. In ISCA, 2009.
-
(2009)
ISCA
-
-
Bhattacharjee, A.1
Martonosi, M.2
-
8
-
-
63549095070
-
The PARSEC benchmark suite: Characterization and architectural implications
-
C. Bienia et al. The PARSEC benchmark suite: Characterization and architectural implications. In PACT, 2008.
-
(2008)
PACT
-
-
Bienia, C.1
-
9
-
-
84976783312
-
Implementing remote procedure calls
-
A. D. Birrell and B. J. Nelson. Implementing remote procedure calls. ACM TOCS, 2(1):39-59, 1984.
-
(1984)
ACM TOCS
, vol.2
, Issue.1
, pp. 39-59
-
-
Birrell, A.D.1
Nelson, B.J.2
-
10
-
-
0029191296
-
Cilk: An efficient multithreaded runtime system
-
R. D. Blumofe et al. Cilk: an efficient multithreaded runtime system. In PPoPP, 1995.
-
(1995)
PPoPP
-
-
Blumofe, R.D.1
-
11
-
-
84872973735
-
Reinventing scheduling for multicore systems
-
S. Boyd-Wickizer et al. Reinventing scheduling for multicore systems. In HotOS-XII, 2009.
-
(2009)
HotOS-XII
-
-
Boyd-Wickizer, S.1
-
12
-
-
57549118941
-
The shared-thread multiprocessor
-
J. A. Brown and D. M. Tullsen. The shared-thread multiprocessor. In ICS, 2008.
-
(2008)
ICS
-
-
Brown, J.A.1
Tullsen, D.M.2
-
13
-
-
34547473118
-
Computation spreading: Employing hardware migration to specialize CMP cores on-the-fly
-
K. Chakraborty, P. M. Wells, and G. S. Sohi. Computation spreading: Employing hardware migration to specialize CMP cores on-the-fly. In ASPLOS-XII, 2006.
-
(2006)
ASPLOS-XII
-
-
Chakraborty, K.1
Wells, P.M.2
Sohi, G.S.3
-
14
-
-
0036949391
-
A stateless, content-directed data prefetching mechanism
-
R. Cooksey et al. A stateless, content-directed data prefetching mechanism. In ASPLOS, 2002.
-
(2002)
ASPLOS
-
-
Cooksey, R.1
-
15
-
-
33646144623
-
The OpenMP source code repository
-
A. J. Dorta et al. The OpenMP source code repository. In Euromicro, 2005.
-
(2005)
Euromicro
-
-
Dorta, A.J.1
-
16
-
-
64949179220
-
Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
-
E. Ebrahimi et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. HPCA, 2009.
-
(2009)
HPCA
-
-
Ebrahimi, E.1
-
17
-
-
34547423880
-
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
-
M. Gordon et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.
-
(2006)
ASPLOS
-
-
Gordon, M.1
-
19
-
-
48249118853
-
Amdahl's law in the multicore era
-
M. Hill and M. Marty. Amdahl's law in the multicore era. IEEE Computer, 41(7), 2008.
-
(2008)
IEEE Computer
, vol.41
, Issue.7
-
-
Hill, M.1
Marty, M.2
-
20
-
-
70449669476
-
DDCache: Decoupled and delegable cache data and metadata
-
H. Hossain et al. DDCache: Decoupled and delegable cache data and metadata. In PACT, 2009.
-
(2009)
PACT
-
-
Hossain, H.1
-
23
-
-
0030677583
-
Prefetching using Markov predictors
-
D. Joseph and D. Grunwald. Prefetching using Markov predictors. In ISCA, 1997.
-
(1997)
ISCA
-
-
Joseph, D.1
Grunwald, D.2
-
24
-
-
0025429331
-
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
-
N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA-17, 1990.
-
(1990)
ISCA-17
-
-
Jouppi, N.P.1
-
25
-
-
0036949388
-
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
-
C. Kim et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In ASPLOS, 2002.
-
(2002)
ASPLOS
-
-
Kim, C.1
-
27
-
-
85084163748
-
Using cohort scheduling to enhance server performance
-
J. R. Larus and M. Parkes. Using cohort scheduling to enhance server performance. In USENIX, 2002.
-
(2002)
USENIX
-
-
Larus, J.R.1
Parkes, M.2
-
28
-
-
0030685588
-
The SGI Origin: A ccNUMA highly scalable server
-
J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In ISCA, 1997.
-
(1997)
ISCA
-
-
Laudon, J.1
Lenoski, D.2
-
29
-
-
0033705677
-
Push vs. pull: Data movement for linked data structures
-
C.-L. Yang and A. R. Lebeck. Push vs. pull: Data movement for linked data structures. In ICS, 2000.
-
(2000)
ICS
-
-
Yang, C.-L.1
Lebeck, A.R.2
-
31
-
-
33947328378
-
Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors
-
T. Morad et al. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. Comp Arch Letters, 2006.
-
(2006)
Comp Arch Letters
-
-
Morad, T.1
-
32
-
-
47349098275
-
MineBench: A benchmark suite for data mining workloads
-
R. Narayanan et al. MineBench: A benchmark suite for data mining workloads. In IISWC, 2006.
-
(2006)
IISWC
-
-
Narayanan, R.1
-
33
-
-
77954988455
-
-
NVIDIA Corporation
-
NVIDIA Corporation. CUDA SDK code samples, 2009.
-
(2009)
CUDA SDK Code Samples
-
-
-
34
-
-
64949187933
-
Adaptive spill-receive for robust high-performance caching in CMPs
-
M. K. Qureshi. Adaptive spill-receive for robust high-performance caching in CMPs. HPCA, 2009.
-
(2009)
HPCA
-
-
Qureshi, M.K.1
-
35
-
-
70450253535
-
Thread motion: Fine-grained power management for multi-core systems
-
K. K. Rangan et al. Thread motion: Fine-grained power management for multi-core systems. In ISCA, 2009.
-
(2009)
ISCA
-
-
Rangan, K.K.1
-
36
-
-
0030672607
-
The interaction of software prefetching with ILP processors in shared-memory systems
-
P. Ranganathan et al. The interaction of software prefetching with ILP processors in shared-memory systems. ISCA, 1997.
-
(1997)
ISCA
-
-
Ranganathan, P.1
-
37
-
-
70450279104
-
Spatio-temporal memory streaming
-
S. Somogyi et al. Spatio-temporal memory streaming. ISCA, 2009.
-
(2009)
ISCA
-
-
Somogyi, S.1
-
38
-
-
77952284721
-
Fast switching of threads between cores
-
R. Strong et al. Fast switching of threads between cores. SIGOPS Oper. Syst. Rev., 43(2), 2009.
-
(2009)
SIGOPS Oper. Syst. Rev.
, vol.43
, Issue.2
-
-
Strong, R.1
-
41
-
-
67650033098
-
Accelerating critical section execution with asymmetric multi-core architectures
-
M. A. Suleman et al. Accelerating critical section execution with asymmetric multi-core architectures. ASPLOS, 2009.
-
(2009)
ASPLOS
-
-
Suleman, M.A.1
-
44
-
-
84957872108
-
The impact of speeding up critical sections with data prefetching and forwarding
-
P. Trancoso and J. Torrellas. The impact of speeding up critical sections with data prefetching and forwarding. In ICPP, 1996.
-
(1996)
ICPP
-
-
Trancoso, P.1
Torrellas, J.2
-
46
-
-
27544495466
-
Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
-
M. Zhang and K. Asanovic. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA, 2005.
-
(2005)
ISCA
-
-
Zhang, M.1
Asanovic, K.2