[1] A. Alameldeen and D. Wood, "IPC considered harmful for multiprocessor workloads," IEEE Micro, vol. 26, no. 4, 2006.

[2] B. Beckmann, M. Marty, and D. Wood, "ASR: Adaptive selective replication for CMP caches," in Proc. MICRO-39, 2006.

[3] B. Beckmann and D. Wood, "Managing wire delay in large chip-multiprocessor caches," in Proc. MICRO-37, 2004.

[4] N. Beckmann and D. Sanchez, "Jigsaw: Scalable Software-Defined Caches," in Proc. PACT-22, 2013.

[5] N. Beckmann and D. Sanchez, "Talus: A Simple Way to Remove Cliffs in Cache Performance," in Proc. HPCA-21, 2015.

[6] S. Bell, B. Edwards, J. Amann et al., "TILE64 processor: A 64-core SoC with mesh interconnect," in Proc. ISSCC, 2008.

[7] S. Blagodurov, S. Zhuravlev, M. Dashti, and A. Fedorova, "A case for NUMA-aware contention management on multicore systems," in Proc. USENIX ATC, 2011.

[8] J. Chang and G. Sohi, "Cooperative caching for chip multiprocessors," in Proc. ISCA-33, 2006.

[9] D. Chiou, P. Jain, L. Rudolph, and S. Devadas, "Application-specific memory management for embedded systems using software-controlled caches," in Proc. DAC-37, 2000.

[10] Z. Chishti, M. Powell, and T. Vijaykumar, "Optimizing replication, communication, and capacity allocation in CMPs," in Proc. ISCA-32, 2005.

[11] S. Cho and L. Jin, "Managing distributed, shared L2 caches through OS-level page allocation," in Proc. MICRO-39, 2006.

[12] H. Cook, M. Moreto, S. Bird et al., "A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness," in Proc. ISCA-40, 2013.

[13] W. J. Dally, "GPU Computing: To Exascale and Beyond," in SC Plenary Talk, 2010.

[14] R. Das, R. Ausavarungnirun, O. Mutlu et al., "Application-to-core mapping policies to reduce memory system interference in multi-core systems," in Proc. HPCA-19, 2013.

[15] M. Dashti, A. Fedorova, J. Funston et al., "Traffic management: A holistic approach to memory placement on NUMA systems," in Proc. ASPLOS-18, 2013.

[16] J. Dean and L. Barroso, "The Tail at Scale," CACM, vol. 56, 2013.

[18] F. Guo, Y. Solihin, L. Zhao, and R. Iyer, "A framework for providing quality of service in chip multi-processors," in Proc. MICRO-40, 2007.

[20] N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, "Reactive NUCA: Near-optimal block placement and replication in distributed caches," in Proc. ISCA-36, 2009.

[21] N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, "Toward dark silicon in servers," IEEE Micro, vol. 31, no. 4, 2011.

[22] N. Hardavellas, I. Pandis, R. Johnson, and N. Mancheril, "Database Servers on Chip Multiprocessors: Limitations and Opportunities," in Proc. CIDR, 2007.

[24] H. J. Herrmann, "Geometrical cluster growth models and kinetic gelation," Physics Reports, vol. 136, no. 3, pp. 153-224, 1986.

[25] M. D. Hill and M. R. Marty, "Amdahl's Law in the Multicore Era," Computer, vol. 41, no. 7, 2008.

[26] A. Hilton, N. Eswaran, and A. Roth, "FIESTA: A sample-balanced multi-program workload methodology," in Proc. MoBS, 2009.

[27] Intel, "Knights Landing: Next Generation Intel Xeon Phi," in SC Presentation, 2013.

[28] J. Huh, C. Kim, H. Shafi et al., "A NUCA substrate for flexible CMP cache sharing," IEEE Trans. Par. Dist. Sys., vol. 18, no. 8, 2007.

[31] G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM J. Sci. Comput., vol. 20, 1998.

[32] H. Kasture and D. Sanchez, "Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads," in Proc. ASPLOS-19, 2014.

[33] R. Kessler, M. Hill, and D. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches," IEEE T. Comput., vol. 43, 1994.

[34] C. Kim, D. Burger, and S. Keckler, "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches," in Proc. ASPLOS, 2002.

[36] H. Lee, S. Cho, and B. R. Childers, "CloudCache: Expanding and shrinking private caches," in Proc. HPCA-17, 2011.

[37] S. Li, J. H. Ahn, R. Strong et al., "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in Proc. MICRO-42, 2009.

[38] J. Lin, Q. Lu, X. Ding et al., "Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems," in Proc. HPCA-14, 2008.

[39] Z. Majo and T. R. Gross, "Memory system performance in a NUMA multicore multiprocessor," in Proc. ISMM, 2011.

[41] M. Marty and M. Hill, "Virtual hierarchies to support server consolidation," in Proc. ISCA-34, 2007.

[42] J. Merino, V. Puente, and J. Gregorio, "ESP-NUCA: A low-cost adaptive non-uniform cache architecture," in Proc. HPCA-16, 2010.

[44] M. Moreto, F. J. Cazorla, A. Ramirez et al., "FlexDCP: A QoS framework for CMP architectures," ACM SIGOPS Operating Systems Review, vol. 43, no. 2, 2009.

[45] T. Nowatzki, M. Tarm, L. Carli et al., "A general constraint-centric scheduling framework for spatial architectures," in Proc. PLDI-34, 2013.

[46] J. Ousterhout, P. Agrawal, D. Erickson et al., "The case for RAMClouds: Scalable high-performance storage entirely in DRAM," ACM SIGOPS Operating Systems Review, vol. 43, no. 4, 2010.

[47] D. Page, "Partitioned cache architecture as a side-channel defence mechanism," IACR Cryptology ePrint Archive, no. 2005/280, 2005.

[49] J. Park and W. Dally, "Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures," in Proc. SPAA-22, 2010.

[50] F. Pellegrini and J. Roman, "SCOTCH: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs," in Proc. HPCN, 1996.

[51] M. Qureshi, "Adaptive Spill-Receive for Robust High-Performance Caching in CMPs," in Proc. HPCA-15, 2009.

[52] M. Qureshi and Y. Patt, "Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches," in Proc. MICRO-39, 2006.

[53] D. Sanchez and C. Kozyrakis, "Vantage: Scalable and Efficient Fine-Grain Cache Partitioning," in Proc. ISCA-38, 2011.

[54] D. Sanchez and C. Kozyrakis, "ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems," in Proc. ISCA-40, 2013.

[55] A. Snavely and D. M. Tullsen, "Symbiotic jobscheduling for a simultaneous multithreading processor," in Proc. ASPLOS-8, 2000.

[56] D. Tam, R. Azimi, L. Soares, and M. Stumm, "Managing shared L2 caches on multicore systems in software," in WIOSCA, 2007.

[57] D. Tam, R. Azimi, and M. Stumm, "Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors," in Proc. EuroSys, 2007.

[58] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi, "CACTI 5.1," HP Labs, Tech. Rep. HPL-2008-20, 2008.

[59] A. Tumanov, J. Wise, O. Mutlu, and G. R. Ganger, "Asymmetry-aware execution placement on manycore chips," in SFMA-3, 2013.

[60] B. Verghese, S. Devine, A. Gupta, and M. Rosenblum, "Operating system support for improving data locality on CC-NUMA compute servers," in Proc. ASPLOS, 1996.

[62] C. Wu and M. Martonosi, "A Comparison of Capacity Management Schemes for Shared CMP Caches," in WDDD-7, 2008.

[63] M. Zhang and K. Asanovic, "Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors," in Proc. ISCA-32, 2005.

[64] S. Zhuravlev, S. Blagodurov, and A. Fedorova, "Addressing shared resource contention in multicore processors via scheduling," in Proc. ASPLOS, 2010.