-
1
-
-
47249103334
-
Using OS observations to improve performance in multicore systems
-
R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn, "Using OS observations to improve performance in multicore systems," IEEE Micro, vol. 28, no. 3, pp. 54-66, 2008.
-
(2008)
IEEE Micro
, vol.28
, Issue.3
, pp. 54-66
-
-
Knauerhase, R.1
Brett, P.2
Hohlt, B.3
Li, T.4
Hahn, S.5
-
3
-
-
77952248898
-
Addressing shared resource contention in multicore processors via scheduling
-
S. Zhuravlev, S. Blagodurov, and A. Fedorova, "Addressing shared resource contention in multicore processors via scheduling," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2010, pp. 129-142.
-
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
, pp. 129-142
-
-
Zhuravlev, S.1
Blagodurov, S.2
Fedorova, A.3
-
4
-
-
21244474546
-
Predicting inter-thread cache contention on a chip multi-processor architecture
-
D. Chandra, F. Guo, S. Kim, and Y. Solihin, "Predicting inter-thread cache contention on a chip multi-processor architecture," in Proceedings of the International Symposium on High-Performance Computer Architecture, 2005, pp. 340-351.
-
Proceedings of the International Symposium on High-Performance Computer Architecture, 2005
, pp. 340-351
-
-
Chandra, D.1
Guo, F.2
Kim, S.3
Solihin, Y.4
-
5
-
-
80053993064
-
All-window profiling and composable models of cache sharing
-
X. Xiang, B. Bao, T. Bai, C. Ding, and T. M. Chilimbi, "All-window profiling and composable models of cache sharing," in Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011, pp. 91-102.
-
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011
, pp. 91-102
-
-
Xiang, X.1
Bao, B.2
Bai, T.3
Ding, C.4
Chilimbi, T.M.5
-
6
-
-
84863053984
-
Linear-time modeling of program working set in shared cache
-
X. Xiang, B. Bao, C. Ding, and Y. Gao, "Linear-time modeling of program working set in shared cache," in Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2011, pp. 350-360.
-
Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2011
, pp. 350-360
-
-
Xiang, X.1
Bao, B.2
Ding, C.3
Gao, Y.4
-
8
-
-
78149254514
-
Accelerating multicore reuse distance analysis with sampling and parallelization
-
D. L. Schuff, M. Kulkarni, and V. S. Pai, "Accelerating multicore reuse distance analysis with sampling and parallelization," in Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2010, pp. 53-64.
-
Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2010
, pp. 53-64
-
-
Schuff, D.L.1
Kulkarni, M.2
Pai, V.S.3
-
10
-
-
33750304084
-
Discovery of locality-improving refactoring by reuse path analysis
-
Proceedings of HPCC. Springer
-
K. Beyls and E. D'Hollander, "Discovery of locality-improving refactoring by reuse path analysis," in Proceedings of HPCC. Springer. Lecture Notes in Computer Science Vol. 4208, 2006, pp. 220-229.
-
(2006)
Lecture Notes in Computer Science
, vol.4208
, pp. 220-229
-
-
Beyls, K.1
D'Hollander, E.2
-
11
-
-
33646073716
-
Multiple page size modeling and optimization
-
C. Cascaval, E. Duesterwald, P. F. Sweeney, and R. W. Wisniewski, "Multiple page size modeling and optimization," in Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2005, pp. 339-349.
-
Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2005
, pp. 339-349
-
-
Cascaval, C.1
Duesterwald, E.2
Sweeney, P.F.3
Wisniewski, R.W.4
-
14
-
-
79952932476
-
Fast modeling of shared caches in multicore systems
-
best paper
-
D. Eklov, D. Black-Schaffer, and E. Hagersten, "Fast modeling of shared caches in multicore systems," in Proceedings of the International Conference on High Performance Embedded Architectures and Compilers, 2011, pp. 147-157, best paper.
-
Proceedings of the International Conference on High Performance Embedded Architectures and Compilers, 2011
, pp. 147-157
-
-
Eklov, D.1
Black-Schaffer, D.2
Hagersten, E.3
-
15
-
-
84863467234
-
-
Department of Computer Science, University of Rochester, Tech. Rep. URCS #972, December
-
X. Xiang, B. Bao, and C. Ding, "Program locality sampling in shared cache: A theory and a real-time solution," Department of Computer Science, University of Rochester, Tech. Rep. URCS #972, December 2011.
-
(2011)
Program Locality Sampling in Shared Cache: A Theory and A Real-time Solution
-
-
Xiang, X.1
Bao, B.2
Ding, C.3
-
18
-
-
0014701246
-
Evaluation techniques for storage hierarchies
-
R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger, "Evaluation techniques for storage hierarchies," IBM Systems Journal, vol. 9, no. 2, pp. 78-117, 1970.
-
(1970)
IBM Systems Journal
, vol.9
, Issue.2
, pp. 78-117
-
-
Mattson, R.L.1
Gecsei, J.2
Slutz, D.3
Traiger, I.L.4
-
20
-
-
0024903997
-
Evaluating associativity in CPU caches
-
M. D. Hill and A. J. Smith, "Evaluating associativity in CPU caches," IEEE Transactions on Computers, vol. 38, no. 12, pp. 1612-1630, 1989.
-
(1989)
IEEE Transactions on Computers
, vol.38
, Issue.12
, pp. 1612-1630
-
-
Hill, M.D.1
Smith, A.J.2
-
22
-
-
0034826142
-
Analytical cache models with applications to cache partitioning
-
G. E. Suh, S. Devadas, and L. Rudolph, "Analytical cache models with applications to cache partitioning," in Proceedings of the International Conference on Supercomputing, 2001, pp. 1-12.
-
Proceedings of the International Conference on Supercomputing, 2001
, pp. 1-12
-
-
Suh, G.E.1
Devadas, S.2
Rudolph, L.3
-
24
-
-
77951616746
-
Is reuse distance applicable to data locality analysis on chip multiprocessors?
-
Y. Jiang, E. Z. Zhang, K. Tian, and X. Shen, "Is reuse distance applicable to data locality analysis on chip multiprocessors?" in Proceedings of the International Conference on Compiler Construction, 2010, pp. 264-282.
-
Proceedings of the International Conference on Compiler Construction, 2010
, pp. 264-282
-
-
Jiang, Y.1
Zhang, E.Z.2
Tian, K.3
Shen, X.4
-
25
-
-
77749340037
-
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
-
E. Z. Zhang, Y. Jiang, and X. Shen, "Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?" in Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010, pp. 203-212.
-
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
, pp. 203-212
-
-
Zhang, E.Z.1
Jiang, Y.2
Shen, X.3
-
26
-
-
70349743894
-
Program locality analysis using reuse distance
-
Aug.
-
Y. Zhong, X. Shen, and C. Ding, "Program locality analysis using reuse distance," ACM Transactions on Programming Languages and Systems, vol. 31, no. 6, pp. 1-39, Aug. 2009.
-
(2009)
ACM Transactions on Programming Languages and Systems
, vol.31
, Issue.6
, pp. 1-39
-
-
Zhong, Y.1
Shen, X.2
Ding, C.3
-
28
-
-
78349251674
-
Comparing scalability prediction strategies on an SMP of CMPs
-
K. Singh, M. Curtis-Maury, S. A. McKee, F. Blagojevic, D. S. Nikolopoulos, B. R. de Supinski, and M. Schulz, "Comparing scalability prediction strategies on an SMP of CMPs," in Proceedings of the Euro-Par Conference, 2010, pp. 143-155.
-
Proceedings of the Euro-Par Conference, 2010
, pp. 143-155
-
-
Singh, K.1
Curtis-Maury, M.2
McKee, S.A.3
Blagojevic, F.4
Nikolopoulos, D.S.5
De Supinski, B.R.6
Schulz, M.7
-
29
-
-
77954052050
-
Analyzing the trade-off between multiple memory controllers and memory channels on multi-core processor performance
-
J. C. Sancho, M. Lang, and D. J. Kerbyson, "Analyzing the trade-off between multiple memory controllers and memory channels on multi-core processor performance," in Proceedings of the LSPP Workshop, 2010, pp. 1-7.
-
Proceedings of the LSPP Workshop, 2010
, pp. 1-7
-
-
Sancho, J.C.1
Lang, M.2
Kerbyson, D.J.3
-
30
-
-
56349121765
-
A prediction based CMP cache migration policy
-
S. Hao, Z. Du, D. A. Bader, and M. Wang, "A prediction based CMP cache migration policy," in Proceedings of the IEEE International Conference on High Performance Computing and Communications, 2008, pp. 374-381.
-
Proceedings of the IEEE International Conference on High Performance Computing and Communications, 2008
, pp. 374-381
-
-
Hao, S.1
Du, Z.2
Bader, D.A.3
Wang, M.4
-
31
-
-
48849084701
-
Open - SpeedShop: An open source infrastructure for parallel performance analysis
-
M. Schulz, J. Galarowicz, D. Maghrak, W. Hachfeld, D. Montoya, and S. Cranford, "Open - SpeedShop: An open source infrastructure for parallel performance analysis," Scientific Programming, vol. 16, no. 2-3, pp. 105-121, 2008.
-
(2008)
Scientific Programming
, vol.16
, Issue.2-3
, pp. 105-121
-
-
Schulz, M.1
Galarowicz, J.2
Maghrak, D.3
Hachfeld, W.4
Montoya, D.5
Cranford, S.6
-
32
-
-
0036679608
-
HPCView: A tool for top-down analysis of node performance
-
J. Mellor-Crummey, R. Fowler, G. Marin, and N. Tallent, "HPCView: A tool for top-down analysis of node performance," Journal of Supercomputing, pp. 81-104, 2002.
-
(2002)
Journal of Supercomputing
, pp. 81-104
-
-
Mellor-Crummey, J.1
Fowler, R.2
Marin, G.3
Tallent, N.4
-
33
-
-
77950611743
-
HPCToolkit: Tools for performance analysis of optimized parallel programs
-
L. Adhianto, S. Banerjee, M. W. Fagan, M. Krentel, G. Marin, J. M. Mellor-Crummey, and N. R. Tallent, "HPCToolkit: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 685-701, 2010.
-
(2010)
Concurrency and Computation: Practice and Experience
, vol.22
, Issue.6
, pp. 685-701
-
-
Adhianto, L.1
Banerjee, S.2
Fagan, M.W.3
Krentel, M.4
Marin, G.5
Mellor-Crummey, J.M.6
Tallent, N.R.7
-
34
-
-
63549085110
-
Analysis and approximation of optimal co-scheduling on chip multiprocessors
-
Y. Jiang, X. Shen, J. Chen, and R. Tripathi, "Analysis and approximation of optimal co-scheduling on chip multiprocessors," in Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2008, pp. 220-229.
-
Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2008
, pp. 220-229
-
-
Jiang, Y.1
Shen, X.2
Chen, J.3
Tripathi, R.4
-
35
-
-
34548285855
-
Locality approximation using time
-
X. Shen, J. Shaw, B. Meeker, and C. Ding, "Locality approximation using time," in Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2007, pp. 55-61.
-
Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2007
, pp. 55-61
-
-
Shen, X.1
Shaw, J.2
Meeker, B.3
Ding, C.4
-
38
-
-
67650796123
-
RapidMRC: Approximating L2 miss rate curves on commodity systems for online optimizations
-
D. K. Tam, R. Azimi, L. Soares, and M. Stumm, "RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2009, pp. 121-132.
-
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2009
, pp. 121-132
-
-
Tam, D.K.1
Azimi, R.2
Soares, L.3
Stumm, M.4