-
1
-
-
84900342836
-
SPEComp: A new benchmark suite for measuring parallel computer performance
-
London, UK, Springer-Verlag
-
V. Aslot, M. J. Domeika, R. Eigenmann, G. Gaertner, W. B. Jones, and B. Parady. SPEComp: A new benchmark suite for measuring parallel computer performance. In Proc. Int'l Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming, WOMPAT '01, pages 1-10, London, UK, 2001. Springer-Verlag.
-
(2001)
Proc. Int'l Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming, WOMPAT '01
, pp. 1-10
-
-
Aslot, V.
Domeika, M.J.
Eigenmann, R.
Gaertner, G.
Jones, W.B.
Parady, B.
-
2
-
-
61449142683
-
Refactoring for data locality
-
Feb.
-
K. Beyls and E. D'Hollander. Refactoring for data locality. Computer, 42(2):62-71, Feb. 2009.
-
(2009)
Computer
, vol.42
, Issue.2
, pp. 62-71
-
-
Beyls, K.
D'Hollander, E.
-
4
-
-
34548023929
-
Cooperative cache partitioning for chip multiprocessors
-
New York, NY, USA, ACM
-
J. Chang and G. S. Sohi. Cooperative cache partitioning for chip multiprocessors. In Proc. 21st Annual Int'l Conf. Supercomputing, ICS '07, pages 242-252, New York, NY, USA, 2007. ACM.
-
(2007)
Proc. 21st Annual Int'l Conf. Supercomputing, ICS '07
, pp. 242-252
-
-
Chang, J.
Sohi, G.S.
-
5
-
-
84944415710
-
Comparing program phase detection techniques
-
Washington, DC, USA, IEEE CS
-
A. S. Dhodapkar and J. E. Smith. Comparing program phase detection techniques. In Proc. 36th Annual IEEE/ACM Int'l Symp. Microarchitecture, MICRO 36, pages 217-227, Washington, DC, USA, 2003. IEEE CS.
-
(2003)
Proc. 36th Annual IEEE/ACM Int'l Symp. Microarchitecture, MICRO 36
, pp. 217-227
-
-
Dhodapkar, A.S.
Smith, J.E.
-
8
-
-
84867497756
-
Full-system simulation from embedded to high-performance systems
-
R. Leupers and O. Temam, editors, chapter 3, Springer US
-
J. Engblom, D. Aarno, and B. Werner. Full-system simulation from embedded to high-performance systems. In R. Leupers and O. Temam, editors, Processor and System-on-Chip Simulation, chapter 3, pages 25-45. Springer US, 2010.
-
(2010)
Processor and System-on-Chip Simulation
, pp. 25-45
-
-
Engblom, J.
Aarno, D.
Werner, B.
-
9
-
-
33745797169
-
Reuse-distance-based miss-rate prediction on a per instruction basis
-
ACM
-
C. Fang, S. Carr, S. Önder, and Z. Wang. Reuse-distance-based miss-rate prediction on a per instruction basis. In Proc. 2004 Workshop on Memory System Performance, MSP '04, pages 60-68. ACM, 2004.
-
(2004)
Proc. 2004 Workshop on Memory System Performance, MSP '04
, pp. 60-68
-
-
Fang, C.
Carr, S.
Önder, S.
Wang, Z.
-
10
-
-
33745793237
-
Path-based reuse distance analysis
-
Compiler Construction, Springer Berlin Heidelberg
-
C. Fang, S. Carr, S. Önder, and Z. Wang. Path-based reuse distance analysis. In Compiler Construction, Lecture Notes in Computer Science, pages 32-46. Springer Berlin Heidelberg, 2006.
-
(2006)
Lecture Notes in Computer Science
, pp. 32-46
-
-
Fang, C.
Carr, S.
Önder, S.
Wang, Z.
-
11
-
-
78651391009
-
Quality of service shared cache management in chip multiprocessor architecture
-
Dec.
-
F. Guo, Y. Solihin, L. Zhao, and R. Iyer. Quality of service shared cache management in chip multiprocessor architecture. ACM Trans. Archit. Code Optim., 7(3):14:1-14:33, Dec. 2010.
-
(2010)
ACM Trans. Archit. Code Optim.
, vol.7
, Issue.3
-
-
Guo, F.
Solihin, Y.
Zhao, L.
Iyer, R.
-
12
-
-
34247143442
-
Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource
-
New York, NY, USA, ACM
-
L. R. Hsu, S. K. Reinhardt, R. Iyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In Proc. 15th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '06, pages 13-22, New York, NY, USA, 2006. ACM.
-
(2006)
Proc. 15th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '06
, pp. 13-22
-
-
Hsu, L.R.
Reinhardt, S.K.
Iyer, R.
Makineni, S.
-
13
-
-
33847150616
-
Dynamic program phase detection in distributed shared-memory multiprocessors
-
IEEE CS
-
E. Ipek, J. Martinez, B. De Supinski, S. McKee, and M. Schulz. Dynamic program phase detection in distributed shared-memory multiprocessors. In Proc. 20th Int'l Parallel and Distributed Processing Symp., IPDPS 2006, IPDPS'06, pages 280-290. IEEE CS, 2006.
-
(2006)
Proc. 20th Int'l Parallel and Distributed Processing Symp., IPDPS 2006, IPDPS'06
, pp. 280-290
-
-
Ipek, E.
Martinez, J.
De Supinski, B.
McKee, S.
Schulz, M.
-
14
-
-
8344246922
-
CQoS: A framework for enabling QoS in shared caches of CMP platforms
-
New York, NY, USA, ACM
-
R. Iyer. CQoS: a framework for enabling QoS in shared caches of CMP platforms. In Proc. 18th Annual Int'l Conf. Supercomputing, ICS '04, pages 257-266, New York, NY, USA, 2004. ACM.
-
(2004)
Proc. 18th Annual Int'l Conf. Supercomputing, ICS '04
, pp. 257-266
-
-
Iyer, R.
-
15
-
-
63549149925
-
Adaptive insertion policies for managing shared caches
-
ACM
-
A. Jaleel, W. Hasenplaugh, M. Qureshi, J. Sebot, S. Steely, Jr., and J. Emer. Adaptive insertion policies for managing shared caches. In Proc. 17th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '08, pages 208-219. ACM, 2008.
-
(2008)
Proc. 17th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '08
, pp. 208-219
-
-
Jaleel, A.
Hasenplaugh, W.
Qureshi, M.
Sebot, J.
Steely Jr., S.
Emer, J.
-
16
-
-
77951616746
-
Is reuse distance applicable to data locality analysis on chip multiprocessors?
-
Proc. 19th Joint European Conf. on Theory and Practice of Software, Int'l Conference on Compiler Construction, Springer-Verlag
-
Y. Jiang, E. Z. Zhang, K. Tian, and X. Shen. Is reuse distance applicable to data locality analysis on chip multiprocessors? In Proc. 19th Joint European Conf. on Theory and Practice of Software, Int'l Conference on Compiler Construction, volume 6011 of CC'10/ETAPS'10, pages 264-282. Springer-Verlag, 2010.
-
(2010)
CC'10/ETAPS'10
, vol.6011
, pp. 264-282
-
-
Jiang, Y.
Zhang, E.Z.
Tian, K.
Shen, X.
-
17
-
-
10444238444
-
Fair cache sharing and partitioning in a chip multiprocessor architecture
-
Washington, DC, USA, IEEE CS
-
S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proc. 13th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '04, pages 111-122, Washington, DC, USA, 2004. IEEE CS.
-
(2004)
Proc. 13th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '04
, pp. 111-122
-
-
Kim, S.
Chandra, D.
Solihin, Y.
-
18
-
-
70449652924
-
Soft-OLP: Improving hardware cache performance through software-controlled object-level partitioning
-
IEEE CS
-
Q. Lu, J. Lin, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Soft-OLP: Improving hardware cache performance through software-controlled object-level partitioning. In Proc. 18th Int'l Conf. Parallel Architectures and Compilation Techniques, pages 246-257. IEEE CS, 2009.
-
(2009)
Proc. 18th Int'l Conf. Parallel Architectures and Compilation Techniques
, pp. 246-257
-
-
Lu, Q.
Lin, J.
Ding, X.
Zhang, Z.
Zhang, X.
Sadayappan, P.
-
21
-
-
33748870886
-
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
-
Nov.
-
M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News, 33:92-99, Nov. 2005.
-
(2005)
SIGARCH Comput. Archit. News
, vol.33
, pp. 92-99
-
-
Martin, M.M.K.
Sorin, D.J.
Beckmann, B.M.
Marty, M.R.
Xu, M.
Alameldeen, A.R.
Moore, K.E.
Hill, M.D.
Wood, D.A.
-
22
-
-
0014701246
-
Evaluation techniques for storage hierarchies
-
June
-
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Syst. J., 9:78-117, June 1970.
-
(1970)
IBM Syst. J.
, vol.9
, pp. 78-117
-
-
Mattson, R.L.
Gecsei, J.
Slutz, D.R.
Traiger, I.L.
-
24
-
-
33847108982
-
Detecting phases in parallel applications on shared memory architectures
-
Apr.
-
E. Perelman, M. Polito, J.-Y. Bouguet, J. Sampson, B. Calder, and C. Dulong. Detecting phases in parallel applications on shared memory architectures. In Proc. 20th Int'l Parallel and Distributed Processing Symp., IPDPS 2006, 10 pp., Apr. 2006.
-
(2006)
Proc. 20th Int'l Parallel and Distributed Processing Symp., IPDPS 2006
, pp. 10
-
-
Perelman, E.
Polito, M.
Bouguet, J.-Y.
Sampson, J.
Calder, B.
Dulong, C.
-
25
-
-
34548042910
-
Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
-
IEEE CS
-
M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. 39th Ann. IEEE/ACM Int'l Symp. Microarchitecture, MICRO 39, pages 423-432. IEEE CS, 2006.
-
(2006)
Proc. 39th Ann. IEEE/ACM Int'l Symp. Microarchitecture, MICRO 39
, pp. 423-432
-
-
Qureshi, M.K.
Patt, Y.N.
-
26
-
-
77954045353
-
Multicore-aware reuse distance analysis
-
Apr.
-
D. Schuff, B. Parsons, and V. Pai. Multicore-aware reuse distance analysis. In Proc. 2010 IEEE Int'l Symp. Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), pages 1-8, Apr. 2010.
-
(2010)
Proc. 2010 IEEE Int'l Symp. Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW)
, pp. 1-8
-
-
Schuff, D.
Parsons, B.
Pai, V.
-
27
-
-
78149254514
-
Accelerating multicore reuse distance analysis with sampling and parallelization
-
ACM
-
D. L. Schuff, M. Kulkarni, and V. S. Pai. Accelerating multicore reuse distance analysis with sampling and parallelization. In Proc. 19th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '10, pages 53-64. ACM, 2010.
-
(2010)
Proc. 19th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '10
, pp. 53-64
-
-
Schuff, D.L.
Kulkarni, M.
Pai, V.S.
-
28
-
-
12844275862
-
Locality phase prediction
-
ACM
-
X. Shen, Y. Zhong, and C. Ding. Locality phase prediction. In Proc. 11th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ASPLOS-XI, pages 165-176. ACM, 2004.
-
(2004)
Proc. 11th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ASPLOS-XI
, pp. 165-176
-
-
Shen, X.
Zhong, Y.
Ding, C.
-
29
-
-
0038345698
-
Phase tracking and prediction
-
New York, NY, USA, ACM
-
T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. In Proc. 30th Annual Int'l Symp. Computer Architecture, ISCA '03, pages 336-349, New York, NY, USA, 2003. ACM.
-
(2003)
Proc. 30th Annual Int'l Symp. Computer Architecture, ISCA '03
, pp. 336-349
-
-
Sherwood, T.
Sair, S.
Calder, B.
-
30
-
-
0026925878
-
Optimal partitioning of cache memory
-
DOI 10.1109/12.165388
-
H. Stone, J. Turek, and J. Wolf. Optimal partitioning of cache memory. IEEE Trans. Computers, 41(9):1054-1068, 1992.
-
(1992)
IEEE Transactions on Computers
, vol.41
, Issue.9
, pp. 1054-1068
-
-
Stone, H.S.
Turek, J.
Wolf, J.L.
-
31
-
-
1642371317
-
Dynamic partitioning of shared cache memory
-
Apr.
-
G. E. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. J. Supercomput., 28:7-26, Apr. 2004.
-
(2004)
J. Supercomput.
, vol.28
, pp. 7-26
-
-
Suh, G.E.
Rudolph, L.
Devadas, S.
-
32
-
-
84860335317
-
Cooperative partitioning: Energy-efficient cache partitioning for high-performance CMPs
-
K. Sundararajan, V. Porpodas, T. Jones, N. Topham, and B. Franke. Cooperative partitioning: Energy-efficient cache partitioning for high-performance CMPs. In Proc. 18th IEEE Int'l Symp. High Performance Computer Architecture (HPCA), pages 1-12, 2012.
-
(2012)
Proc. 18th IEEE Int'l Symp. High Performance Computer Architecture (HPCA)
, pp. 1-12
-
-
Sundararajan, K.
Porpodas, V.
Jones, T.
Topham, N.
Franke, B.
-
35
-
-
84874865302
-
Efficient reuse distance analysis of multicore scaling for loop-based parallel programs
-
Feb.
-
M.-J. Wu and D. Yeung. Efficient reuse distance analysis of multicore scaling for loop-based parallel programs. ACM Trans. Comput. Syst., 31(1):1:1-1:37, Feb. 2013.
-
(2013)
ACM Trans. Comput. Syst.
, vol.31
, Issue.1
-
-
Wu, M.-J.
Yeung, D.
-
36
-
-
70450279102
-
PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches
-
New York, NY, USA, ACM
-
Y. Xie and G. H. Loh. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches. In Proc. 36th Annual Int'l Symp. on Computer Architecture, ISCA '09, pages 174-183, New York, NY, USA, 2009. ACM.
-
(2009)
Proc. 36th Annual Int'l Symp. on Computer Architecture, ISCA '09
, pp. 174-183
-
-
Xie, Y.
Loh, G.H.
-
37
-
-
77749340037
-
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
-
ACM
-
E. Z. Zhang, Y. Jiang, and X. Shen. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? In Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, PPoPP '10, pages 203-212. ACM, 2010.
-
(2010)
Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, PPoPP '10
, pp. 203-212
-
-
Zhang, E.Z.
Jiang, Y.
Shen, X.
-
38
-
-
33947360666
-
Miss rate prediction across program inputs and cache configurations
-
DOI 10.1109/TC.2007.50
-
Y. Zhong, S. Dropsho, X. Shen, A. Studer, and C. Ding. Miss rate prediction across program inputs and cache configurations. IEEE Trans. Computers, 56(3):328-343, Mar. 2007.
-
(2007)
IEEE Transactions on Computers
, vol.56
, Issue.3
, pp. 328-343
-
-
Zhong, Y.
Dropsho, S.G.
Shen, X.
Studer, A.
Ding, C.
-
39
-
-
84968739606
-
Miss rate prediction across all program inputs
-
IEEE CS
-
Y. Zhong, S. G. Dropsho, and C. Ding. Miss rate prediction across all program inputs. In Proc. 12th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '03, pages 79-. IEEE CS, 2003.
-
(2003)
Proc. 12th Int'l Conf. Parallel Architectures and Compilation Techniques, PACT '03
, pp. 79
-
-
Zhong, Y.
Dropsho, S.G.
Ding, C.
-
40
-
-
8344272049
-
Array regrouping and structure splitting using whole-program reference affinity
-
ACM
-
Y. Zhong, M. Orlovich, X. Shen, and C. Ding. Array regrouping and structure splitting using whole-program reference affinity. In Proc. ACM SIGPLAN 2004 Conf. Programming Language Design and Implementation, PLDI '04, pages 255-266. ACM, 2004.
-
(2004)
Proc. ACM SIGPLAN 2004 Conf. Programming Language Design and Implementation, PLDI '04
, pp. 255-266
-
-
Zhong, Y.
Orlovich, M.
Shen, X.
Ding, C.