-
2
-
-
0030645124
-
Exploiting hardware performance counters with flow and context sensitive profiling
-
G. Ammons, T. Ball, and J. Larus, "Exploiting Hardware Performance Counters with Flow and Context Sensitive Profiling," Proc. ACM SIGPLAN '97 Conf. Programming Language Design and Implementation (PLDI '97), pp. 85-96, 1997.
-
(1997)
Proc. ACM SIGPLAN '97 Conf. Programming Language Design and Implementation (PLDI '97)
, pp. 85-96
-
-
Ammons, G.1
Ball, T.2
Larus, J.3
-
3
-
-
3242744876
-
Ictineo: A tool for research on ILP
-
E. Ayguadé, C. Barrado, A. González, J. Labarta, J. Llosa, D. López, S. Moreno, D. Padua, F. Reig, Q. Riera, and M. Valero, "Ictineo: A Tool for Research on ILP," Proc. Supercomputing '96, 1996.
-
(1996)
Proc. Supercomputing '96
-
-
Ayguadé, E.1
Barrado, C.2
González, A.3
Labarta, J.4
Llosa, J.5
López, D.6
Moreno, S.7
Padua, D.8
Reig, F.9
Riera, Q.10
Valero, M.11
-
4
-
-
1242313972
-
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness
-
Nov.
-
D. F. Bacon, J.-H. Chow, D.-C.R. Ju, K. Muthukumar, and V. Sarkar, "A Compiler Framework for Restructuring Data Declarations to Enhance Cache and TLB Effectiveness," Proc. IBM Centers for Advanced Studies Conf. (CASCON '94), pp. 270-282, Nov. 1994.
-
(1994)
Proc. IBM Centers for Advanced Studies Conf. (CASCON '94)
, pp. 270-282
-
-
Bacon, D.F.1
Chow, J.-H.2
Ju, D.-C.R.3
Muthukumar, K.4
Sarkar, V.5
-
5
-
-
84899747534
-
An efficient solver for cache miss equations
-
N. Bermudo, X. Vera, A. González, and J. Llosa, "An Efficient Solver for Cache Miss Equations," Proc. IEEE Int'l Symp. Performance Analysis of Systems and Software (ISPASS'00), 2000.
-
Proc. IEEE Int'l Symp. Performance Analysis of Systems and Software (ISPASS'00), 2000
-
-
Bermudo, N.1
Vera, X.2
González, A.3
Llosa, J.4
-
6
-
-
0026866013
-
Profile-guided automatic inline expansion for C programs
-
P.P. Chang, S.A. Mahlke, W.Y. Chen, and W.W. Hwu, "Profile-Guided Automatic Inline Expansion for C Programs," Software - Practice and Experience, vol. 25, pp. 249-369, 1992.
-
(1992)
Software - Practice and Experience
, vol.25
, pp. 249-369
-
-
Chang, P.P.1
Mahlke, S.A.2
Chen, W.Y.3
Hwu, W.W.4
-
7
-
-
0034832018
-
Exact analysis of the cache behavior of nested loops
-
S. Chatterjee, E. Parker, P.J. Hanlon, and A.R. Lebeck, "Exact Analysis of the Cache Behavior of Nested Loops," Proc. ACM SIGPLAN '01 Conf. Programming Language Design and Implementation (PLDI '01), pp. 286-297, 2001.
-
(2001)
Proc. ACM SIGPLAN '01 Conf. Programming Language Design and Implementation (PLDI '01)
, pp. 286-297
-
-
Chatterjee, S.1
Parker, E.2
Hanlon, P.J.3
Lebeck, A.R.4
-
9
-
-
0029717349
-
Counting solutions to linear and non-linear constraints through Ehrhart polynomials
-
P. Clauss, "Counting Solutions to Linear and Non-Linear Constraints through Ehrhart Polynomials," Proc. ACM Int'l Conf. Supercomputing (ICS '96), pp. 278-285, 1996.
-
(1996)
Proc. ACM Int'l Conf. Supercomputing (ICS '96)
, pp. 278-285
-
-
Clauss, P.1
-
12
-
-
0004007719
-
Improving effective bandwidth through compiler enhancement of global dynamic cache reuse
-
PhD thesis, Rice Univ.
-
C. Ding, "Improving Effective Bandwidth through Compiler Enhancement of Global Dynamic Cache Reuse," PhD thesis, Rice Univ., 2000.
-
(2000)
-
-
Ding, C.1
-
14
-
-
0028530861
-
The Polaris internal representation
-
Oct.
-
K.A. Faigin, J.P. Hoeflinger, D.A. Padua, P.M. Petersen, and S.A. Weatherford, "The Polaris Internal Representation," Int'l J. Parallel Programming, vol. 22, no. 5, pp. 553-586, Oct. 1994.
-
(1994)
Int'l J. Parallel Programming
, vol.22
, Issue.5
, pp. 553-586
-
-
Faigin, K.A.1
Hoeflinger, J.P.2
Padua, D.A.3
Petersen, P.M.4
Weatherford, S.A.5
-
15
-
-
0001023389
-
Parametric integer programming
-
P. Feautrier, "Parametric Integer Programming," Operations Research, vol. 22, pp. 243-268, 1988.
-
(1988)
Operations Research
, vol.22
, pp. 243-268
-
-
Feautrier, P.1
-
16
-
-
84957027384
-
Automatic parallelization in the polytope model
-
G.R. Perrin and A. Darte, eds.; Springer Verlag
-
P. Feautrier, "Automatic Parallelization in the Polytope Model," The Data Parallel Programming Model, G.R. Perrin and A. Darte, eds., pp. 79-103, Springer Verlag, 1996.
-
(1996)
The Data Parallel Programming Model
, pp. 79-103
-
-
Feautrier, P.1
-
17
-
-
0002461724
-
Applying compiler techniques to cache behavior prediction
-
C. Ferdinand, F. Martin, and R. Wilhelm, "Applying Compiler Techniques to Cache Behavior Prediction," Proc. ACM SIGPLAN Workshop Languages, Compilers, and Tools for Real-Time Systems (LCTRTS '97), pp. 37-46, 1997.
-
(1997)
Proc. ACM SIGPLAN Workshop Languages, Compilers, and Tools for Real-Time Systems (LCTRTS '97)
, pp. 37-46
-
-
Ferdinand, C.1
Martin, F.2
Wilhelm, R.3
-
18
-
-
85015240805
-
On estimating and enhancing cache effectiveness
-
J. Ferrante, V. Sarkar, and W. Thrash, "On Estimating and Enhancing Cache Effectiveness," Proc. Fourth Workshop Compilers for Parallel Computers, pp. 328-343, 1991.
-
(1991)
Proc. Fourth Workshop Compilers for Parallel Computers
, pp. 328-343
-
-
Ferrante, J.1
Sarkar, V.2
Thrash, W.3
-
19
-
-
0032089580
-
Modeling set associative caches behavior for irregular computations
-
June
-
B.B. Fraguela, R. Doallo, and E.L. Zapata, "Modeling Set Associative Caches Behavior for Irregular Computations," ACM Performance Evaluation Rev., vol. 26, no. 1, pp. 192-201, June 1998.
-
(1998)
ACM Performance Evaluation Rev.
, vol.26
, Issue.1
, pp. 192-201
-
-
Fraguela, B.B.1
Doallo, R.2
Zapata, E.L.3
-
21
-
-
0001366267
-
Strategies for cache and local memory management by global program transformations
-
D. Gannon, W. Jalby, and K. Gallivan, "Strategies for Cache and Local Memory Management by Global Program Transformations," J. Parallel and Distributed Computing, vol. 5, pp. 587-616, 1988.
-
(1988)
J. Parallel and Distributed Computing
, vol.5
, pp. 587-616
-
-
Gannon, D.1
Jalby, W.2
Gallivan, K.3
-
22
-
-
0001714824
-
Cache miss equations: A compiler framework for analyzing and tuning memory behavior
-
S. Ghosh, M. Martonosi, and S. Malik, "Cache Miss Equations: A Compiler Framework for Analyzing and Tuning Memory Behavior," ACM Trans. Programming Languages and Systems, vol. 21, no. 4, pp. 703-746, 1999.
-
(1999)
ACM Trans. Programming Languages and Systems
, vol.21
, Issue.4
, pp. 703-746
-
-
Ghosh, S.1
Martonosi, M.2
Malik, S.3
-
23
-
-
0005329615
-
Procedure placement using temporal-ordering information
-
N. Gloy and M.D. Smith, "Procedure Placement Using Temporal-Ordering Information," ACM Trans. Programming Languages and Systems, vol. 21, no. 5, pp. 1028-1075, 1999.
-
(1999)
ACM Trans. Programming Languages and Systems
, vol.21
, Issue.5
, pp. 1028-1075
-
-
Gloy, N.1
Smith, M.D.2
-
24
-
-
0003630067
-
A comparison of locality transformations for irregular codes
-
H. Han and C.-W. Tseng, "A Comparison of Locality Transformations for Irregular Codes," Proc. Fifth Workshop Languages, Compilers, and Run-Time Systems for Scalable Computers, May 2000.
-
Proc. Fifth Workshop Languages, Compilers, and Run-Time Systems for Scalable Computers, May 2000
-
-
Han, H.1
Tseng, C.-W.2
-
25
-
-
0033204190
-
Analytical modeling of set-associative caches
-
Oct.
-
J.S. Harper, D.J. Kerbyson, and G.R. Nudd, "Analytical Modeling of Set-Associative Caches," IEEE Trans. Computers, vol. 48, no. 10, pp. 1009-1024, Oct. 1999.
-
(1999)
IEEE Trans. Computers
, vol.48
, Issue.10
, pp. 1009-1024
-
-
Harper, J.S.1
Kerbyson, D.J.2
Nudd, G.R.3
-
27
-
-
12344315233
-
DineroIII: A uniprocessor cache simulator
-
M. Hill, "DineroIII: A Uniprocessor Cache Simulator," http://www.cs.wisc.edu/~larus/warts.html, 2004.
-
(2004)
-
-
Hill, M.1
-
28
-
-
0032652980
-
Nonlinear array layout for hierarchical memory systems
-
June
-
S.C.V.V. Jain, A.R. Lebeck, S. Mundhra, and M. Thottethodi, "Nonlinear Array Layout for Hierarchical Memory Systems," Proc. ACM Int'l Conf. Supercomputing (ICS '99), pp. 444-453, June 1999.
-
(1999)
Proc. ACM Int'l Conf. Supercomputing (ICS '99)
, pp. 444-453
-
-
Jain, S.C.V.V.1
Lebeck, A.R.2
Mundhra, S.3
Thottethodi, M.4
-
29
-
-
0033077834
-
A linear algebra framework for automatic determination of optimal data layouts
-
Feb.
-
M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam, "A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts," IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 2, pp. 115-135, Feb. 1999.
-
(1999)
IEEE Trans. Parallel and Distributed Systems
, vol.10
, Issue.2
, pp. 115-135
-
-
Kandemir, M.1
Choudhary, A.2
Banerjee, P.3
Ramanujam, J.4
-
30
-
-
84976736383
-
Page placement algorithms for large real-index caches
-
R.E. Kessler and M.D. Hill, "Page Placement Algorithms for Large Real-Index Caches," ACM Trans. Computer Systems, vol. 10, no. 4, pp. 338-359, 1992.
-
(1992)
ACM Trans. Computer Systems
, vol.10
, Issue.4
, pp. 338-359
-
-
Kessler, R.E.1
Hill, M.D.2
-
31
-
-
0030685988
-
Data-centric multi-level blocking
-
I. Kodukula, N. Ahmed, and K. Pingali, "Data-Centric Multi-Level Blocking," Proc. ACM SIGPLAN '97 Conf. Programming Language Design and Implementation (PLDI '97), pp. 346-357, 1997.
-
(1997)
Proc. ACM SIGPLAN '97 Conf. Programming Language Design and Implementation (PLDI '97)
, pp. 346-357
-
-
Kodukula, I.1
Ahmed, N.2
Pingali, K.3
-
32
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
Apr.
-
M.S. Lam, E.E. Rothberg, and M.E. Wolf, "The Cache Performance and Optimizations of Blocked Algorithms," Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS '91), pp. 63-74, Apr. 1991.
-
(1991)
Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS '91)
, pp. 63-74
-
-
Lam, M.S.1
Rothberg, E.E.2
Wolf, M.E.3
-
34
-
-
84978485471
-
MemSpy: Analyzing memory system bottlenecks in programs
-
M. Martonosi, A. Gupta, and T. Anderson, "MemSpy: Analyzing Memory System Bottlenecks in Programs," Proc. ACM SIGMETRICS '92 Conf. Measurement and Modeling of Computer Systems, pp. 1-12, 1992.
-
(1992)
Proc. ACM SIGMETRICS '92 Conf. Measurement and Modeling of Computer Systems
, pp. 1-12
-
-
Martonosi, M.1
Gupta, A.2
Anderson, T.3
-
35
-
-
3042676705
-
Solving systems of affine (In)equalities: PIP's user's guide
-
The PIP System, "Solving Systems of Affine (In)Equalities: PIP's User's Guide," http://www.prism.uvsq.fr/~paf, 2002.
-
(2002)
-
-
-
36
-
-
3042532547
-
SUIF: An infrastructure for research on parallelizing and optimizing compilers
-
The SUIF Compiler Group, "SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers," http://suif.stanford.edu, 2004.
-
(2004)
-
-
-
38
-
-
0030190854
-
Improving data locality with loop transformations
-
July
-
K. McKinley, S. Carr, and C.-W. Tseng, "Improving Data Locality with Loop Transformations," ACM Trans. Programming Languages and Systems, vol. 18, no. 4, pp. 424-453, July 1996.
-
(1996)
ACM Trans. Programming Languages and Systems
, vol.18
, Issue.4
, pp. 424-453
-
-
McKinley, K.1
Carr, S.2
Tseng, C.-W.3
-
39
-
-
0003665539
-
Quantifying loop nest locality using SPEC '95 and the perfect benchmarks
-
Sept.
-
K.S. McKinley and O. Temam, "Quantifying Loop Nest Locality Using SPEC '95 and the Perfect Benchmarks," ACM Trans. Computer Systems, vol. 17, no. 4, pp. 288-336, Sept. 1999.
-
(1999)
ACM Trans. Computer Systems
, vol.17
, Issue.4
, pp. 288-336
-
-
McKinley, K.S.1
Temam, O.2
-
40
-
-
1542601822
-
Improving memory hierarchy performance for irregular applications using data and computation reorderings
-
J.M. Mellor-Crummey, D.B. Whalley, and K. Kennedy, "Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings," Int'l J. Parallel Programming, vol. 29, no. 3, pp. 217-247, 2001.
-
(2001)
Int'l J. Parallel Programming
, vol.29
, Issue.3
, pp. 217-247
-
-
Mellor-Crummey, J.M.1
Whalley, D.B.2
Kennedy, K.3
-
41
-
-
0003690936
-
Software methods for improvements of cache performance on supercomputer applications
-
PhD thesis, Dept. of Computer Science, Rice Univ., May
-
A.K. Porterfield, "Software Methods for Improvements of Cache Performance on Supercomputer Applications," PhD thesis, Dept. of Computer Science, Rice Univ., May 1989.
-
(1989)
-
-
Porterfield, A.K.1
-
42
-
-
84976676720
-
The omega test: A fast and practical integer programming algorithm for dependence analysis
-
Aug.
-
W. Pugh, "The Omega Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis," Comm. ACM, vol. 35, no. 8, pp. 102-114, Aug. 1992.
-
(1992)
Comm. ACM
, vol.35
, Issue.8
, pp. 102-114
-
-
Pugh, W.1
-
47
-
-
0028429842
-
Cache interference phenomena
-
O. Temam, C. Fricker, and W. Jalby, "Cache Interference Phenomena," Proc. ACM SIGMETRICS '94 Conf. Measurement and Modeling of Computer Systems, pp. 261-271, 1994.
-
(1994)
Proc. ACM SIGMETRICS '94 Conf. Measurement and Modeling of Computer Systems
, pp. 261-271
-
-
Temam, O.1
Fricker, C.2
Jalby, W.3
-
48
-
-
0027764718
-
To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts
-
O. Temam, E. Granston, and W. Jalby, "To Copy or Not to Copy: A Compile-Time Technique for Assessing When Data Copying Should Be Used to Eliminate Cache Conflicts," Proc. Supercomputing '93, pp. 410-419, 1993.
-
(1993)
Proc. Supercomputing '93
, pp. 410-419
-
-
Temam, O.1
Granston, E.2
Jalby, W.3
-
49
-
-
85031661900
-
Characterizing the behavior of sparse algorithms on caches
-
O. Temam and W. Jalby, "Characterizing the Behavior of Sparse Algorithms on Caches," Proc. Supercomputing '92, pp. 578-587, 1992.
-
(1992)
Proc. Supercomputing '92
, pp. 578-587
-
-
Temam, O.1
Jalby, W.2
-
50
-
-
0032304622
-
Optimizing the instruction cache performance of the operating system
-
J. Torrellas, C. Xia, and R.L. Daigle, "Optimizing the Instruction Cache Performance of the Operating System," IEEE Trans. Computers, vol. 47, no. 12, pp. 1363-1381, 1998.
-
(1998)
IEEE Trans. Computers
, vol.47
, Issue.12
, pp. 1363-1381
-
-
Torrellas, J.1
Xia, C.2
Daigle, R.L.3
-
51
-
-
0031153459
-
Trace-driven memory simulation: A survey
-
Sept.
-
R.A. Uhlig and T.N. Mudge, "Trace-Driven Memory Simulation: A Survey," ACM Computing Surveys, vol. 29, no. 3, pp. 128-170, Sept. 1997.
-
(1997)
ACM Computing Surveys
, vol.29
, Issue.3
, pp. 128-170
-
-
Uhlig, R.A.1
Mudge, T.N.2
-
53
-
-
33646187750
-
A fast and accurate approach to analyze cache memory behavior
-
X. Vera, J. Llosa, A. González, and N. Bermudo, "A Fast and Accurate Approach to Analyze Cache Memory Behavior," Proc. European Conf. Parallel Computing (Europar '00), 2000.
-
Proc. European Conf. Parallel Computing (Europar '00), 2000
-
-
Vera, X.1
Llosa, J.2
González, A.3
Bermudo, N.4
-
55
-
-
0031369396
-
Timing analysis of data caches and set-associative caches
-
R. White, F. Mueller, C. Healy, D. Whalley, and M.G. Harmon, "Timing Analysis of Data Caches and Set-Associative Caches," Proc. Third IEEE Real-Time Technology and Applications Symp. (RTAS '97), June 1997.
-
Proc. Third IEEE Real-Time Technology and Applications Symp. (RTAS '97), June 1997
-
-
White, R.1
Mueller, F.2
Healy, C.3
Whalley, D.4
Harmon, M.G.5
-
56
-
-
0004005802
-
A library for doing polyhedral operations
-
Technical Report 785, Oregon State Univ.
-
D. Wilde, "A Library for Doing Polyhedral Operations," Technical Report 785, Oregon State Univ., 1993.
-
(1993)
-
-
Wilde, D.1
-
58
-
-
0031079360
-
Unimodular transformations of non-perfectly nested loops
-
J. Xue, "Unimodular Transformations of Non-Perfectly Nested Loops," Parallel Computing, vol. 22, no. 12, pp. 1621-1645, 1997.
-
(1997)
Parallel Computing
, vol.22
, Issue.12
, pp. 1621-1645
-
-
Xue, J.1
-
60
-
-
0032315190
-
Reuse-driven tiling for improving data locality
-
J. Xue and C.-H. Huang, "Reuse-Driven Tiling for Improving Data Locality," Int'l J. Parallel Programming, vol. 26, no. 6, pp. 671-696, 1998.
-
(1998)
Int'l J. Parallel Programming
, vol.26
, Issue.6
, pp. 671-696
-
-
Xue, J.1
Huang, C.-H.2