-
2
-
-
70350767611
-
-
R. Schreiber and J. Dongarra, "Automatic blocking of nested loops," RIACS, NASA Ames Research Center, Tech. Rep. 90.38, Aug 1990.
-
-
-
-
3
-
-
35449000510
-
A data locality optimizing algorithm (with retrospective)
-
M. S. Lam and M. E. Wolf, "A data locality optimizing algorithm (with retrospective)," in Best of PLDI, 1991, pp. 442-459.
-
(1991)
Best of PLDI
, pp. 442-459
-
-
Lam, M.S.1
Wolf, M.E.2
-
5
-
-
0029235623
-
Hierarchical tiling for improved superscalar performance
-
Washington, DC, USA: IEEE Computer Society
-
L. Carter, J. Ferrante, and S. F. Hummel, "Hierarchical tiling for improved superscalar performance," in IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing. Washington, DC, USA: IEEE Computer Society, 1995, pp. 239-245.
-
(1995)
IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing
, pp. 239-245
-
-
Carter, L.1
Ferrante, J.2
Hummel, S.F.3
-
6
-
-
20744459570
-
Is search really necessary to generate high-performance BLAS?
-
K. Yotov, X. Li, G. Ren, M. J. S. Garzaran, D. Padua, K. Pingali, and P. Stodghill, "Is search really necessary to generate high-performance BLAS?" Proceedings of the IEEE, vol. 93, pp. 358-386, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, pp. 358-386
-
-
Yotov, K.1
Li, X.2
Ren, G.3
Garzaran, M.J.S.4
Padua, D.5
Pingali, K.6
Stodghill, P.7
-
7
-
-
34548752231
-
Towards optimal multi-level tiling for stencil computations
-
L. Renganarayanan, M. Harthikote, R. Dewri, and S. Rajopadhye, "Towards optimal multi-level tiling for stencil computations," in 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS) (to appear), 2007.
-
(2007)
21st IEEE International Parallel and Distributed Processing Symposium (IPDPS) (to appear)
-
-
Renganarayanan, L.1
Harthikote, M.2
Dewri, R.3
Rajopadhye, S.4
-
8
-
-
79959456077
-
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
-
New York, NY, USA: ACM
-
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories," in PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. New York, NY, USA: ACM, 2008, pp. 1-10.
-
(2008)
PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
, pp. 1-10
-
-
Baskaran, M.M.1
Bondhugula, U.2
Krishnamoorthy, S.3
Ramanujam, J.4
Rountev, A.5
Sadayappan, P.6
-
9
-
-
0029373981
-
Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors
-
A. Agarwal, D. A. Kranz, and V. Natarajan, "Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors," IEEE Trans. Parallel Distrib. Syst., vol. 6, no. 9, pp. 943-962, 1995.
-
(1995)
IEEE Trans. Parallel Distrib. Syst
, vol.6
, Issue.9
, pp. 943-962
-
-
Agarwal, A.1
Kranz, D.A.2
Natarajan, V.3
-
10
-
-
0028482686
-
(pen)-ultimate tiling?
-
P. Boulet, A. Darte, T. Risset, and Y. Robert, "(pen)-ultimate tiling?" Integr. VLSI J., vol. 17, no. 1, pp. 33-51, 1994.
-
(1994)
Integr. VLSI J
, vol.17
, Issue.1
, pp. 33-51
-
-
Boulet, P.1
Darte, A.2
Risset, T.3
Robert, Y.4
-
11
-
-
0029218339
-
Precise tiling for uniform loop nests
-
Washington, DC, USA: IEEE Computer Society
-
P.-Y. Calland and T. Risset, "Precise tiling for uniform loop nests," in ASAP '95: Proceedings of the IEEE International Conference on Application Specific Array Processors. Washington, DC, USA: IEEE Computer Society, 1995, p. 330.
-
(1995)
ASAP '95: Proceedings of the IEEE International Conference on Application Specific Array Processors
, p. 330
-
-
Calland, P.-Y.1
Risset, T.2
-
12
-
-
0032069399
-
On supernode transformation with minimized total running time
-
E. Hodzic and W. Shang, "On supernode transformation with minimized total running time," IEEE Trans. Parallel Distrib. Syst., vol. 9, no. 5, pp. 417-428, 1998.
-
(1998)
IEEE Trans. Parallel Distrib. Syst
, vol.9
, Issue.5
, pp. 417-428
-
-
Hodzic, E.1
Shang, W.2
-
13
-
-
0029181784
-
Optimal tile size adjustment in compiling general DOACROSS loop nests
-
New York, NY, USA: ACM Press
-
H. Ohta, Y. Saito, M. Kainaga, and H. Ono, "Optimal tile size adjustment in compiling general DOACROSS loop nests," in ICS '95: Proceedings of the 9th international conference on Supercomputing. New York, NY, USA: ACM Press, 1995, pp. 270-279.
-
(1995)
ICS '95: Proceedings of the 9th international conference on Supercomputing
, pp. 270-279
-
-
Ohta, H.1
Saito, Y.2
Kainaga, M.3
Ono, H.4
-
14
-
-
38249009019
-
Tiling multidimensional iteration spaces for multicomputers
-
J. Ramanujam and P. Sadayappan, "Tiling multidimensional iteration spaces for multicomputers," J. Parallel Distrib. Comput., vol. 16, no. 2, pp. 108-120, 1992.
-
(1992)
J. Parallel Distrib. Comput
, vol.16
, Issue.2
, pp. 108-120
-
-
Ramanujam, J.1
Sadayappan, P.2
-
15
-
-
0142134964
-
Optimal semioblique tiling
-
R. Andonov, S. Balev, S. V. Rajopadhye, and N. Yanev, "Optimal semioblique tiling," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 9, pp. 944-960, 2003.
-
(2003)
IEEE Trans. Parallel Distrib. Syst
, vol.14
, Issue.9
, pp. 944-960
-
-
Andonov, R.1
Balev, S.2
Rajopadhye, S.V.3
Yanev, N.4
-
16
-
-
0003125942
-
Communication-minimal tiling of uniform dependence loops
-
J. Xue, "Communication-minimal tiling of uniform dependence loops," J. Parallel Distrib. Comput., vol. 42, no. 1, pp. 42-59, 1997.
-
(1997)
J. Parallel Distrib. Comput
, vol.42
, Issue.1
, pp. 42-59
-
-
Xue, J.1
-
17
-
-
0036601528
-
Time-minimal tiling when rise is larger than zero
-
J. Xue and W. Cai, "Time-minimal tiling when rise is larger than zero," Parallel Comput., vol. 28, no. 6, pp. 915-939, 2002.
-
(2002)
Parallel Comput
, vol.28
, Issue.6
, pp. 915-939
-
-
Xue, J.1
Cai, W.2
-
19
-
-
1242291539
-
-
CS Dept, Rutgers University, Tech. Rep. DCS-TR-401, Oct
-
C. Hsu and U. Kremer, "Tile selection algorithms and their performance models," CS Dept., Rutgers University, Tech. Rep. DCS-TR-401, Oct. 1999.
-
(1999)
Tile selection algorithms and their performance models
-
-
Hsu, C.1
Kremer, U.2
-
22
-
-
0036565622
-
Automatic partitioning of parallel loops with parallelepiped-shaped tiles
-
F. Rastello and Y. Robert, "Automatic partitioning of parallel loops with parallelepiped-shaped tiles," IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 5, pp. 460-470, 2002.
-
(2002)
IEEE Trans. Parallel Distrib. Syst
, vol.13
, Issue.5
, pp. 460-470
-
-
Rastello, F.1
Robert, Y.2
-
23
-
-
70449690852
-
Optimal tile size selection guided by analytical models
-
B. B. Fraguela, M. G. Carmueja, and D. Andrade, "Optimal tile size selection guided by analytical models," in PARCO, 2005, pp. 565-572.
-
(2005)
PARCO
, pp. 565-572
-
-
Fraguela, B.B.1
Carmueja, M.G.2
Andrade, D.3
-
25
-
-
0038895757
-
Register tiling in nonrectangular iteration spaces
-
M. Jiménez, J. M. Llabería, and A. Fernández, "Register tiling in nonrectangular iteration spaces," ACM Trans. Program. Lang. Syst., vol. 24, no. 4, pp. 409-453, 2002.
-
(2002)
ACM Trans. Program. Lang. Syst
, vol.24
, Issue.4
, pp. 409-453
-
-
Jiménez, M.1
Llabería, J.M.2
Fernández, A.3
-
26
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
ACM Press
-
M. S. Lam, E. E. Rothberg, and M. E. Wolf, "The cache performance and optimizations of blocked algorithms," in Proceedings of the fourth international conference on Architectural support for programming languages and operating systems. ACM Press, 1991, pp. 63-74.
-
(1991)
Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
, pp. 63-74
-
-
Lam, M.S.1
Rothberg, E.E.2
Wolf, M.E.3
-
27
-
-
33748307622
-
An analytical model for loop tiling and its solution
-
V. Sarkar and N. Megiddo, "An analytical model for loop tiling and its solution," in Proceedings of ISPASS, 2000.
-
(2000)
Proceedings of ISPASS
-
-
Sarkar, V.1
Megiddo, N.2
-
28
-
-
0032308685
-
Quantifying the multi-level nature of tiling interactions
-
N. Mitchell, N. Hogstedt, L. Carter, and J. Ferrante, "Quantifying the multi-level nature of tiling interactions," International Journal of Parallel Programming, vol. 26, no. 6, pp. 641-670, 1998.
-
(1998)
International Journal of Parallel Programming
, vol.26
, Issue.6
, pp. 641-670
-
-
Mitchell, N.1
Hogstedt, N.2
Carter, L.3
Ferrante, J.4
-
30
-
-
0030661485
-
Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
-
ACM Press
-
J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel, "Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology," in Proceedings of the 11th international conference on Supercomputing. ACM Press, 1997, pp. 340-347.
-
(1997)
Proceedings of the 11th international conference on Supercomputing
, pp. 340-347
-
-
Bilmes, J.1
Asanovic, K.2
Chin, C.-W.3
Demmel, J.4
-
31
-
-
0034512401
-
-
T. Kisuki, P. M. W. Knijnenburg, and M. F. P. O'Boyle, "Combined selection of tile sizes and unroll factors using iterative compilation," in PACT '00: Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques. Washington, DC, USA: IEEE Computer Society, 2000, p. 237.
-
-
-
-
32
-
-
33646828918
-
Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
-
Washington, DC, USA: IEEE Computer Society
-
C. Chen, J. Chame, and M. Hall, "Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy," in CGO '05: Proceedings of the international symposium on Code generation and optimization. Washington, DC, USA: IEEE Computer Society, 2005, pp. 111-122.
-
(2005)
CGO '05: Proceedings of the international symposium on Code generation and optimization
, pp. 111-122
-
-
Chen, C.1
Chame, J.2
Hall, M.3
-
33
-
-
32844469358
-
Think globally, search locally
-
New York, NY, USA: ACM
-
K. Yotov, K. Pingali, and P. Stodghill, "Think globally, search locally," in ICS '05: Proceedings of the 19th annual international conference on Supercomputing. New York, NY, USA: ACM, 2005, pp. 141-150.
-
(2005)
ICS '05: Proceedings of the 19th annual international conference on Supercomputing
, pp. 141-150
-
-
Yotov, K.1
Pingali, K.2
Stodghill, P.3
-
34
-
-
0442295621
-
The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles
-
P. M. W. Knijnenburg, T. Kisuki, K. Gallivan, and M. F. P. O'Boyle, "The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles," Concurr. Comput.: Pract. Exper., vol. 16, no. 2-3, pp. 247-270, 2004.
-
(2004)
Concurr. Comput.: Pract. Exper
, vol.16
, Issue.2-3
, pp. 247-270
-
-
Knijnenburg, P.M.W.1
Kisuki, T.2
Gallivan, K.3
O'Boyle, M.F.P.4
-
35
-
-
0000214041
-
Optimal orthogonal tiling of 2-D iterations
-
September
-
R. Andonov and S. Rajopadhye, "Optimal orthogonal tiling of 2-D iterations," Journal of Parallel and Distributed Computing, vol. 45, no. 2, pp. 159-165, September 1997.
-
(1997)
Journal of Parallel and Distributed Computing
, vol.45
, Issue.2
, pp. 159-165
-
-
Andonov, R.1
Rajopadhye, S.2
-
36
-
-
0043048462
-
An infeasible interior-point algorithm for solving primal and dual geometric programs
-
K. O. Kortanek, X. Xu, and Y. Ye, "An infeasible interior-point algorithm for solving primal and dual geometric programs," Math. Program., vol. 76, no. 1, pp. 155-181, 1997.
-
(1997)
Math. Program
, vol.76
, Issue.1
, pp. 155-181
-
-
Kortanek, K.O.1
Xu, X.2
Ye, Y.3
-
37
-
-
0004055894
-
-
Online version available at:, 2004
-
S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004. (Online version available at: http://www.stanford.edu/~boyd/cvxbook.html)
-
Convex Optimization
-
-
Boyd, S.1
Vandenberghe, L.2
-
39
-
-
33745784791
-
A tutorial on Geometric Programming
-
S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi, "A tutorial on Geometric Programming," To appear in Optimization and Engineering, 2006.
-
(2006)
To appear in Optimization and Engineering
-
-
Boyd, S.1
Kim, S.J.2
Vandenberghe, L.3
Hassibi, A.4
-
40
-
-
1242352552
-
A quantitative analysis of tile size selection algorithms
-
C.-H. Hsu and U. Kremer, "A quantitative analysis of tile size selection algorithms," J. Supercomput., vol. 27, no. 3, pp. 279-294, 2004.
-
(2004)
J. Supercomput
, vol.27
, Issue.3
, pp. 279-294
-
-
Hsu, C.-H.1
Kremer, U.2
-
41
-
-
0003455775
-
Improving data locality for caches
-
Master's thesis, Rice University, September
-
K. Esseghir, "Improving data locality for caches," Master's thesis, Rice University, September 1993.
-
(1993)
-
-
Esseghir, K.1
-
45
-
-
0031140581
-
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
-
V. Sarkar, "Automatic selection of high-order transformations in the IBM XL FORTRAN compilers," IBM J. Res. Dev., vol. 41, no. 3, pp. 233-264, 1997.
-
(1997)
IBM J. Res. Dev
, vol.41
, Issue.3
, pp. 233-264
-
-
Sarkar, V.1
-
46
-
-
84934300040
-
A geometric programming framework for optimal multi-level tiling
-
Washington, DC, USA: IEEE Computer Society
-
L. Renganarayana and S. Rajopadhye, "A geometric programming framework for optimal multi-level tiling," in SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing. Washington, DC, USA: IEEE Computer Society, 2004, p. 18.
-
(2004)
SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing
, p. 18
-
-
Renganarayana, L.1
Rajopadhye, S.2
-
48
-
-
0037962984
-
On the parallel execution time of tiled loops
-
K. Hogstedt, L. Carter, and J. Ferrante, "On the parallel execution time of tiled loops," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 3, pp. 307-321, 2003.
-
(2003)
IEEE Trans. Parallel Distrib. Syst
, vol.14
, Issue.3
, pp. 307-321
-
-
Hogstedt, K.1
Carter, L.2
Ferrante, J.3
-
49
-
-
0025447908
-
Improving register allocation for subscripted variables
-
New York, NY, USA: ACM Press
-
D. Callahan, S. Carr, and K. Kennedy, "Improving register allocation for subscripted variables," in PLDI '90: Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation. New York, NY, USA: ACM Press, 1990, pp. 53-65.
-
(1990)
PLDI '90: Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
, pp. 53-65
-
-
Callahan, D.1
Carr, S.2
Kennedy, K.3
-
50
-
-
85015240805
-
On estimating and enhancing cache effectiveness
-
Fourth International Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds, Springer Verlag, August
-
J. Ferrante, V. Sarkar, and W. Thrash, "On estimating and enhancing cache effectiveness," in Fourth International Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Lecture Notes in Computer Science 589, Springer-Verlag, August 1991, pp. 328-343.
-
(1991)
Lecture Notes in Computer Science
, vol.589
, pp. 328-343
-
-
Ferrante, J.1
Sarkar, V.2
Thrash, W.3
-
51
-
-
19044386208
-
An updated set of basic linear algebra subprograms (BLAS)
-
"An updated set of basic linear algebra subprograms (BLAS)," ACM Trans. Math. Softw., vol. 28, no. 2, pp. 135-151, 2002.
-
(2002)
ACM Trans. Math. Softw
, vol.28
, Issue.2
, pp. 135-151
-
-
-
52
-
-
20344396845
-
YALMIP: A toolbox for modeling and optimization in MATLAB
-
Taipei, Taiwan, available from
-
J. Löfberg, "YALMIP: A toolbox for modeling and optimization in MATLAB," in Proceedings of the CACSD Conference, Taipei, Taiwan, 2004, available from http://control.ee.ethz.ch/~joloef/yalmip.php.
-
(2004)
Proceedings of the CACSD Conference
-
-
Löfberg, J.1
-
53
-
-
74049164978
-
A practical and fully automatic polyhedral program optimization system
-
Jun
-
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A practical and fully automatic polyhedral program optimization system," in ACM SIGPLAN PLDI, Jun. 2008.
-
(2008)
ACM SIGPLAN PLDI
-
-
Bondhugula, U.1
Hartono, A.2
Ramanujam, J.3
Sadayappan, P.4
-
54
-
-
0029717349
-
Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs
-
ACM Press
-
P. Clauss, "Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs," in Proceedings of the 10th international conference on Supercomputing. ACM Press, 1996, pp. 278-285.
-
(1996)
Proceedings of the 10th international conference on Supercomputing
, pp. 278-285
-
-
Clauss, P.1
-
55
-
-
0001714824
-
Cache miss equations: A compiler framework for analyzing and tuning memory behavior
-
S. Ghosh, M. Martonosi, and S. Malik, "Cache miss equations: a compiler framework for analyzing and tuning memory behavior," ACM Trans. Program. Lang. Syst., vol. 21, no. 4, pp. 703-746, 1999.
-
(1999)
ACM Trans. Program. Lang. Syst
, vol.21
, Issue.4
, pp. 703-746
-
-
Ghosh, S.1
Martonosi, M.2
Malik, S.3
-
56
-
-
0034832018
-
Exact analysis of the cache behavior of nested loops
-
ACM Press
-
S. Chatterjee, E. Parker, P. J. Hanlon, and A. R. Lebeck, "Exact analysis of the cache behavior of nested loops," in Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation. ACM Press, 2001, pp. 286-297.
-
(2001)
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
, pp. 286-297
-
-
Chatterjee, S.1
Parker, E.2
Hanlon, P.J.3
Lebeck, A.R.4