-
1
-
-
20744439191
-
-
[Online]
-
ATLAS home page [Online]. Available: http://math-atlas.sourceforge.net/
-
ATLAS Home Page
-
-
-
2
-
-
20744450712
-
-
[Online]
-
PHiPAC home page [Online]. Available: http://www.icsi.berkeley.edu/~bilmes/phipac
-
PHiPAC Home Page
-
-
-
3
-
-
0028427170
-
Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch
-
R. C. Agarwal, F. G. Gustavson, and M. Zubair, "Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch," IBM J. Res. Develop., vol. 38, no. 3, pp. 265-275, 1994.
-
(1994)
IBM J. Res. Develop.
, vol.38
, Issue.3
, pp. 265-275
-
-
Agarwal, R.C.1
Gustavson, F.G.2
Zubair, M.3
-
5
-
-
0003207812
-
Unimodular transformations of double loops
-
Cambridge, MA: MIT Press, ch. 10
-
U. Banerjee, "Unimodular transformations of double loops," in Advances in Languages and Compilers for Parallel Processing. Cambridge, MA: MIT Press, 1991, ch. 10, pp. 192-219.
-
(1991)
Advances in Languages and Compilers for Parallel Processing
, pp. 192-219
-
-
Banerjee, U.1
-
6
-
-
0030661485
-
Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
-
Vienna, Austria
-
J. Bilmes, K. Asanović, C.-W. Chin, and J. Demmel, "Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology," presented at the Int. Conf. Supercomputing, Vienna, Austria, 1997.
-
(1997)
Int. Conf. Supercomputing
-
-
Bilmes, J.1
Asanović, K.2
Chin, C.-W.3
Demmel, J.4
-
7
-
-
0028482686
-
(Pen)-ultimate tiling?
-
P. Boulet, A. Darte, T. Risset, and Y. Robert, "(Pen)-ultimate tiling?," Integration VLSI J., vol. 17, pp. 33-51, 1994.
-
(1994)
Integration VLSI J.
, vol.17
, pp. 33-51
-
-
Boulet, P.1
Darte, A.2
Risset, T.3
Robert, Y.4
-
8
-
-
0000493064
-
Estimating interlock and improving balance for pipelined architectures
-
D. Callahan, J. Cocke, and K. Kennedy, "Estimating interlock and improving balance for pipelined architectures," J. Parallel Distrib. Comput., vol. 5, no. 4, pp. 334-358, 1988.
-
(1988)
J. Parallel Distrib. Comput.
, vol.5
, Issue.4
, pp. 334-358
-
-
Callahan, D.1
Cocke, J.2
Kennedy, K.3
-
9
-
-
0025447908
-
Improving register allocation for subscripted variables
-
D. Callahan, S. Carr, and K. Kennedy, "Improving register allocation for subscripted variables," in Proc. SIGPLAN Conf. Programming Language Design and Implementation, 1990, pp. 53-65.
-
(1990)
Proc. SIGPLAN Conf. Programming Language Design and Implementation
, pp. 53-65
-
-
Callahan, D.1
Carr, S.2
Kennedy, K.3
-
10
-
-
18844387390
-
Exact analysis of the cache behavior of nested loops
-
S. Chatterjee, E. Parker, P. J. Hanlon, and A. R. Lebeck, "Exact analysis of the cache behavior of nested loops," in Proc. ACM SIGPLAN 2001 Conf. Programming Language Design and Implementation, 2001, pp. 286-297.
-
(2001)
Proc. ACM SIGPLAN 2001 Conf. Programming Language Design and Implementation
, pp. 286-297
-
-
Chatterjee, S.1
Parker, E.2
Hanlon, P.J.3
Lebeck, A.R.4
-
13
-
-
0026933251
-
Some efficient solutions to the affine scheduling problem - Part 1: One dimensional time
-
Oct.
-
P. Feautrier, "Some efficient solutions to the affine scheduling problem - Part 1: One dimensional time," Int. J. Parallel Program., vol. 21, no. 5, pp. 313-348, Oct. 1992.
-
(1992)
Int. J. Parallel Program.
, vol.21
, Issue.5
, pp. 313-348
-
-
Feautrier, P.1
-
14
-
-
0036575993
-
Yet another optimization article
-
May/Jun.
-
M. Fowler, "Yet another optimization article," IEEE Softw., vol. 19, no. 3, pp. 20-21, May/Jun. 2002.
-
(2002)
IEEE Softw.
, vol.19
, Issue.3
, pp. 20-21
-
-
Fowler, M.1
-
15
-
-
0033358624
-
Automatic analytical modeling for the estimation of cache misses
-
B. B. Fraguela, R. Doallo, and E. Zapata, "Automatic analytical modeling for the estimation of cache misses," in Proc. Int. Conf. Parallel Architectures and Compilation Techniques (PACT), 1999, pp. 221-231.
-
(1999)
Proc. Int. Conf. Parallel Architectures and Compilation Techniques (PACT)
, pp. 221-231
-
-
Fraguela, B.B.1
Doallo, R.2
Zapata, E.3
-
16
-
-
0031636309
-
FFTW: An adaptive software architecture for the FFT
-
M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," in Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, vol. 3, 1998, pp. 1381-1384.
-
(1998)
Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing
, vol.3
, pp. 1381-1384
-
-
Frigo, M.1
Johnson, S.G.2
-
17
-
-
20744449792
-
The design and implementation of FFTW3
-
Feb.
-
_, "The design and implementation of FFTW3," Proc. IEEE, vol. 93, no. 2, pp. 216-231, Feb. 2005.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 216-231
-
-
-
19
-
-
1542392269
-
On reducing TLB misses in matrix multiplication
-
Dept. Comput. Sci., Univ. Texas, Austin
-
K. Goto and R. van de Geijn, "On reducing TLB misses in matrix multiplication," Dept. Comput. Sci., Univ. Texas, Austin, Tech. Rep. TR-2002-55, 2002.
-
(2002)
Tech. Rep.
, vol.TR-2002-55
-
-
Goto, K.1
Van De Geijn, R.2
-
20
-
-
84949665448
-
A family of high-performance matrix algorithms
-
J. A. Gunnels, G. M. Henry, and R. A. van de Geijn, "A family of high-performance matrix algorithms," in Proc. Int. Conf. Computational Science (ICCS 2001), pp. 51-60.
-
Proc. Int. Conf. Computational Science (ICCS 2001)
, pp. 51-60
-
-
Gunnels, J.A.1
Henry, G.M.2
Van De Geijn, R.A.3
-
21
-
-
20744436023
-
-
private communication
-
F. Gustavson, private communication, 2004.
-
(2004)
-
-
Gustavson, F.1
-
23
-
-
20744456697
-
Flexible High-Performance Matrix Multiply via Self-Modifying Runtime Code
-
Dept. Comput. Sci., Univ. Texas, Austin, Dec.
-
"Flexible High-Performance Matrix Multiply via Self-Modifying Runtime Code," Dept. Comput. Sci., Univ. Texas, Austin, Tech. Rep. CS-TR-01-44, Dec. 2001.
-
(2001)
Tech. Rep.
, vol.CS-TR-01-44
-
-
-
25
-
-
20744435859
-
Searching for the best FFT formulas with the SPL compiler
-
J. Johnson, R. W. Johnson, D. A. Padua, and J. Xiong, "Searching for the best FFT formulas with the SPL compiler," in Proc. 13th Int. Workshop on Languages and Compilers for Parallel Computing, 2000, pp. 109-124.
-
(2000)
Proc. 13th Int. Workshop on Languages and Compilers for Parallel Computing
, pp. 109-124
-
-
Johnson, J.1
Johnson, R.W.2
Padua, D.A.3
Xiong, J.4
-
26
-
-
0347304618
-
Data-centric multi-level blocking
-
I. Kodukula, N. Ahmed, and K. Pingali, "Data-centric multi-level blocking," in Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation, 1997, pp. 346-357.
-
(1997)
Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation
, pp. 346-357
-
-
Kodukula, I.1
Ahmed, N.2
Pingali, K.3
-
27
-
-
10844294800
-
Imperfectly nested loop transformations for memory hierarchy management
-
Rhodes, Greece, June
-
I. Kodukula and K. Pingali, "Imperfectly nested loop transformations for memory hierarchy management," presented at the Int. Conf. Supercomputing, Rhodes, Greece, June 1999.
-
(1999)
Int. Conf. Supercomputing
-
-
Kodukula, I.1
Pingali, K.2
-
28
-
-
0027694019
-
Access normalization: Loop restructuring for NUMA compilers
-
W. Li and K. Pingali, "Access normalization: Loop restructuring for NUMA compilers," ACM Trans. Comput. Syst., 1993.
-
(1993)
ACM Trans. Comput. Syst.
-
-
Li, W.1
Pingali, K.2
-
29
-
-
0014701246
-
Evaluation techniques for storage hierarchies
-
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger, "Evaluation techniques for storage hierarchies," IBM Syst. J., vol. 9, no. 2, pp. 78-92, 1970.
-
(1970)
IBM Syst. J.
, vol.9
, Issue.2
, pp. 78-92
-
-
Mattson, R.L.1
Gecsei, J.2
Slutz, D.R.3
Traiger, I.L.4
-
30
-
-
84945709131
-
Organizing matrices and matrix operations for paged memory systems
-
A. C. McKellar and E. G. Coffman Jr., "Organizing matrices and matrix operations for paged memory systems," Commun. ACM, vol. 12, no. 3, pp. 153-165, 1969.
-
(1969)
Commun. ACM
, vol.12
, Issue.3
, pp. 153-165
-
-
McKellar, A.C.1
Coffman Jr., E.G.2
-
31
-
-
0022874874
-
Advanced compiler optimization for supercomputers
-
Dec.
-
D. Padua and M. Wolfe, "Advanced compiler optimization for supercomputers," Commun. ACM, vol. 29, no. 12, pp. 1184-1201, Dec. 1986.
-
(1986)
Commun. ACM
, vol.29
, Issue.12
, pp. 1184-1201
-
-
Padua, D.1
Wolfe, M.2
-
32
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
Feb.
-
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proc. IEEE, vol. 93, no. 2, pp. 232-275, Feb. 2005.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.1
Moura, J.M.F.2
Johnson, J.3
Padua, D.4
Veloso, M.5
Singer, B.W.6
Xiong, J.7
Franchetti, F.8
Gačić, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
33
-
-
0024898517
-
Engineering and scientific subroutine library release 3 for IBM ES/3090 vector multiprocessors
-
J. McComb, R. C. Agarwal, F. G. Gustavson, and S. Schmidt, "Engineering and scientific subroutine library release 3 for IBM ES/3090 vector multiprocessors," IBM Syst. J., vol. 28, no. 2, pp. 345-350, 1989.
-
(1989)
IBM Syst. J.
, vol.28
, Issue.2
, pp. 345-350
-
-
McComb, J.1
Agarwal, R.C.2
Gustavson, F.G.3
Schmidt, S.4
-
35
-
-
0003929457
-
Automatic blocking of nested loops
-
Univ. Tennessee, Knoxville
-
R. Schreiber and J. Dongarra, "Automatic blocking of nested loops," Univ. Tennessee, Knoxville, Tech. Rep. CS-90-108, 1990.
-
(1990)
Tech. Rep.
, vol.CS-90-108
-
-
Schreiber, R.1
Dongarra, J.2
-
36
-
-
20744440107
-
-
private communication
-
R. C. Whaley, private communication, 2004.
-
(2004)
-
-
Whaley, R.C.1
-
37
-
-
20744443273
-
-
[Online]
-
_, x86 optimizations, part 1. [Online]. Available: http://sourceforge.net/mailarchive/forum.php?thread_id=1569256&forum_id=426
-
X86 Optimizations, Part 1
-
-
-
38
-
-
13244261416
-
-
[Online]
-
_, User contribution to ATLAS. [Online]. Available: http://math-atlas.sourceforge.net/devel/atlas_contrib
-
User Contribution to ATLAS
-
-
-
39
-
-
13244279577
-
Minimizing development and maintenance costs in supporting persistently optimized BLAS
-
to be published
-
R. C. Whaley and A. Petitet, "Minimizing development and maintenance costs in supporting persistently optimized BLAS," Softw. Pract. Exper., to be published.
-
Softw. Pract. Exper.
-
-
Whaley, R.C.1
Petitet, A.2
-
40
-
-
0343462141
-
Automated empirical optimization of software and the ATLAS project
-
R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated empirical optimization of software and the ATLAS project," Parallel Comput., vol. 27, no. 1-2, pp. 3-35, 2001.
-
(2001)
Parallel Comput.
, vol.27
, Issue.1-2
, pp. 3-35
-
-
Whaley, R.C.1
Petitet, A.2
Dongarra, J.J.3
-
43
-
-
20744433639
-
X-Ray: A tool for automatic measurement of architectural parameters
-
K. Yotov, K. Pingali, and P. Stodghill, "X-Ray: A tool for automatic measurement of architectural parameters," Comput. Sci., Cornell Univ., Tech. Rep. TR2004-1966, 2004.
-
(2004)
Comput. Sci., Cornell Univ., Tech. Rep.
, vol.TR2004-1966
-
-
Yotov, K.1
Pingali, K.2
Stodghill, P.3