-
1
-
-
0012417357
-
Tuning Strassen's matrix multiplication for memory efficiency
-
Nov.
-
M. Thottethodi, S. Chatterjee, and A.R. Lebeck, "Tuning Strassen's Matrix Multiplication for Memory Efficiency," Proc. SC98 (CD-ROM), Nov. 1998, available from http://www.supercomp.org/sc98.
-
(1998)
Proc. SC98 (CD-ROM)
-
-
Thottethodi, M.1
Chatterjee, S.2
Lebeck, A.R.3
-
2
-
-
0032652980
-
Nonlinear array layouts for hierarchical memory systems
-
June
-
S. Chatterjee, V.V. Jain, A.R. Lebeck, S. Mundhra, and M. Thottethodi, "Nonlinear Array Layouts for Hierarchical Memory Systems," Proc. 1999 ACM Int'l Conf. Supercomputing, pp. 444-453, June 1999.
-
(1999)
Proc. 1999 ACM Int'l Conf. Supercomputing
, pp. 444-453
-
-
Chatterjee, S.1
Jain, V.V.2
Lebeck, A.R.3
Mundhra, S.4
Thottethodi, M.5
-
3
-
-
0032659795
-
Recursive array layouts and fast parallel matrix multiplication
-
June
-
S. Chatterjee, A.R. Lebeck, P.K. Patnala, and M. Thottethodi, "Recursive Array Layouts and Fast Parallel Matrix Multiplication," Proc. 11th Ann. ACM Symp. Parallel Algorithms and Architectures, pp. 222-231, June 1999.
-
(1999)
Proc. 11th Ann. ACM Symp. Parallel Algorithms and Architectures
, pp. 222-231
-
-
Chatterjee, S.1
Lebeck, A.R.2
Patnala, P.K.3
Thottethodi, M.4
-
4
-
-
0025402476
-
A set of level 3 basic linear algebra subprograms
-
Jan.
-
J.J. Dongarra, J. Du Croz, I.S. Duff, and S. Hammarling, "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Math. Software, vol. 16, no. 1, pp. 1-17, Jan. 1990.
-
(1990)
ACM Trans. Math. Software
, vol.16
, Issue.1
, pp. 1-17
-
-
Dongarra, J.J.1
Du Croz, J.2
Duff, I.S.3
Hammarling, S.4
-
5
-
-
33846316938
-
Über die Stetige Abbildung Einer Linie Auf Ein Flächenstück
-
D. Hilbert, "Über die Stetige Abbildung Einer Linie Auf Ein Flächenstück," Mathematische Annalen, vol. 38, pp. 459-460, 1891.
-
(1891)
Mathematische Annalen
, vol.38
, pp. 459-460
-
-
Hilbert, D.1
-
6
-
-
0005083863
-
Sur une Courbe Qui Remplit Toute une Aire Plane
-
G. Peano, "Sur une Courbe Qui Remplit Toute une Aire Plane," Mathematische Annalen, vol. 36, pp. 157-160, 1890.
-
(1890)
Mathematische Annalen
, vol.36
, pp. 157-160
-
-
Peano, G.1
-
7
-
-
0027747808
-
A parallel hashed oct-tree N-body algorithm
-
Nov.
-
M.S. Warren and J.K. Salmon, "A Parallel Hashed Oct-Tree N-Body Algorithm," Proc. Supercomputing '93, pp. 12-21, Nov. 1993.
-
(1993)
Proc. Supercomputing '93
, pp. 12-21
-
-
Warren, M.S.1
Salmon, J.K.2
-
8
-
-
0029429935
-
Balancing processor loads and exploiting data locality in N-body simulations
-
I. Banicescu and S.F. Hummel, "Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations," Proc. Supercomputing '95 (CD-ROM), Dec. 1995, available from http://www.supercomp.org/sc95proceedings/594_BHUM/SC95.HTM.
-
Proc. Supercomputing '95 (CD-ROM), Dec. 1995
-
-
Banicescu, I.1
Hummel, S.F.2
-
9
-
-
84875636475
-
Load balancing and data locality via fractiling: An experimental study
-
S.F. Hummel, I. Banicescu, C.-T. Wang, and J. Wein, "Load Balancing and Data Locality via Fractiling: An Experimental Study," Language, Compilers, and Run-Time Systems for Scalable Computers, 1995.
-
(1995)
Language, Compilers, and Run-Time Systems for Scalable Computers
-
-
Hummel, S.F.1
Banicescu, I.2
Wang, C.-T.3
Wein, J.4
-
10
-
-
0030699816
-
High Performance Fortran for highly irregular problems
-
June
-
Y.C. Hu, S.L. Johnsson, and S.-H. Teng, "High Performance Fortran for Highly Irregular Problems," Proc. the Sixth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 13-24, June 1997.
-
(1997)
Proc. the Sixth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming
, pp. 13-24
-
-
Hu, Y.C.1
Johnsson, S.L.2
Teng, S.-H.3
-
11
-
-
0030105726
-
Dynamic partitioning of non-uniform structured workloads with spacefilling curves
-
Mar.
-
J.R. Pilkington and S.B. Baden, "Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 3, pp. 288-300, Mar. 1996.
-
(1996)
IEEE Trans. Parallel and Distributed Systems
, vol.7
, Issue.3
, pp. 288-300
-
-
Pilkington, J.R.1
Baden, S.B.2
-
12
-
-
0027735063
-
An empirical comparison of the Kendall Square Research KSR-1 and the Stanford DASH multiprocessors
-
Nov.
-
J.P. Singh, T. Joe, J.L. Hennessy, and A. Gupta, "An Empirical Comparison of the Kendall Square Research KSR-1 and the Stanford DASH Multiprocessors," Proc. Supercomputing '93, pp. 214-225, Nov. 1993.
-
(1993)
Proc. Supercomputing '93
, pp. 214-225
-
-
Singh, J.P.1
Joe, T.2
Hennessy, J.L.3
Gupta, A.4
-
13
-
-
0014600391
-
Space-filling curves: Their generation and their application to bandwidth reduction
-
Nov.
-
T. Bially, "Space-Filling Curves: Their Generation and Their Application to Bandwidth Reduction," IEEE Trans. Information Theory, vol. 15, no. 6, pp. 658-664, Nov. 1969.
-
(1969)
IEEE Trans. Information Theory
, vol.15
, Issue.6
, pp. 658-664
-
-
Bially, T.1
-
14
-
-
0021082182
-
Optimizing raster storage: An examination of four alternatives
-
Oct.
-
M.F. Goodchild and A.W. Grandfield, "Optimizing Raster Storage: An Examination of Four Alternatives," Proc. Auto-Carto 6, vol. 1, pp. 400-407, Oct. 1983.
-
(1983)
Proc. Auto-Carto 6
, vol.1
, pp. 400-407
-
-
Goodchild, M.F.1
Grandfield, A.W.2
-
15
-
-
0022193652
-
Graphical data bases built on Peano space-filling curves
-
C.E. Vandoni, ed.
-
R. Laurini, "Graphical Data Bases Built on Peano Space-Filling Curves," Proc. EUROGRAPHICS '85 Conf., C.E. Vandoni, ed., pp. 327-338, 1985.
-
(1985)
Proc. EUROGRAPHICS '85 Conf.
, pp. 327-338
-
-
Laurini, R.1
-
16
-
-
0025446215
-
Linear clustering of objects with multiple attributes
-
H. Garcia-Molina and H.V. Jagadish, eds.; May
-
H.V. Jagadish, "Linear Clustering of Objects with Multiple Attributes," Proc. 1990 ACM SIGMOD Int'l Conf. Management of Data, H. Garcia-Molina and H.V. Jagadish, eds., pp. 332-342, May 1990.
-
(1990)
Proc. 1990 ACM SIGMOD Int'l Conf. Management of Data
, pp. 332-342
-
-
Jagadish, H.V.1
-
18
-
-
0010020992
-
Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free
-
A. Bode, T. Ludwig, W. Karl, and R. Wismüller, eds.
-
D.S. Wise, "Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free," Euro-Par 2000 Parallel Processing, A. Bode, T. Ludwig, W. Karl, and R. Wismüller, eds., 2000.
-
(2000)
Euro-Par 2000 Parallel Processing
-
-
Wise, D.S.1
-
19
-
-
0034819362
-
Language support for Morton-order matrices
-
June
-
D.S. Wise, J.D. Frens, Y. Gu, and G.A. Alexander, "Language Support for Morton-Order Matrices," Proc. Eighth ACM SIGPLAN Symp. Principles and Practices of Parallel Programming, pp. 24-33, June 2001.
-
(2001)
Proc. Eighth ACM SIGPLAN Symp. Principles and Practices of Parallel Programming
, pp. 24-33
-
-
Wise, D.S.1
Frens, J.D.2
Gu, Y.3
Alexander, G.A.4
-
20
-
-
0009871982
-
Analysis of the clustering properties of Hilbert space-filling curve
-
Technical Report CS-TR-3611, Computer Science Dept., Univ. of Maryland, College Park
-
B. Moon, H.V. Jagadish, C. Faloutsos, and J.H. Saltz, "Analysis of the Clustering Properties of Hilbert Space-Filling Curve," Technical Report CS-TR-3611, Computer Science Dept., Univ. of Maryland, College Park, 1996.
-
(1996)
-
-
Moon, B.1
Jagadish, H.V.2
Faloutsos, C.3
Saltz, J.H.4
-
21
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
Apr.
-
M.S. Lam, E.E. Rothberg, and M.E. Wolf, "The Cache Performance and Optimizations of Blocked Algorithms," Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 63-74, Apr. 1991.
-
(1991)
Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems
, pp. 63-74
-
-
Lam, M.S.1
Rothberg, E.E.2
Wolf, M.E.3
-
22
-
-
0012415981
-
Basic linear algebra subroutine technical (BLAST) forum standard
-
Basic Linear Algebra Subroutine Technical (BLAST) Forum; Aug.
-
Basic Linear Algebra Subroutine Technical (BLAST) Forum, "Basic Linear Algebra Subroutine Technical (BLAST) Forum Standard," http://www.netlib.org/blas/blast-forum/, Aug. 2001.
-
(2001)
-
-
-
23
-
-
4243799034
-
Personal communication
-
Aug.
-
C.E. Leiserson, "Personal Communication," Aug. 1998.
-
(1998)
-
-
Leiserson, C.E.1
-
24
-
-
34250487811
-
Gaussian elimination is not optimal
-
V. Strassen, "Gaussian Elimination is Not Optimal," Numerische Mathematik, vol. 13, pp. 354-356, 1969.
-
(1969)
Numerische Mathematik
, vol.13
, pp. 354-356
-
-
Strassen, V.1
-
25
-
-
0029191296
-
Cilk: An efficient multithreaded runtime system
-
July
-
R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, and Y. Zhou, "Cilk: An Efficient Multithreaded Runtime System," Proc. Fifth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 207-216, July 1995, also see http://theory.lcs.mit.edu/~cilk.
-
(1995)
Proc. Fifth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming
, pp. 207-216
-
-
Blumofe, R.D.1
Joerg, C.F.2
Kuszmaul, B.C.3
Leiserson, C.E.4
Randall, K.H.5
Zhou, Y.6
-
27
-
-
0024903997
-
Evaluating associativity in CPU caches
-
Dec.
-
M.D. Hill and A.J. Smith, "Evaluating Associativity in CPU Caches," IEEE Trans. Computers, vol. 38, no. 12, pp. 1612-1630, Dec. 1989.
-
(1989)
IEEE Trans. Computers
, vol.38
, Issue.12
, pp. 1612-1630
-
-
Hill, M.D.1
Smith, A.J.2
-
31
-
-
0003487717
-
-
Scientific and Eng. Computation, Cambridge, Mass.: MIT Press
-
C.H. Koelbel, D.B. Loveman, R.S. Schreiber, G.L. Steele Jr., and M.E. Zosel, The High Performance Fortran Handbook. Scientific and Eng. Computation, Cambridge, Mass.: MIT Press, 1994.
-
(1994)
The High Performance Fortran Handbook
-
-
Koelbel, C.H.1
Loveman, D.B.2
Schreiber, R.S.3
Steele G.L., Jr.4
Zosel, M.E.5
-
34
-
-
33947652414
-
Implementation of Strassen's algorithm for matrix multiplication
-
S. Huss-Lederman, E.M. Jacobson, J.R. Johnson, A. Tsao, and T. Turnbull, "Implementation of Strassen's Algorithm for Matrix Multiplication," Proc. Supercomputing '96, 1996.
-
(1996)
Proc. Supercomputing '96
-
-
Huss-Lederman, S.1
Jacobson, E.M.2
Johnson, J.R.3
Tsao, A.4
Turnbull, T.5
-
35
-
-
0002663082
-
GEMMW: A portable level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm
-
C. Douglas, M. Heroux, G. Slishman, and R.M. Smith, "GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm," J. Computational Physics, vol. 110, pp. 1-10, 1994.
-
(1994)
J. Computational Physics
, vol.110
, pp. 1-10
-
-
Douglas, C.1
Heroux, M.2
Slishman, G.3
Smith, R.M.4
-
36
-
-
0030661485
-
Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
-
July
-
J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel, "Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology," Proc. Int'l Conf. Supercomputing, pp. 340-347, July 1997.
-
(1997)
Proc. Int'l Conf. Supercomputing
, pp. 340-347
-
-
Bilmes, J.1
Asanovic, K.2
Chin, C.-W.3
Demmel, J.4
-
37
-
-
84943297310
-
Automatically tuned linear algebra software
-
R. C. Whaley, "Automatically Tuned Linear Algebra Software," Proc. Conf. Supercomputing, 1998.
-
(1998)
Proc. Conf. Supercomputing
-
-
Whaley, R.C.1
-
38
-
-
0031636309
-
FFTW: An adaptive software architecture for the FFT
-
M. Frigo and S.G. Johnson, "FFTW: An Adaptive Software Architecture for the FFT," Proc. Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 3, p. 1381, 1998.
-
(1998)
Proc. Int'l Conf. Acoustics, Speech, and Signal Processing
, vol.3
, p. 1381
-
-
Frigo, M.1
Johnson, S.G.2
-
39
-
-
0031496750
-
Locality of reference in LU decomposition with partial pivoting
-
Oct.
-
S. Toledo, "Locality of Reference in LU Decomposition with Partial Pivoting," SIAM J. Matrix Analysis and Applications, vol. 18, no. 4, pp. 1065-1081, Oct. 1997.
-
(1997)
SIAM J. Matrix Analysis and Applications
, vol.18
, Issue.4
, pp. 1065-1081
-
-
Toledo, S.1
-
40
-
-
0031273280
-
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
-
Nov.
-
F.G. Gustavson, "Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra Algorithms," IBM J. Research and Development, vol. 41, no. 6, pp. 737-755, Nov. 1997.
-
(1997)
IBM J. Research and Development
, vol.41
, Issue.6
, pp. 737-755
-
-
Gustavson, F.G.1
-
41
-
-
0034224207
-
Applying recursion to serial and parallel QR factorization leads to better performance
-
July
-
E. Elmroth and F. Gustavson, "Applying Recursion to Serial and Parallel QR Factorization Leads to Better Performance," IBM J. Research and Development, vol. 44, no. 4, pp. 605-624, July 2000.
-
(2000)
IBM J. Research and Development
, vol.44
, Issue.4
, pp. 605-624
-
-
Elmroth, E.1
Gustavson, F.2
-
42
-
-
0012454461
-
Recursive formulation of Cholesky algorithm in Fortran 90
-
B. Kågström, J. Dongarra, E. Elmroth, and J. Wasniewski, eds., June
-
J. Wasniewski, B.S. Andersen, and F. Gustavson, "Recursive Formulation of Cholesky Algorithm in Fortran 90," Proc. Fourth Int'l Workshop, Applied Parallel Computing, Large Scale Scientific and Industrial Problems, PARA '98, B. Kågström, J. Dongarra, E. Elmroth, and J. Wasniewski, eds., June 1998.
-
(1998)
Proc. Fourth Int'l Workshop, Applied Parallel Computing, Large Scale Scientific and Industrial Problems, PARA '98
-
-
Wasniewski, J.1
Andersen, B.S.2
Gustavson, F.3
-
43
-
-
84948647315
-
Recursive formulation of some dense linear algebra algorithms
-
B. Hendrickson, K.A. Yelick, C.H. Bischof, I.S. Duff, A.S. Edelman, G.A. Geist, M.T. Heath, M.A. Heroux, C. Koelbel, R.S. Schreiber, R.F. Sincovec, and M.F. Wheeler, eds., Mar.
-
B.S. Andersen, F. Gustavson, J. Wasniewski, and P.Y. Yalamov, "Recursive Formulation of Some Dense Linear Algebra Algorithms," Proc. Ninth SIAM Conf. Parallel Processing for Scientific Computing (PPSC '99), B. Hendrickson, K.A. Yelick, C.H. Bischof, I.S. Duff, A.S. Edelman, G.A. Geist, M.T. Heath, M.A. Heroux, C. Koelbel, R.S. Schreiber, R.F. Sincovec, and M.F. Wheeler, eds., Mar. 1999.
-
(1999)
Proc. Ninth SIAM Conf. Parallel Processing for Scientific Computing (PPSC '99)
-
-
Andersen, B.S.1
Gustavson, F.2
Wasniewski, J.3
Yalamov, P.Y.4
-
45
-
-
0012379036
-
New generalized data structures for matrices lead to a variety of high-performance algorithms
-
B. Engquist, L. Johnson, M. Hammill, and F. Short, eds.
-
F.G. Gustavson, "New Generalized Data Structures for Matrices Lead to a Variety of High-Performance Algorithms," Simulation and Visualization on the Grid, B. Engquist, L. Johnson, M. Hammill, and F. Short, eds., 2000.
-
(2000)
Simulation and Visualization on the Grid
-
-
Gustavson, F.G.1
-
46
-
-
0012474071
-
Techniques for improving the data locality of iterative methods
-
Technical Report MRR97-038, Institut für Mathematik, Universität Augsburg, Germany, Oct.
-
L. Stals and U. Rüde, "Techniques for Improving the Data Locality of Iterative Methods," Technical Report MRR97-038, Institut für Mathematik, Universität Augsburg, Germany, Oct. 1997.
-
(1997)
-
-
Stals, L.1
Rüde, U.2
-
47
-
-
0001255125
-
Report of the working group on storage I/O for large-scale computing
-
Dec.
-
G. Gibson, J.S. Vitter, and J. Wilkes, "Report of the Working Group on Storage I/O for Large-Scale Computing," ACM Computing Surveys, Dec. 1996.
-
(1996)
ACM Computing Surveys
-
-
Gibson, G.1
Vitter, J.S.2
Wilkes, J.3
-
48
-
-
0031122907
-
Efficient out-of-core algorithms for linear relaxation using blocking covers
-
C.E. Leiserson, S. Rao, and S. Toledo, "Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers," J. Computer and System Sciences, vol. 54, no. 2, pp. 332-344, 1997.
-
(1997)
J. Computer and System Sciences
, vol.54
, Issue.2
, pp. 332-344
-
-
Leiserson, C.E.1
Rao, S.2
Toledo, S.3
-
50
-
-
0024935630
-
More iteration space tiling
-
Nov.
-
M.J. Wolfe, "More Iteration Space Tiling," Proc. Supercomputing '89, pp. 655-664, Nov. 1989.
-
(1989)
Proc. Supercomputing '89
, pp. 655-664
-
-
Wolfe, M.J.1
-
52
-
-
84976831704
-
Compiler optimizations for improving data locality
-
Oct.
-
S. Carr, K.S. McKinley, and C.-W. Tseng, "Compiler Optimizations for Improving Data Locality," Proc. Sixth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 252-262, Oct. 1994.
-
(1994)
Proc. Sixth Int'l Conf. Architectural Support for Programming Languages and Operating Systems
, pp. 252-262
-
-
Carr, S.1
McKinley, K.S.2
Tseng, C.-W.3
-
55
-
-
0025381427
-
Data optimization: Allocation of arrays to reduce communication on SIMD machines
-
Feb.
-
K. Knobe, J.D. Lukas, and G.L. Steele, Jr., "Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines," J. Parallel and Distributed Computing, vol. 8, no. 2, pp. 102-118, Feb. 1990.
-
(1990)
J. Parallel and Distributed Computing
, vol.8
, Issue.2
, pp. 102-118
-
-
Knobe, K.1
Lukas, J.D.2
Steele G.L., Jr.3
-
56
-
-
0004067602
-
Automatic data partitioning on distributed memory multicomputers
-
PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, Sept.
-
M. Gupta, "Automatic Data Partitioning on Distributed Memory Multicomputers," PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, Sept. 1992.
-
(1992)
-
-
Gupta, M.1
-
57
-
-
0029238937
-
Optimal evaluation of array expressions on massively parallel machines
-
Jan.
-
S. Chatterjee, J.R. Gilbert, R. Schreiber, and S.-H. Teng, "Optimal Evaluation of Array Expressions on Massively Parallel Machines," ACM Trans. Programming Languages and Systems, vol. 17, no. 1, pp. 123-156, Jan. 1995.
-
(1995)
ACM Trans. Programming Languages and Systems
, vol.17
, Issue.1
, pp. 123-156
-
-
Chatterjee, S.1
Gilbert, J.R.2
Schreiber, R.3
Teng, S.-H.4