Volume 13, Issue 11, 2002, Pages 1105-1123

Recursive array layouts and fast matrix multiplication

Author keywords

Data layout; Matrix multiplication

Indexed keywords

ALGORITHMS; COMPUTATIONAL COMPLEXITY; COMPUTATIONAL METHODS; DATA STRUCTURES; LINEAR ALGEBRA; PROGRAM COMPILERS; STORAGE ALLOCATION (COMPUTER)

EID: 0036870763     PISSN: 10459219     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPDS.2002.1058095     Document Type: Article
Times cited : (64)

References (58)
  • 1
    • 0012417357
    • Tuning Strassen's matrix multiplication for memory efficiency
    • Nov.
    • M. Thottethodi, S. Chatterjee, and A.R. Lebeck, "Tuning Strassen's Matrix Multiplication for Memory Efficiency," Proc. SC98 (CD-ROM), Nov. 1998, available from http://www.supercomp.org/sc98.
    • (1998) Proc. SC98 (CD-ROM)
    • Thottethodi, M.1    Chatterjee, S.2    Lebeck, A.R.3
  • 5
    • 33846316938
    • Über Stetige Abbildung Einer Linie Auf Ein Flächenstück
    • D. Hilbert, "Über Stetige Abbildung Einer Linie Auf Ein Flächenstück," Mathematische Annalen, vol. 38, pp. 459-460, 1891.
    • (1891) Mathematische Annalen , vol.38 , pp. 459-460
    • Hilbert, D.1
  • 6
    • 0005083863
    • Sur une Courbe Qui Remplit Toute une Aire Plane
    • G. Peano, "Sur une Courbe Qui Remplit Toute une Aire Plane," Mathematische Annalen, vol. 36, pp. 157-160, 1890.
    • (1890) Mathematische Annalen , vol.36 , pp. 157-160
    • Peano, G.1
  • 7
    • 0027747808
    • A parallel hashed oct-tree N-body algorithm
    • Nov.
    • M.S. Warren and J.K. Salmon, "A Parallel Hashed Oct-Tree N-Body Algorithm," Proc. Supercomputing '93, pp. 12-21, Nov. 1993.
    • (1993) Proc. Supercomputing '93 , pp. 12-21
    • Warren, M.S.1    Salmon, J.K.2
  • 8
    • 0029429935
    • Balancing processor loads and exploiting data locality in N-body simulations
    • I. Banicescu and S.F. Hummel, "Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations," Proc. Supercomputing '95 (CD-ROM), Dec. 1995, available from http://www.supercomp.org/sc95proceedings/594_BHUM/SC95.HTM.
    • (1995) Proc. Supercomputing '95 (CD-ROM)
    • Banicescu, I.1    Hummel, S.F.2
  • 11
    • 0030105726
    • Dynamic partitioning of non-uniform structured workloads with spacefilling curves
    • Mar.
    • J.R. Pilkington and S.B. Baden, "Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves," IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 3, pp. 288-300, Mar. 1996.
    • (1996) IEEE Trans. Parallel and Distributed Systems , vol.7 , Issue.3 , pp. 288-300
    • Pilkington, J.R.1    Baden, S.B.2
  • 12
    • 0027735063
    • An empirical comparison of the Kendall Square Research KSR-1 and the Stanford DASH multiprocessors
    • Nov.
    • J.P. Singh, T. Joe, J.L. Hennessy, and A. Gupta, "An Empirical Comparison of the Kendall Square Research KSR-1 and the Stanford DASH Multiprocessors," Proc. Supercomputing '93, pp. 214-225, Nov. 1993.
    • (1993) Proc. Supercomputing '93 , pp. 214-225
    • Singh, J.P.1    Joe, T.2    Hennessy, J.L.3    Gupta, A.4
  • 13
    • 0014600391
    • Space-filling curves: Their generation and their application to bandwidth reduction
    • Nov.
    • T. Bially, "Space-Filling Curves: Their Generation and Their Application to Bandwidth Reduction," IEEE Trans. Information Theory, vol. 15, no. 6, pp. 658-664, Nov. 1969.
    • (1969) IEEE Trans. Information Theory , vol.15 , Issue.6 , pp. 658-664
    • Bially, T.1
  • 14
    • 0021082182
    • Optimizing raster storage: An examination of four alternatives
    • Oct.
    • M.F. Goodchild and A.W. Grandfield, "Optimizing Raster Storage: An Examination of Four Alternatives," Proc. Auto-Carto 6, vol. 1, pp. 400-407, Oct. 1983.
    • (1983) Proc. Auto-Carto 6 , vol.1 , pp. 400-407
    • Goodchild, M.F.1    Grandfield, A.W.2
  • 15
    • 0022193652
    • Graphical data bases built on Peano space-filling curves
    • C.E. Vandoni, ed.
    • R. Laurini, "Graphical Data Bases Built on Peano Space-Filling Curves," Proc. EUROGRAPHICS '85 Conf., C.E. Vandoni, ed., pp. 327-338, 1985.
    • (1985) Proc. EUROGRAPHICS '85 Conf. , pp. 327-338
    • Laurini, R.1
  • 16
    • 0025446215
    • Linear clustering of objects with multiple attributes
    • H. Garcia-Molina and H.V. Jagadish, eds.; May
    • H.V. Jagadish, "Linear Clustering of Objects with Multiple Attributes," Proc. 1990 ACM SIGMOD Int'l Conf. Management of Data, H. Garcia-Molina and H.V. Jagadish, eds., pp. 332-342, May 1990.
    • (1990) Proc. 1990 ACM SIGMOD Int'l Conf. Management of Data , pp. 332-342
    • Jagadish, H.V.1
  • 18
    • 0010020992
    • Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free
    • A. Bode, T. Ludwig, W. Karl, and R. Wismüller, eds.
    • D.S. Wise, "Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free," Euro-Par 2000 Parallel Processing, A. Bode, T. Ludwig, W. Karl, and R. Wismüller, eds., 2000.
    • (2000) Euro-Par 2000 Parallel Processing
    • Wise, D.S.1
  • 20
    • 0009871982
    • Analysis of the clustering properties of Hilbert space-filling curve
    • Technical Report CS-TR-3611, Computer Science Dept., Univ. of Maryland, College Park
    • B. Moon, H.V. Jagadish, C. Faloutsos, and J.H. Saltz, "Analysis of the Clustering Properties of Hilbert Space-Filling Curve," Technical Report CS-TR-3611, Computer Science Dept., Univ. of Maryland, College Park, 1996.
    • (1996)
    • Moon, B.1    Jagadish, H.V.2    Faloutsos, C.3    Saltz, J.H.4
  • 22
    • 0012415981
    • Basic linear algebra subroutine technical (BLAST) forum standard
    • Basic Linear Algebra Subroutine Technical (BLAST) Forum; Aug.
    • Basic Linear Algebra Subroutine Technical (BLAST) Forum, "Basic Linear Algebra Subroutine Technical (BLAST) Forum Standard," http://www.netlib.org/blas/blast-forum/, Aug. 2001.
    • (2001)
  • 23
    • 4243799034
    • Personal communication
    • Aug.
    • C.E. Leiserson, "Personal Communication," Aug. 1998.
    • (1998)
    • Leiserson, C.E.1
  • 24
    • 34250487811
    • Gaussian elimination is not optimal
    • V. Strassen, "Gaussian Elimination is Not Optimal," Numerische Mathematik, vol. 13, pp. 354-356, 1969.
    • (1969) Numerische Mathematik , vol.13 , pp. 354-356
    • Strassen, V.1
  • 27
    • 0024903997
    • Evaluating associativity in CPU caches
    • Dec.
    • M.D. Hill and A.J. Smith, "Evaluating Associativity in CPU Caches," IEEE Trans. Computers, vol. 38, no. 12, pp. 1612-1630, Dec. 1989.
    • (1989) IEEE Trans. Computers , vol.38 , Issue.12 , pp. 1612-1630
    • Hill, M.D.1    Smith, A.J.2
  • 35
    • 0002663082
    • GEMMW: A portable level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm
    • C. Douglas, M. Heroux, G. Slishman, and R.M. Smith, "GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm," J. Computational Physics, vol. 110, pp. 1-10, 1994.
    • (1994) J. Computational Physics , vol.110 , pp. 1-10
    • Douglas, C.1    Heroux, M.2    Slishman, G.3    Smith, R.M.4
  • 36
    • 0030661485
    • Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
    • July
    • J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel, "Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology," Proc. Int'l Conf. Supercomputing, pp. 340-347, July 1997.
    • (1997) Proc. Int'l Conf. Supercomputing , pp. 340-347
    • Bilmes, J.1    Asanovic, K.2    Chin, C.-W.3    Demmel, J.4
  • 37
    • 84943297310
    • Automatically tuned linear algebra software
    • R. C. Whaley, "Automatically Tuned Linear Algebra Software," Proc. Conf. Supercomputing, 1998.
    • (1998) Proc. Conf. Supercomputing
    • Whaley, R.C.1
  • 39
    • 0031496750
    • Locality of reference in LU decomposition with partial pivoting
    • Oct.
    • S. Toledo, "Locality of Reference in LU Decomposition with Partial Pivoting," SIAM J. Matrix Analysis and Applications, vol. 18, no. 4, pp. 1065-1081, Oct. 1997.
    • (1997) SIAM J. Matrix Analysis and Applications , vol.18 , Issue.4 , pp. 1065-1081
    • Toledo, S.1
  • 40
    • 0031273280
    • Recursion leads to automatic variable blocking for dense linear-algebra algorithms
    • Nov.
    • F.G. Gustavson, "Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra Algorithms," IBM J. Research and Development, vol. 41, no. 6, pp. 737-755, Nov. 1997.
    • (1997) IBM J. Research and Development , vol.41 , Issue.6 , pp. 737-755
    • Gustavson, F.G.1
  • 41
    • 0034224207
    • Applying recursion to serial and parallel QR factorization leads to better performance
    • July
    • E. Elmroth and F. Gustavson, "Applying Recursion to Serial and Parallel QR Factorization Leads to Better Performance," IBM J. Research and Development, vol. 44, no. 4, pp. 605-624, July 2000.
    • (2000) IBM J. Research and Development , vol.44 , Issue.4 , pp. 605-624
    • Elmroth, E.1    Gustavson, F.2
  • 43
    • 84948647315
    • Recursive formulation of some dense linear algebra algorithms
    • B. Hendrickson, K.A. Yelick, C.H. Bischof, I.S. Duff, A.S. Edelman, G.A. Geist, M.T. Heath, M.A. Heroux, C. Koelbel, R.S. Schreiber, R.F. Sincovec, and M.F. Wheeler, eds., Mar.
    • B.S. Andersen, F. Gustavson, J. Wasniewski, and P.Y. Yalamov, "Recursive Formulation of Some Dense Linear Algebra Algorithms," Proc. Ninth SIAM Conf. Parallel Processing for Scientific Computing (PPSC '99), B. Hendrickson, K.A. Yelick, C.H. Bischof, I.S. Duff, A.S. Edelman, G.A. Geist, M.T. Heath, M.A. Heroux, C. Koelbel, R.S. Schreiber, R.F. Sincovec, and M.F. Wheeler, eds., Mar. 1999.
    • (1999) Proc. Ninth SIAM Conf. Parallel Processing for Scientific Computing (PPSC '99)
    • Andersen, B.S.1    Gustavson, F.2    Wasniewski, J.3    Yalamov, P.Y.4
  • 45
    • 0012379036
    • New generalized data structures for matrices lead to a variety of high-performance algorithms
    • B. Engquist, L. Johnson, M. Hammill, and F. Short, eds.
    • F.G. Gustavson, "New Generalized Data Structures for Matrices Lead to a Variety of High-Performance Algorithms," Simulation and Visualization on the Grid, B. Engquist, L. Johnson, M. Hammill, and F. Short, eds., 2000.
    • (2000) Simulation and Visualization on the Grid
    • Gustavson, F.G.1
  • 46
    • 0012474071
    • Techniques for improving the data locality of iterative methods
    • Technical Report MRR97-038, Institut für Mathematik, Universität Augsburg, Germany, Oct.
    • L. Stals and U. Rüde, "Techniques for Improving the Data Locality of Iterative Methods," Technical Report MRR97-038, Institut für Mathematik, Universität Augsburg, Germany, Oct. 1997.
    • (1997)
    • Stals, L.1    Rüde, U.2
  • 47
    • 0001255125
    • Report of the working group on storage I/O for large-scale computing
    • Dec.
    • G. Gibson, J.S. Vitter, and J. Wilkes, "Report of the Working Group on Storage I/O for Large-Scale Computing," ACM Computing Surveys, Dec. 1996.
    • (1996) ACM Computing Surveys
    • Gibson, G.1    Vitter, J.S.2    Wilkes, J.3
  • 48
    • 0031122907
    • Efficient out-of-core algorithms for linear relaxation using blocking covers
    • C.E. Leiserson, S. Rao, and S. Toledo, "Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers," J. Computer and System Sciences, vol. 54, no. 2, pp. 332-344, 1997.
    • (1997) J. Computer and System Sciences , vol.54 , Issue.2 , pp. 332-344
    • Leiserson, C.E.1    Rao, S.2    Toledo, S.3
  • 50
    • 0024935630
    • More iteration space tiling
    • Nov.
    • M.J. Wolfe, "More Iteration Space Tiling," Proc. Supercomputing '89, pp. 655-664, Nov. 1989.
    • (1989) Proc. Supercomputing '89 , pp. 655-664
    • Wolfe, M.J.1
  • 55
    • 0025381427
    • Data optimization: Allocation of arrays to reduce communication on SIMD machines
    • Feb.
    • K. Knobe, J.D. Lukas, and G.L. Steele, Jr., "Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines," J. Parallel and Distributed Computing, vol. 8, no. 2, pp. 102-118, Feb. 1990.
    • (1990) J. Parallel and Distributed Computing , vol.8 , Issue.2 , pp. 102-118
    • Knobe, K.1    Lukas, J.D.2    Steele G.L., Jr.3
  • 56
    • 0004067602
    • Automatic data partitioning on distributed memory multicomputers
    • PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, Sept.
    • M. Gupta, "Automatic Data Partitioning on Distributed Memory Multicomputers," PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, Sept. 1992.
    • (1992)
    • Gupta, M.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.