Volume 93, Issue 2, 2005, Pages 358-385

Is search really necessary to generate high-performance BLAS?

Author keywords

Basic Linear Algebra Subprograms (BLAS); Compilers; Empirical optimization; High performance computing; Library generators; Model driven optimization; Program optimization

Indexed keywords

COMPUTER OPERATING SYSTEMS; COMPUTER SOFTWARE; MATHEMATICAL TRANSFORMATIONS; OPTIMIZATION; PARAMETER ESTIMATION; PROGRAM COMPILERS;

EID: 20744459570     PISSN: 00189219     EISSN: None     Source Type: Journal    
DOI: 10.1109/JPROC.2004.840444     Document Type: Conference Paper
Times cited: 108

References (43)
  • 1
    • 20744439191
    • [Online]
    • ATLAS home page [Online]. Available: http://math-atlas.sourceforge.net/
    • ATLAS Home Page
  • 2
    • 20744450712
    • [Online]
    • PHiPAC home page [Online]. Available: http://www.icsi.berkeley.edu/~bilmes/phipac
    • PHiPAC Home Page
  • 3
    • 0028427170
    • Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch
    • R. C. Agarwal, F. G. Gustavson, and M. Zubair, "Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch," IBM J. Res. Develop., vol. 38, no. 3, pp. 265-275, 1994.
    • (1994) IBM J. Res. Develop. , vol.38 , Issue.3 , pp. 265-275
    • Agarwal, R.C.1    Gustavson, F.G.2    Zubair, M.3
  • 6
    • 0030661485
    • Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
    • Vienna, Austria
    • J. Bilmes, K. Asanović, C.-W. Chin, and J. Demmel, "Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology," presented at the Int. Conf. Supercomputing, Vienna, Austria, 1997.
    • (1997) Int. Conf. Supercomputing
    • Bilmes, J.1    Asanović, K.2    Chin, C.-W.3    Demmel, J.4
  • 8
    • 0000493064
    • Estimating interlock and improving balance for pipelined architectures
    • D. Callahan, J. Cocke, and K. Kennedy, "Estimating interlock and improving balance for pipelined architectures," J. Parallel Distrib. Comput., vol. 5, no. 4, pp. 334-358, 1988.
    • (1988) J. Parallel Distrib. Comput. , vol.5 , Issue.4 , pp. 334-358
    • Callahan, D.1    Cocke, J.2    Kennedy, K.3
  • 13
    • 0026933251
    • Some efficient solutions to the affine scheduling problem - Part 1: One dimensional time
    • Oct.
    • P. Feautrier, "Some efficient solutions to the affine scheduling problem - Part 1: One dimensional time," Int. J. Parallel Program., vol. 21, no. 5, pp. 313-348, Oct. 1992.
    • (1992) Int. J. Parallel Program. , vol.21 , Issue.5 , pp. 313-348
    • Feautrier, P.1
  • 14
    • 0036575993
    • Yet another optimization article
    • May/Jun.
    • M. Fowler, "Yet another optimization article," IEEE Softw., vol. 19, no. 3, pp. 20-21, May/Jun. 2002.
    • (2002) IEEE Softw. , vol.19 , Issue.3 , pp. 20-21
    • Fowler, M.1
  • 17
    • 20744449792
    • The design and implementation of FFTW3
    • Feb.
    • _, "The design and implementation of FFTW3," Proc. IEEE, vol. 93, no. 2, pp. 216-231, Feb. 2005.
    • (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 216-231
  • 19
    • 1542392269
    • On reducing TLB misses in matrix multiplication
    • Dept. Comput. Sci., Univ. Texas, Austin
    • K. Goto and R. van de Geijn, "On reducing TLB misses in matrix multiplication," Dept. Comput. Sci., Univ. Texas, Austin, Tech. Rep. TR-2002-55, 2002.
    • (2002) Tech. Rep. , vol.TR-2002-55
    • Goto, K.1    Van De Geijn, R.2
  • 21
    • 20744436023
    • private communication
    • F. Gustavson, private communication, 2004.
    • (2004)
    • Gustavson, F.1
  • 23
    • 20744456697
    • Flexible High-Performance Matrix Multiply via Self-Modifying Runtime Code
    • Dept. Comput. Sci., Univ. Texas, Austin, Dec.
    • "Flexible High-Performance Matrix Multiply via Self-Modifying Runtime Code," Dept. Comput. Sci., Univ. Texas, Austin, Tech. Rep. CS-TR-01-44, Dec. 2001.
    • (2001) Tech. Rep. , vol.CS-TR-01-44
  • 24
  • 27
    • 10844294800
    • Imperfectly nested loop transformations for memory hierarchy management
    • Rhodes, Greece, June
    • I. Kodukula and K. Pingali, "Imperfectly nested loop transformations for memory hierarchy management," presented at the Int. Conf. Supercomputing, Rhodes, Greece, June 1999.
    • (1999) Int. Conf. Supercomputing
    • Kodukula, I.1    Pingali, K.2
  • 28
    • 0027694019
    • Access normalization: Loop restructuring for NUMA compilers
    • W. Li and K. Pingali, "Access normalization: Loop restructuring for NUMA compilers," ACM Trans. Comput. Syst., 1993.
    • (1993) ACM Trans. Comput. Syst.
    • Li, W.1    Pingali, K.2
  • 29
    • 0014701246
    • Evaluation techniques for storage hierarchies
    • R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger, "Evaluation techniques for storage hierarchies," IBM Syst. J., vol. 9, no. 2, pp. 78-92, 1970.
    • (1970) IBM Syst. J. , vol.9 , Issue.2 , pp. 78-92
    • Mattson, R.L.1    Gecsei, J.2    Slutz, D.R.3    Traiger, I.L.4
  • 30
    • 84945709131
    • Organizing matrices and matrix operations for paged memory systems
    • A. C. McKellar and E. G. Coffman Jr., "Organizing matrices and matrix operations for paged memory systems," Commun. ACM, vol. 12, no. 3, pp. 153-165, 1969.
    • (1969) Commun. ACM , vol.12 , Issue.3 , pp. 153-165
    • McKellar, A.C.1    Coffman Jr., E.G.2
  • 31
    • 0022874874
    • Advanced compiler optimization for supercomputers
    • Dec.
    • D. Padua and M. Wolfe, "Advanced compiler optimization for supercomputers," Commun. ACM, vol. 29, no. 12, pp. 1184-1201, Dec. 1986.
    • (1986) Commun. ACM , vol.29 , Issue.12 , pp. 1184-1201
    • Padua, D.1    Wolfe, M.2
  • 33
    • 0024898517
    • Engineering and scientific subroutine library release 3 for IBM ES/3090 vector multiprocessors
    • J. McComb, R. C. Agarwal, F. G. Gustavson, and S. Schmidt, "Engineering and scientific subroutine library release 3 for IBM ES/3090 vector multiprocessors," IBM Syst. J., vol. 28, no. 2, pp. 345-350, 1989.
    • (1989) IBM Syst. J. , vol.28 , Issue.2 , pp. 345-350
    • McComb, J.1    Agarwal, R.C.2    Gustavson, F.G.3    Schmidt, S.4
  • 35
    • 0003929457
    • Automatic blocking of nested loops
    • Univ. Tennessee, Knoxville
    • R. Schreiber and J. Dongarra, "Automatic blocking of nested loops," Univ. Tennessee, Knoxville, Tech. Rep. CS-90-108, 1990.
    • (1990) Tech. Rep. , vol.CS-90-108
    • Schreiber, R.1    Dongarra, J.2
  • 36
    • 20744440107
    • private communication
    • R. C. Whaley, private communication, 2004.
    • (2004)
    • Whaley, R.C.1
  • 37
    • 20744443273
    • [Online]
    • _, x86 optimizations, part 1. [Online]. Available: http://sourceforge.net/mailarchive/forum.php?thread_id=1569256&forum_id=426
    • X86 Optimizations, Part 1
  • 38
    • 13244261416
    • [Online]
    • _, User contribution to ATLAS. [Online]. Available: http://math-atlas.sourceforge.net/devel/atlas_contrib
    • User Contribution to ATLAS
  • 39
    • 13244279577
    • Minimizing development and maintenance costs in supporting persistently optimized BLAS
    • to be published
    • R. C. Whaley and A. Petitet, "Minimizing development and maintenance costs in supporting persistently optimized BLAS," Softw. Pract. Exper., to be published.
    • Softw. Pract. Exper.
    • Whaley, R.C.1    Petitet, A.2
  • 40
    • 0343462141
    • Automated empirical optimization of software and the ATLAS project
    • R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated empirical optimization of software and the ATLAS project," Parallel Comput., vol. 27, no. 1-2, pp. 3-35, 2001.
    • (2001) Parallel Comput , vol.27 , Issue.1-2 , pp. 3-35
    • Whaley, R.C.1    Petitet, A.2    Dongarra, J.J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.