Volume 34, Issue 4, 2008, Pages

Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy

Author keywords

Floating point; Iterative refinement; Linear systems; Precision

Indexed keywords

FIELD PROGRAMMABLE GATE ARRAY (FPGA); FLOATING POINT ARITHMETICS; GRAPHICAL PROCESSING UNITS (GPU); ITERATIVE TECHNIQUES; KRYLOV SUBSPACE METHODS; SPARSE LINEAR ALGEBRA; SPARSE MATRIX COMPUTATIONS;

EID: 35548992612     PISSN: 00983500     EISSN: 15577295     Source Type: Journal    
DOI: 10.1145/1377596.1377597     Document Type: Article
Times cited : (77)

References (51)
  • 1
    • 0033884893
    • Multifrontal parallel distributed symmetric and unsymmetric solvers
    • AMESTOY, P. R., DUFF, I. S., AND L'EXCELLENT, J.-Y. 2000. Multifrontal parallel distributed symmetric and unsymmetric solvers. Comput. Meth. Appl. Mech. Eng. 184, 501-520.
    • (2000) Comput. Meth. Appl. Mech. Eng , vol.184 , pp. 501-520
    • AMESTOY, P.R.1    DUFF, I.S.2    L'EXCELLENT, J.-Y.3
  • 2
    • 0036060795
    • A fully asynchronous multifrontal solver using distributed dynamic scheduling
    • AMESTOY, P. R., DUFF, I. S., L'EXCELLENT, J.-Y., AND KOSTER, J. 2001. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23, 15-41.
    • (2001) SIAM J. Matrix Anal. Appl , vol.23 , pp. 15-41
    • AMESTOY, P.R.1    DUFF, I.S.2    L'EXCELLENT, J.-Y.3    KOSTER, J.4
  • 5
    • 84972716214
    • Progress in sparse matrix methods in large sparse linear systems on vector supercomputers
    • ASHCRAFT, C., GRIMES, R., LEWIS, J., PEYTON, B. W., AND SIMON, H. 1987. Progress in sparse matrix methods in large sparse linear systems on vector supercomputers. Intern. J. of Supercomput. Appl. 1, 10-30.
    • (1987) Intern. J. of Supercomput. Appl , vol.1 , pp. 10-30
    • ASHCRAFT, C.1    GRIMES, R.2    LEWIS, J.3    PEYTON, B.W.4    SIMON, H.5
  • 6
    • 0001116752
    • A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning
    • AXELSSON, O. AND VASSILEVSKI, P. S. 1991. A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning. SIAM J. Matrix Anal. Appl. 12, 4, 625-644.
    • (1991) SIAM J. Matrix Anal. Appl , vol.12 , Issue.4 , pp. 625-644
    • AXELSSON, O.1    VASSILEVSKI, P.S.2
  • 8
    • 48249146098
    • BARRETT, R., BERRY, M., CHAN, T. F., DEMMEL, J., DONATO, J. M., DONGARRA, J., EIJKHOUT, V., POZO, R., ROMINE, C., AND VAN DER VORST, H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA. http://www.netlib.org/templates/Templates.html.
  • 9
    • 0012094472
    • Iterative refinement and reliable computing
    • M. G. Cox and S. Hammarling, Eds. Oxford University Press, Oxford, UK
    • BJÖRCK, A. 1990. Iterative refinement and reliable computing. In Reliable Numerical Computation, M. G. Cox and S. Hammarling, Eds. Oxford University Press, Oxford, UK, 249-266.
    • (1990) Reliable Numerical Computation , pp. 249-266
    • BJÖRCK, A.1
  • 10
    • 48249100035
    • BUTTARI, A., DONGARRA, J., KURZAK, J., LUSZCZEK, P., AND TOMOV, S. 2006. Computations to enhance the performance while achieving the 64-bit accuracy. Tech. rep. UT-CS-06-584, University of Tennessee Knoxville. LAPACK Working Note 180.
  • 11
    • 0001946784
    • A combined unifrontal/multifrontal method for unsymmetric sparse matrices
    • DAVIS, T. A. 1999. A combined unifrontal/multifrontal method for unsymmetric sparse matrices. ACM Trans. Math. Softw. 25, 1-19.
    • (1999) ACM Trans. Math. Softw , vol.25 , pp. 1-19
    • DAVIS, T.A.1
  • 12
    • 2942655475
    • A column pre-ordering strategy for the unsymmetric-pattern multifrontal method
    • DAVIS, T. A. 2004. A column pre-ordering strategy for the unsymmetric-pattern multifrontal method. ACM Trans. Math. Softw. 30, 196-199.
    • (2004) ACM Trans. Math. Softw , vol.30 , pp. 196-199
    • DAVIS, T.A.1
  • 14
    • 0003424372
    • Society for Industrial and Applied Mathematics, Philadelphia, PA
    • DEMMEL, J. W. 1997. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA.
    • (1997) Applied Numerical Linear Algebra
    • DEMMEL, J.W.1
  • 16
    • 0033242228
    • An asynchronous parallel supernodal algorithm for sparse Gaussian elimination
    • DEMMEL, J. W., GILBERT, J. R., AND LI, X. S. 1999b. An asynchronous parallel supernodal algorithm for sparse Gaussian elimination. SIAM J. Matrix Anal. Appl. 20, 3, 915-952.
    • (1999) SIAM J. Matrix Anal. Appl , vol.20 , Issue.3 , pp. 915-952
    • DEMMEL, J.W.1    GILBERT, J.R.2    LI, X.S.3
  • 17
    • 48249134675
    • ACM Transactions on Mathematical Software, Vol. 34, No. 4, Article 17, Pub. date: July 2008.
    • (2008)
  • 18
    • 48249127530
    • DONGARRA, J. J. AND EIJKHOUT, V. 2002. Self-adapting numerical software for next generation applications. Tech. rep. ICL-UT-02-07, Innovative Computing Lab, University of Tennessee, LAPACK Working Note 157. http://icl.cs.utk.edu/iclprojects/pages/sans.html.
  • 19
    • 0020822138
    • The multifrontal solution of indefinite sparse symmetric linear equations
    • DUFF, I. S. AND REID, J. K. 1983. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Trans. Math. Softw. 9, 3, 302-325.
    • (1983) ACM Trans. Math. Softw , vol.9 , Issue.3 , pp. 302-325
    • DUFF, I.S.1    REID, J.K.2
  • 20
    • 0031541111
    • An unsymmetric-pattern multifrontal method for sparse LU factorization
    • DAVIS, T. A. AND DUFF, I. S. 1997. An unsymmetric-pattern multifrontal method for sparse LU factorization. SIAM J. Matrix Anal. Appl. 18, 140-158.
    • (1997) SIAM J. Matrix Anal. Appl , vol.18 , pp. 140-158
    • DAVIS, T.A.1    DUFF, I.S.2
  • 21
    • 0038718852
    • The tortoise and the hare restart GMRES
    • EMBREE, M. 2003. The tortoise and the hare restart GMRES. SIAM Rev. 45, 259-266.
    • (2003) SIAM Rev , vol.45 , pp. 259-266
    • EMBREE, M.1
  • 23
    • 48249124525
    • GÖDDEKE, D., STRZODKA, R., AND TUREK, S. 2005. Accelerating double precision FEM simulations with GPUs. In Simulationstechnik 18th Symposium in Erlangen. F. Hülsemann, M. Kowarschik, and U. Rüde, Eds. Frontiers in Simulation. SCS Publishing House e.V., 139-144.
  • 24
    • 0004236492
    • 2nd Ed. Johns Hopkins University Press, Baltimore, MD
    • GOLUB, G. H. AND VAN LOAN, C. F. 1989. Matrix Computations 2nd Ed. Johns Hopkins University Press, Baltimore, MD.
    • (1989) Matrix Computations
    • GOLUB, G.H.1    VAN LOAN, C.F.2
  • 25
    • 0033293064
    • Inexact preconditioned conjugate gradient method with inner-outer iteration
    • GOLUB, G. H. AND YE, Q. 2000. Inexact preconditioned conjugate gradient method with inner-outer iteration. SIAM J. Sci. Comput. 21, 4, 1305-1320.
    • (2000) SIAM J. Sci. Comput , vol.21 , Issue.4 , pp. 1305-1320
    • GOLUB, G.H.1    YE, Q.2
  • 26
    • 48249109471
    • GROPP, W. D., KAUSHIK, D. K., KEYES, D. E., AND SMITH, B. F. 2000. Latency, bandwidth, and concurrent issue limitations in high-performance CFD. Tech. rep. ANL/MCS-P850-1000, Argonne National Laboratory.
  • 29
    • 0003360974
    • Multigrid Methods and Applications
    • Springer-Verlag, Berlin, Germany
    • HACKBUSCH, W. 1985. Multigrid Methods and Applications. Springer Series in Computational Mathematics, Vol. 4, Springer-Verlag, Berlin, Germany.
    • (1985) Springer Series in Computational Mathematics , vol.4
    • HACKBUSCH, W.1
  • 32
    • 48249134304
    • Ph.D. thesis, Computer Science Department, University of California at Berkeley
    • LI, X. S. 1996. SuperLU software, Ph.D. thesis, Computer Science Department, University of California at Berkeley. http://www.nersc.gov/xiaoye/SuperLU/.
    • (1996) SuperLU software
    • LI, X.S.1
  • 33
    • 0038621899
    • SuperLU-DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
    • LI, X. S. AND DEMMEL, J. W. 2003. SuperLU-DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans. Math. Softw. 29, 110-140.
    • (2003) ACM Trans. Math. Softw , vol.29 , pp. 110-140
    • LI, X.S.1    DEMMEL, J.W.2
  • 34
    • 0001467517
    • Iterative refinement in floating point
    • MOLER, C. B. 1967. Iterative refinement in floating point. J. ACM 14, 2, 316-321.
    • (1967) J. ACM , vol.14 , Issue.2 , pp. 316-321
    • MOLER, C.B.1
  • 35
    • 0034880095
    • Flexible conjugate gradients
    • NOTAY, Y. 2000. Flexible conjugate gradients. SIAM J. Sci. Comput. 22, 1444-1460.
    • (2000) SIAM J. Sci. Comput , vol.22 , pp. 1444-1460
    • NOTAY, Y.1
  • 37
    • 18744400955
    • A flexible inner-outer preconditioned GMRES algorithm
    • Tech. rep. 91-279, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN
    • SAAD, Y. 1991. A flexible inner-outer preconditioned GMRES algorithm. Tech. rep. 91-279, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN.
    • (1991)
    • SAAD, Y.1
  • 38
    • 1842829625
    • Society for Industrial and Applied Mathematics, Philadelphia, PA
    • SAAD, Y. 2003. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA.
    • (2003) Iterative Methods for Sparse Linear Systems
    • SAAD, Y.1
  • 39
    • 0000048673
    • GMRES: A generalized minimal residual method for solving nonsymmetric linear systems
    • SAAD, Y. AND SCHULTZ, M. H. 1986. GMRES: A generalized minimal residual method for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 856-869.
    • (1986) SIAM J. Sci. Statist. Comput , pp. 856-869
    • SAAD, Y.1    SCHULTZ, M.H.2
  • 40
    • 21344474772
    • DQGMRES: A direct quasi-minimal residual algorithm based on incomplete orthogonalization
    • SAAD, Y. AND WU, K. 1996. DQGMRES: A direct quasi-minimal residual algorithm based on incomplete orthogonalization. Num. Linear Algeb. Appl. 3, 4, 329-343.
    • (1996) Num. Linear Algeb. Appl , vol.3 , Issue.4 , pp. 329-343
    • SAAD, Y.1    WU, K.2
  • 41
    • 0142064936
    • Theory of inexact Krylov subspace methods and applications to scientific computing
    • Tech. rep. 02-4-12, Department of Mathematics, Temple University
    • SIMONCINI, V. AND SZYLD, D. 2002a. Theory of inexact Krylov subspace methods and applications to scientific computing. Tech. rep. 02-4-12, Department of Mathematics, Temple University.
    • (2002)
    • SIMONCINI, V.1    SZYLD, D.2
  • 42
    • 0347337972
    • Flexible inner-outer Krylov subspace methods
    • SIMONCINI, V. AND SZYLD, D. B. 2002b. Flexible inner-outer Krylov subspace methods. SIAM J. Numer. Anal. 40, 6, 2219-2239.
    • (2002) SIAM J. Numer. Anal , vol.40 , Issue.6 , pp. 2219-2239
    • SIMONCINI, V.1    SZYLD, D.B.2
  • 43
    • 20044382367
    • The effect of non-optimal bases on the convergence of Krylov subspace methods
    • SIMONCINI, V. AND SZYLD, D. B. 2005. The effect of non-optimal bases on the convergence of Krylov subspace methods. Numer. Math. 100, 4, 711-733.
    • (2005) Numer. Math , vol.100 , Issue.4 , pp. 711-733
    • SIMONCINI, V.1    SZYLD, D.B.2
  • 44
    • 0004094905
    • Society for Industrial and Applied Mathematics, Philadelphia, PA
    • STEWART, G. W. 2001. Matrix algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA.
    • (2001) Matrix algorithms
    • STEWART, G.W.1
  • 45
    • 48249150668
    • STRZODKA, R. AND GÖDDEKE, D. 2006a. Mixed precision methods for convergent iterative schemes. EDGE'06, 23.-24. Chapel Hill, NC.
  • 46
    • 34547415101
    • Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components
    • IEEE Computer Society Press. To appear
    • STRZODKA, R. AND GÖDDEKE, D. 2006b. Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components. In IEEE Proceedings on Field - Programmable Custom Computing Machines (FCCM'06). IEEE Computer Society Press. To appear.
    • (2006) IEEE Proceedings on Field - Programmable Custom Computing Machines (FCCM'06)
    • STRZODKA, R.1    GÖDDEKE, D.2
  • 47
    • 0001996967
    • Efficient high accuracy solutions with GMRES(m)
    • TURNER, K. AND WALKER, H. F. 1992. Efficient high accuracy solutions with GMRES(m). SIAM J. Sci. Stat. Comput. 13, 3, 815-825.
    • (1992) SIAM J. Sci. Stat. Comput , vol.13 , Issue.3 , pp. 815-825
    • TURNER, K.1    WALKER, H.F.2
  • 48
    • 48249148677
    • VAN DEN ESHOF, J., SLEIJPEN, G. L. G., AND VAN GIJZEN, M. B. 2003. Relaxation strategies for nested Krylov methods. Technical report TR/PA/03/27, CERFACS, Toulouse, France.
  • 49
    • 84985336395
    • GMRESR: A family of nested GMRES methods
    • VAN DER VORST, H. A. AND VUIK, C. 1994. GMRESR: a family of nested GMRES methods. Num. Linear Algeb. Appl. 1, 4, 369-386.
    • (1994) Num. Linear Algeb. Appl , vol.1 , Issue.4 , pp. 369-386
    • VAN DER VORST, H.A.1    VUIK, C.2
  • 50
    • 0029425967
    • New insights in GMRES-like methods with variable preconditioners
    • VUIK, C. 1995. New insights in GMRES-like methods with variable preconditioners. J. Comput. Appl. Math. 61, 2, 189-204.
    • (1995) J. Comput. Appl. Math , vol.61 , Issue.2 , pp. 189-204
    • VUIK, C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.