Volume 61, Issue 1, 2012, Pages 60-72

FPGA-based high-performance and scalable block LU decomposition architecture

Author keywords

ATLAS; block LU; floating-point arithmetic; FPGA; GPU; hardware acceleration; Intel MKL; LU decomposition; scaling; single/double precision

Indexed keywords

ATLAS; FLOATING-POINT ARITHMETIC; GPU; HARDWARE ACCELERATION; INTEL-MKL; LU DECOMPOSITION; SCALING; SINGLE/DOUBLE PRECISION

EID: 82555168407     PISSN: 00189340     EISSN: None     Source Type: Journal    
DOI: 10.1109/TC.2011.24     Document Type: Article
Times cited: 58

References (35)
  • 1
    • 84973744454
    • Large dense numerical linear algebra in 1993: The parallel computing influence
    • A. Edelman, "Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence," Int'l J. Supercomputer Applications, vol. 7, pp. 113-128, 1993.
    • (1993) Int'l J. Supercomputer Applications , vol.7 , pp. 113-128
    • Edelman, A.1
  • 2
    • 0029324485
    • Software libraries for linear algebra computations on high performance computers
    • J. J. Dongarra and D. W. Walker, "Software Libraries for Linear Algebra Computations on High Performance Computers," SIAM Rev., vol. 37, pp. 151-180, 1995.
    • (1995) SIAM Rev. , vol.37 , pp. 151-180
    • Dongarra, J.J.1    Walker, D.W.2
  • 3
    • 0000667923
    • The torus-wrap mapping for dense matrix calculations on massively parallel computers
    • B. A. Hendrickson and D. E. Womble, "The Torus-Wrap Mapping for Dense Matrix Calculations on Massively Parallel Computers," SIAM J. Scientific Computing, vol. 15, no. 5, pp. 1201-1226, 1994.
    • (1994) SIAM J. Scientific Computing , vol.15 , Issue.5 , pp. 1201-1226
    • Hendrickson, B.A.1    Womble, D.E.2
  • 4
    • 0025448609
    • Origin and development of the method of moments for field computation
    • DOI 10.1109/74.80522
    • R. Harrington, "Origin and Development of the Method of Moments for Field Computation," IEEE Antennas and Propagation Magazine, vol. 32, no. 3, pp. 31-35, June 1990. (Pubitemid 20725243)
    • (1990) IEEE Antennas and Propagation Magazine , vol.32 , Issue.3 , pp. 31-35
    • Harrington, R.1
  • 5
    • 0000589993
    • Panel methods in computational fluid dynamics
    • Jan.
    • J. L. Hess, "Panel Methods in Computational Fluid Dynamics," Ann. Rev. of Fluid Mechanics, vol. 22, pp. 225-274, Jan. 1990.
    • (1990) Ann. Rev. of Fluid Mechanics , vol.22 , pp. 225-274
    • Hess, J.L.1
  • 8
    • 0026913668
    • Stability of block algorithms with fast level-3 BLAS
    • Sept.
    • J. W. Demmel and N. J. Higham, "Stability of Block Algorithms with Fast Level-3 BLAS," ACM Trans. Math. Software, vol. 18, no. 3, pp. 274-291, Sept. 1992.
    • (1992) ACM Trans. Math. Software , vol.18 , Issue.3 , pp. 274-291
    • Demmel, J.W.1    Higham, N.J.2
  • 11
    • 0026283229
    • A new approach for automatic parallelization of blocked linear algebra computations
    • H. T. Kung and J. Subhlok, "A New Approach for Automatic Parallelization of Blocked Linear Algebra Computations," Supercomputing'91: Proc. ACM/IEEE Conf. Supercomputing, pp. 122-129, 1991.
    • (1991) Supercomputing'91: Proc. ACM/IEEE Conf. Supercomputing , pp. 122-129
    • Kung, H.T.1    Subhlok, J.2
  • 13
    • 45449117672
    • Implementation and optimization of dense LU decomposition on the stream processor
    • R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, eds., Springer
    • Y. Zhang, T. Tang, G. Li, and X. Yang, "Implementation and Optimization of Dense LU Decomposition on the Stream Processor," Parallel Processing and Applied Mathematics, R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, eds., pp. 78-88, Springer, 2008.
    • (2008) Parallel Processing and Applied Mathematics , pp. 78-88
    • Zhang, Y.1    Tang, T.2    Li, G.3    Yang, X.4
  • 18
    • 47049109081
    • High-performance designs for linear algebra operations on reconfigurable hardware
    • Aug.
    • L. Zhuo and V. K. Prasanna, "High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware," IEEE Trans. Computers, vol. 57, no. 8, pp. 1057-1071, Aug. 2008.
    • (2008) IEEE Trans. Computers , vol.57 , Issue.8 , pp. 1057-1071
    • Zhuo, L.1    Prasanna, V.K.2
  • 20
  • 29
    • 82555163061
    • QDR II SRAM interface for Virtex-5 devices
    • Oct.
    • L. Gopalakrishnan, "QDR II SRAM Interface for Virtex-5 Devices," Xilinx Application Note (XAPP853), http://www.xilinx.com/support/documentation/application-notes/xapp853.pdf, Oct. 2008.
    • (2008) Xilinx Application Note (XAPP853)
    • Gopalakrishnan, L.1
  • 30
    • 57049186554
    • High-performance mixed-precision linear solver for FPGAs
    • Dec.
    • J. Sun, G. Peterson, and O. Storaasli, "High-Performance Mixed-Precision Linear Solver for FPGAs," IEEE Trans. Computers, vol. 57, no. 12, pp. 1614-1623, Dec. 2008.
    • (2008) IEEE Trans. Computers , vol.57 , Issue.12 , pp. 1614-1623
    • Sun, J.1    Peterson, G.2    Storaasli, O.3
  • 31
    • 82555169975
    • "AMD Core Math Library (ACML)," http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx, 2011.
    • (2011) AMD Core Math Library (ACML)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.