SCOPUS Information Retrieval Platform

Volume 61, Issue 1, 2012, Pages 60-72

FPGA-based high-performance and scalable block LU decomposition architecture

(2) Jaiswal, Manish Kumar (a); Chandrachoodan, Nitin (b)

(b) INDIAN INSTITUTE OF TECHNOLOGY MADRAS (India)

Author keywords

ATLAS; block LU; floating point arithmetics; FPGA; GPU; hardware acceleration; Intel MKL; LU decomposition; scaling; single double precision

Indexed keywords

ATLAS; FLOATING-POINT ARITHMETIC; GPU; HARDWARE ACCELERATION; INTEL-MKL; LU DECOMPOSITION; SCALING; SINGLE/DOUBLE PRECISION;

ALGORITHMS; FIELD PROGRAMMABLE GATE ARRAYS (FPGA); SCALE (DEPOSITS);

HARDWARE;

EID: 82555168407 PISSN: 00189340 EISSN: None Source Type: Journal
DOI: 10.1109/TC.2011.24 Document Type: Article

Times cited: 58

References (35)

1
- 84973744454
- Large dense numerical linear algebra in 1993: The parallel computing influence
- A. Edelman, "Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence," Int'l J. Supercomputer Applications, vol. 7, pp. 113-128, 1993.
- (1993) Int'l J. Supercomputer Applications , vol.7 , pp. 113-128
- Edelman, A.¹

2
- 0029324485
- Software libraries for linear algebra computations on high performance computers
- J. J. Dongarra and D. W. Walker, "Software Libraries for Linear Algebra Computations on High Performance Computers," SIAM Rev., vol. 37, pp. 151-180, 1995.
- (1995) SIAM Rev. , vol.37 , pp. 151-180
- Dongarra, J.J.¹ Walker, D.W.²

3
- 0000667923
- The torus-wrap mapping for dense matrix calculations on massively parallel computers
- B. A. Hendrickson and D. E. Womble, "The Torus-Wrap Mapping for Dense Matrix Calculations on Massively Parallel Computers," SIAM J. Scientific Computing, vol. 15, no. 5, pp. 1201-1226, 1994.
- (1994) SIAM J. Scientific Computing , vol.15 , Issue.5 , pp. 1201-1226
- Hendrickson, B.A.¹ Womble, D.E.²

4
- 0025448609
- Origin and development of the method of moments for field computation
- DOI 10.1109/74.80522
- R. Harrington, "Origin and Development of the Method of Moments for Field Computation," IEEE Antennas and Propagation Magazine, vol. 32, no. 3, pp. 31-35, June 1990. (Pubitemid 20725243)
- (1990) IEEE Antennas and Propagation Magazine , vol.32 , Issue.3 , pp. 31-35
- Harrington, Roger¹

5
- 0000589993
- Panel methods in computational fluid dynamics
- Jan.
- J. L. Hess, "Panel Methods in Computational Fluid Dynamics," Ann. Rev. of Fluid Mechanics, vol. 22, pp. 225-274, Jan. 1990.
- (1990) Ann. Rev. of Fluid Mechanics , vol.22 , pp. 225-274
- Hess, J.L.¹

6
- 46249103564
- High-performance and parameterized matrix factorization on FPGAs
- Aug.
- L. Zhuo and V. K. Prasanna, "High-Performance and Parameterized Matrix Factorization on FPGAs," Proc. Int'l Conf. Field Programmable Logic and Applications (FPL'06), pp. 1-6, Aug. 2006.
- (2006) Proc. Int'l Conf. Field Programmable Logic and Applications (FPL'06) , pp. 1-6
- Zhuo, L.¹ Prasanna, V.K.²

7
- 84985321100
- Stability of block LU factorization
- J. W. Demmel, N. J. Higham, and R. S. Schreiber, "Stability of Block LU Factorization," Numerical Linear Algebra with Applications, vol. 2, no. 2, pp. 173-190, 1995.
- (1995) Numerical Linear Algebra with Applications , vol.2 , Issue.2 , pp. 173-190
- Demmel, J.W.¹ Higham, N.J.² Schreiber, R.S.³

8
- 0026913668
- Stability of block algorithms with fast level-3 BLAS
- Sept.
- J. W. Demmel and N. J. Higham, "Stability of Block Algorithms with Fast Level-3 BLAS," ACM Trans. Math. Software, vol. 18, no. 3, pp. 274-291, Sept. 1992.
- (1992) ACM Trans. Math. Software , vol.18 , Issue.3 , pp. 274-291
- Demmel, J.W.¹ Higham, N.J.²

9
- 82555169977
- A high performance implementation of LU decomposition on FPGA
- July
- M. K. Jaiswal and N. Chandrachoodan, "A High Performance Implementation of LU Decomposition on FPGA," Proc. 13th VLSI Design and Test Symp. (VDAT'09), pp. 124-134, July 2009.
- (2009) Proc. 13th VLSI Design and Test Symp. (VDAT'09) , pp. 124-134
- Jaiswal, M.K.¹ Chandrachoodan, N.²

10
- 85032398621
- "Automatically Tuned Linear Algebra Software (ATLAS)," http://www.netlib.org/atlas/, 2011.
- (2011) Automatically Tuned Linear Algebra Software (ATLAS)

11
- 0026283229
- A new approach for automatic parallelization of blocked linear algebra computations
- H. T. Kung and J. Subhlok, "A New Approach for Automatic Parallelization of Blocked Linear Algebra Computations," Supercomputing'91: Proc. ACM/IEEE Conf. Supercomputing, pp. 122-129, 1991.
- (1991) Supercomputing'91: Proc. ACM/IEEE Conf. Supercomputing , pp. 122-129
- Kung, H.T.¹ Subhlok, J.²

12
- 0039821550
- On the parallelization of blocked LU factorization algorithms on distributed memory architectures
- G. von Laszewski, M. Parashar, A. G. Mohamed, and G. C. Fox, "On the Parallelization of Blocked LU Factorization Algorithms on Distributed Memory Architectures," Supercomputing'92: Proc. ACM/IEEE Conf. Supercomputing, pp. 170-179, 1992.
- (1992) Supercomputing'92: Proc. ACM/IEEE Conf. Supercomputing , pp. 170-179
- Von Laszewski, G.¹ Parashar, M.² Mohamed, A.G.³ Fox, G.C.⁴

13
- 45449117672
- Implementation and optimization of dense LU decomposition on the stream processor
- R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, eds., Springer
- Y. Zhang, T. Tang, G. Li, and X. Yang, "Implementation and Optimization of Dense LU Decomposition on the Stream Processor," Parallel Processing and Applied Mathematics, R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, eds., pp. 78-88, Springer, 2008.
- (2008) Parallel Processing and Applied Mathematics , pp. 78-88
- Zhang, Y.¹ Tang, T.² Li, G.³ Yang, X.⁴

14
- 82555164731
- Multi-FPGA based high performance LU decomposition
- Sept.
- A. Sudarsanam, S. Young, A. Dasu, and T. Hauser, "Multi-FPGA Based High Performance LU Decomposition," Proc. 10th High Performance Embedded Computing (HPEC) Workshop, Sept. 2006.
- (2006) Proc. 10th High Performance Embedded Computing (HPEC) Workshop
- Sudarsanam, A.¹ Young, S.² Dasu, A.³ Hauser, T.⁴

15
- 35248821411
- Time and energy efficient matrix factorization using FPGA
- Sept.
- S. Choi and V. K. Prasanna, "Time and Energy Efficient Matrix Factorization Using FPGA," Proc. Int'l Conf. Field-Programmable Logic and Applications (FPL'03), vol. 2278, pp. 507-519, Sept. 2003.
- (2003) Proc. Int'l Conf. Field-Programmable Logic and Applications (FPL'03) , vol.2278 , pp. 507-519
- Choi, S.¹ Prasanna, V.K.²

16
- 33745830023
- Efficient floating-point based block LU decomposition on FPGAs
- Apr.
- G. Govindu, S. Choi, and V. K. Prasanna, "Efficient Floating-Point Based Block LU Decomposition on FPGAs," Proc. 11th Reconfigurable Architectures Workshop, Apr. 2004.
- (2004) Proc. 11th Reconfigurable Architectures Workshop
- Govindu, G.¹ Choi, S.² Prasanna, V.K.³

17
- 12444323064
- A high-performance and energy-efficient architecture for floating-point based LU decomposition on FPGAs
- Apr.
- G. Govindu, S. Choi, V. Prasanna, V. Daga, S. Gangadharpalli, and V. Sridhar, "A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs," Proc. 18th Int'l Parallel and Distributed Processing Symp., p. 149, Apr. 2004.
- (2004) Proc. 18th Int'l Parallel and Distributed Processing Symp. , pp. 149
- Govindu, G.¹ Choi, S.² Prasanna, V.³ Daga, V.⁴ Gangadharpalli, S.⁵ Sridhar, V.⁶

18
- 47049109081
- High-performance designs for linear algebra operations on reconfigurable hardware
- Aug.
- L. Zhuo and V. K. Prasanna, "High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware," IEEE Trans. Computers, vol. 57, no. 8, pp. 1057-1071, Aug. 2008.
- (2008) IEEE Trans. Computers , vol.57 , Issue.8 , pp. 1057-1071
- Zhuo, L.¹ Prasanna, V.K.²

19
- 63049121558
- Portable and scalable FPGA-based acceleration of a direct linear system solver
- Dec.
- W. Zhang, V. Betz, and J. Rose, "Portable and Scalable FPGA-Based Acceleration of a Direct Linear System Solver," Proc. Int'l Conf. Field-Programmable Technology (FPT'08), pp. 17-24, Dec. 2008.
- (2008) Proc. Int'l Conf. Field-Programmable Technology (FPT'08) , pp. 17-24
- Zhang, W.¹ Betz, V.² Rose, J.³

20
- 82555171856
- "SRC Supercomputers," http://www.srccomp.com/, 2008.
- (2008) SRC Supercomputers

21
- 82555169978
- "SGI Supercomputers," http://www.sgi.com/, 2011.
- (2011) SGI Supercomputers

22
- 82555164732
- "Cray XD1 Supercomputers," http://www.cray.com/, 2008.
- (2008) Cray XD1 Supercomputers

23
- 33845468997
- LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware
- Nov.
- N. Galoppo, N. Govindaraju, M. Henson, and D. Manocha, "LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware," Proc. ACM/IEEE Conf. Supercomputing (SC), p. 3, Nov. 2005.
- (2005) Proc. ACM/IEEE Conf. Supercomputing (SC) , pp. 3
- Galoppo, N.¹ Govindaraju, N.² Henson, M.³ Manocha, D.⁴

24
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- V. Volkov and J. W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," SC'08: Proc. ACM/IEEE Conf. Supercomputing, pp. 1-11, 2008.
- (2008) SC'08: Proc. ACM/IEEE Conf. Supercomputing , pp. 1-11
- Volkov, V.¹ Demmel, J.W.²

25
- 33646731004
- Performance study of LU decomposition on the programmable GPU
- F. Ino, M. Matsui, K. Goda, and K. Hagihara, "Performance Study of LU Decomposition on the Programmable GPU," Proc. Int'l Conf. High Performance Computing (HiPC), vol. 3769, pp. 83-94, 2005.
- (2005) Proc. Int'l Conf. High Performance Computing (HiPC) , vol.3769 , pp. 83-94
- Ino, F.¹ Matsui, M.² Goda, K.³ Hagihara, K.⁴

26
- 77954080759
- Dense linear algebra solvers for multicore with GPU accelerators
- Jan.
- S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense Linear Algebra Solvers for Multicore with GPU Accelerators," Proc. Int'l Workshop High-Level Parallel Programming Models and Supportive Environments (HIPS'10), Jan. 2010.
- (2010) Proc. Int'l Workshop High-Level Parallel Programming Models and Supportive Environments (HIPS'10)
- Tomov, S.¹ Nath, R.² Ltaief, H.³ Dongarra, J.⁴

27
- 62949108527
- Efficient implementation of floating-point reciprocator on FPGA
- M. K. Jaiswal and N. Chandrachoodan, "Efficient Implementation of Floating-Point Reciprocator on FPGA," Proc. 22nd Int'l Conf. VLSI Design (VLSID'09), pp. 267-271, 2009.
- (2009) Proc. 22nd Int'l Conf. VLSI Design (VLSID'09) , pp. 267-271
- Jaiswal, M.K.¹ Chandrachoodan, N.²

28
- 64949126596
- Efficient implementation of IEEE double precision floating-point multiplier on FPGA
- Dec.
- M. K. Jaiswal and N. Chandrachoodan, "Efficient Implementation of IEEE Double Precision Floating-Point Multiplier on FPGA," Proc. IEEE Region 10 and the Third Int'l Conf. Industrial and Information Systems (ICIIS'08), pp. 1-4, Dec. 2008.
- (2008) Proc. IEEE Region 10 and the Third Int'l Conf. Industrial and Information Systems (ICIIS'08) , pp. 1-4
- Jaiswal, M.K.¹ Chandrachoodan, N.²

29
- 82555163061
- QDR II SRAM interface for virtex-5 devices
- Oct.
- L. Gopalakrishnan, "QDR II SRAM Interface for Virtex-5 Devices," Xilinx Application Note (XAPP853), http://www.xilinx.com/support/documentation/application-notes/xapp853.pdf, Oct. 2008.
- (2008) Xilinx Application Note (XAPP853)
- Gopalakrishnan, L.¹

30
- 57049186554
- High-performance mixed-precision linear solver for FPGAs
- Dec.
- J. Sun, G. Peterson, and O. Storaasli, "High-Performance Mixed-Precision Linear Solver for FPGAs," IEEE Trans. Computers, vol. 57, no. 12, pp. 1614-1623, Dec. 2008.
- (2008) IEEE Trans. Computers , vol.57 , Issue.12 , pp. 1614-1623
- Sun, J.¹ Peterson, G.² Storaasli, O.³

31
- 82555169975
- "AMD Core Math Library (ACML)," http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx, 2011.
- (2011) AMD Core Math Library (ACML)

32
- 82555169970
- Intel Corporation
- Intel Corporation, "Intel Math Kernel Library (Intel MKL) 10.2 In-Depth," http://software.intel.com/sites/products/collateral/hpc/mkl/mkl-indepth.pdf, 2009.
- (2009) Intel Math Kernel Library (Intel MKL) 10.2 In-Depth

33
- 77953997924
- Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects
- E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, "Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects," J. Physics: Conference Series, vol. 180, 2009.
- (2009) J. Physics: Conference Series , vol.180
- Agullo, E.¹ Demmel, J.² Dongarra, J.³ Hadri, B.⁴ Kurzak, J.⁵ Langou, J.⁶ Ltaief, H.⁷ Luszczek, P.⁸ Tomov, S.⁹

34
- 82555163062
- June
- J. Dongarra, "LINPACK Benchmarking and beyond," http://www.netlib.org/utk/people/JackDongarra/SLIDES/dod-0610. df, June 2010.
- (2010) LINPACK Benchmarking and beyond
- Dongarra, J.¹

35
- 82555163059
- J. Humphrey, "CULA 2.2 Sneak Preview," http://www.culatools.com/blog/2010/09/10/cula-2-2-sneak-preview/, 2010.
- (2010) CULA 2.2 Sneak Preview
- Humphrey, J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.