SCOPUS Information Retrieval Platform

IEEE Transactions on Computers

Volume 57, Issue 8, 2008, Pages 1057-1071

High-performance designs for linear algebra operations on reconfigurable hardware

(2) Zhuo, Ling ᵃ; Prasanna, Viktor K. ᵃ

ᵃ University of Southern California (United States)

Author keywords

Computations on matrices; Parallel algorithms; Reconfigurable hardware

Indexed keywords

ALGEBRA; BOOLEAN ALGEBRA; DIGITAL ARITHMETIC; EIGENVALUES AND EIGENFUNCTIONS; FIELD PROGRAMMABLE GATE ARRAYS (FPGA); GENERAL PURPOSE COMPUTERS; MATRIX ALGEBRA;

CHIP AREAS; FIELD PROGRAMMABLE GATE ARRAY (FPGA); FLOATING-POINT UNIT (FPU); GENERAL PURPOSE PROCESSOR (GPP); HARDWARE ACCELERATIONS; HARDWARE CONSTRAINTS; HARDWARE RESOURCES; HIGH-PERFORMANCE DESIGNS; I/O PINS; LINEAR ALGEBRA OPERATIONS; MATRIX FACTORIZATIONS; MATRIX MULTIPLICATIONS; MATRIX VECTOR MULTIPLICATION (MVM); MEMORY BANDWIDTHS; NUMERICAL LINEAR ALGEBRA; PERFORMANCE OPTIMIZATIONS; RECONFIGURABLE HARDWARE (RH); XILINX VIRTEX;

LINEAR ALGEBRA;

EID: 47049109081 PISSN: 00189340 EISSN: None Source Type: Journal
DOI: 10.1109/TC.2008.55 Document Type: Article

Times cited: (72)

References (46)

1
- 47049106841
- Xilinx Incorporated
- Xilinx Incorporated, http://www.xilinx.com, 2008.
- (2008)

2
- 62949084719
- Scientific Computations on a NASA Reconfigurable Hypercomputer
- Sept
- O. Storaasli, R.C. Singleterry, and S. Brown, "Scientific Computations on a NASA Reconfigurable Hypercomputer," Proc. Fifth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices, Sept. 2002.
- (2002) Proc. Fifth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices
- Storaasli, O.¹ Singleterry, R.C.² Brown, S.³

3
- 18644362801
- Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance
- Apr
- K.D. Underwood and K.S. Hemmert, "Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance," Proc. 12th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2004.
- (2004) Proc. 12th Ann. IEEE Symp. Field-Programmable Custom Computing Machines
- Underwood, K.D.¹ Hemmert, K.S.²

4
- 33746273833
- Accelerating Scientific Applications with the SRC-6 Reconfigurable Computer: Methodologies and Analysis
- Apr
- M. Smith, J. Vetter, and X. Liang, "Accelerating Scientific Applications with the SRC-6 Reconfigurable Computer: Methodologies and Analysis," Proc. 19th IEEE Int'l Parallel and Distributed Processing Symp., Apr. 2005.
- (2005) Proc. 19th IEEE Int'l Parallel and Distributed Processing Symp
- Smith, M.¹ Vetter, J.² Liang, X.³

5
- 2442575888
- A Quantitative Analysis of the Speedup Factors of FPGAs over Processors
- Feb
- Z. Guo, W. Najjar, F. Vahid, and K. Vissers, "A Quantitative Analysis of the Speedup Factors of FPGAs over Processors," Proc. 12th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, pp. 162-170, Feb. 2004.
- (2004) Proc. 12th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays , pp. 162-170
- Guo, Z.¹ Najjar, W.² Vahid, F.³ Vissers, K.⁴

6
- 47049092496
- Reconfigurable Computing with Multiscale Data Fusion for Remote Sensing
- Feb
- V. Aggarwal, A. George, and K. Slatton, "Reconfigurable Computing with Multiscale Data Fusion for Remote Sensing," Proc. 14th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, p. 235, Feb. 2006.
- (2006) Proc. 14th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays , p. 235
- Aggarwal, V.¹ George, A.² Slatton, K.³

7
- 47049118159
- n) in Optimal Normal Basis on a Reconfigurable Computer
- Feb
- n) in Optimal Normal Basis on a Reconfigurable Computer," Proc. 12th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2004.
- (2004) Proc. 12th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays
- Bajracharya, S.¹ Shu, C.² Gaj, K.³ El-Ghazawi, T.⁴

8
- 47049105602
- Reconfigurable Computing Applied to Problems in Communications Security
- Sept
- D.A. Buell and J.P. Davis, "Reconfigurable Computing Applied to Problems in Communications Security," Proc. Fifth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices, Sept. 2002.
- (2002) Proc. Fifth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices
- Buell, D.A.¹ Davis, J.P.²

9
- 1142263481
- A Fast Parallel Reed-Solomon Decoder on a Reconfigurable Architecture
- Oct
- A. Koohi, N. Bagherzadeh, and C. Pan, "A Fast Parallel Reed-Solomon Decoder on a Reconfigurable Architecture," Proc. First IEEE/ACM/IFIP Int'l Conf. Hardware/Software Codesign and System Synthesis, Oct. 2003.
- (2003) Proc. First IEEE/ACM/IFIP Int'l Conf. Hardware/Software Codesign and System Synthesis
- Koohi, A.¹ Bagherzadeh, N.² Pan, C.³

10
- 47049130960
- Cray Inc
- Cray Inc., http://www.cray.com/, 2008.
- (2008)

11
- 47049098886
- SRC Computers, Inc
- SRC Computers, Inc., http://www.srccomp.com/, 2008.
- (2008)

12
- 47049109366
- Silicon Graphics, Inc
- Silicon Graphics, Inc., http://www.sgi.com/, 2008.
- (2008)

13
- 24944539760
- High-Performance Algorithm Engineering for Parallel Computation
- D. Bader, B. Moret, and P. Sanders, "High-Performance Algorithm Engineering for Parallel Computation," Lecture Notes in Computer Science, vol. 2547, pp. 1-23, 2002.
- (2002) Lecture Notes in Computer Science , vol.2547 , pp. 1-23
- Bader, D.¹ Moret, B.² Sanders, P.³

14
- 0018515759
- Basic Linear Algebra Subprograms for FORTRAN Usage
- C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Software, vol. 5, no. 3, pp. 308-323, 1979.
- (1979) ACM Trans. Math. Software , vol.5 , Issue.3 , pp. 308-323
- Lawson, C.¹ Hanson, R.² Kincaid, D.³ Krogh, F.⁴

15
- 12444290912
- Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs
- Apr
- L. Zhuo and V.K. Prasanna, "Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs," Proc. 18th Int'l Parallel and Distributed Processing Symp., Apr. 2004.
- (2004) Proc. 18th Int'l Parallel and Distributed Processing Symp
- Zhuo, L.¹ Prasanna, V.K.²

16
- 77957916282
- Scientific Computing Beyond CPUs: FPGA Implementations of Common Scientific Kernels
- Sept
- M. Smith, J. Vetter, and S. Alam, "Scientific Computing Beyond CPUs: FPGA Implementations of Common Scientific Kernels," Proc. Eighth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices, Sept. 2005.
- (2005) Proc. Eighth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices
- Smith, M.¹ Vetter, J.² Alam, S.³

17
- 0003473816
- second ed. SIAM
- R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, second ed. SIAM, 1994.
- (1994) Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
- Barrett, R.¹ Berry, M.² Chan, T.F.³ Demmel, J.⁴ Donato, J.⁵ Dongarra, J.⁶ Eijkhout, V.⁷ Pozo, R.⁸ Romine, C.⁹ van der Vorst, H.¹⁰

18
- 0004161838
- Cambridge Univ. Press
- W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing. Cambridge Univ. Press, 1992.
- (1992) Numerical Recipes in C: The Art of Scientific Computing
- Press, W.H.¹ Flannery, B.P.² Teukolsky, S.A.³ Vetterling, W.T.⁴

19
- 12444329131
- IEEE
- IEEE 754 Standard for Binary Floating-Point Arithmetic, IEEE, 1984.
- (1984) IEEE 754 Standard for Binary Floating-Point Arithmetic

20
- 0343462141
- Automated Empirical Optimization of Software and the ATLAS Project
- R.C. Whaley, A. Petitet, and J.J. Dongarra, "Automated Empirical Optimization of Software and the ATLAS Project," Parallel Computing vol. 27, nos. 1-2, pp. 3-35, also available as Univ. of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000 (www.netlib.org/lapack/lawns/lawn147.ps), 2001.

21
- 0003706460
- Aug. 1999
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, "LAPACK User's Guide Third Edition," http://www.netlib.org/lapack/lug/lapack_lug.html, Aug. 1999.
- LAPACK User's Guide Third Edition
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, S.⁴ Demmel, J.⁵ Dongarra, J.⁶ Du Croz, J.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ McKenney, A.¹⁰ Sorensen, D.¹¹

22
- 0003615167
- SIAM
- L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley, ScaLAPACK Users' Guide, SIAM, 1997.
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹ Choi, J.² Cleary, A.³ D'Azevedo, E.⁴ Demmel, J.⁵ Dhillon, I.⁶ Dongarra, J.⁷ Hammarling, S.⁸ Henry, G.⁹ Petitet, A.¹⁰ Stanley, K.¹¹ Walker, D.¹² Whaley, R.C.¹³

23
- 0031221523
- Parallel Implementation of BLAS: General Techniques for Level 3 BLAS
- A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt, and R. van de Geijn, "Parallel Implementation of BLAS: General Techniques for Level 3 BLAS," Concurrency: Practice and Experience, vol. 9, no. 9, pp. 837-857, 1997.
- (1997) Concurrency: Practice and Experience , vol.9 , Issue.9 , pp. 837-857
- Chtchelkanova, A.¹ Gunnels, J.² Morrow, G.³ Overfelt, J.⁴ van de Geijn, R.⁵

24
- 0000227930
- Reconfigurable Computing: A Survey of Systems and Software
- June
- K. Compton and S. Hauck, "Reconfigurable Computing: A Survey of Systems and Software," ACM Computing Surveys, vol. 34, no. 2, pp. 171-210, June 2002.
- (2002) ACM Computing Surveys , vol.34 , Issue.2 , pp. 171-210
- Compton, K.¹ Hauck, S.²

25
- 60749122141
- A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing
- June
- G. Govindu, R. Scrofano, and V.K. Prasanna, "A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing," Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms, June 2005.
- (2005) Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms
- Govindu, G.¹ Scrofano, R.² Prasanna, V.K.³

26
- 34547396571
- Advanced Components in the Variable Precision Floating-Point Library
- Apr
- X. Wang, S. Braganza, and M. Leeser, "Advanced Components in the Variable Precision Floating-Point Library," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2006.
- (2006) Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines
- Wang, X.¹ Braganza, S.² Leeser, M.³

27
- 34147177681
- Using FPGA Devices to Accelerate Biomolecular Simulations
- Mar
- S. Alam, P. Agarwal, M. Smith, J. Vetter, and D. Caliga, "Using FPGA Devices to Accelerate Biomolecular Simulations," Computer, vol. 40, no. 3, pp. 66-73, Mar. 2007.
- (2007) Computer , vol.40 , Issue.3 , pp. 66-73
- Alam, S.¹ Agarwal, P.² Smith, M.³ Vetter, J.⁴ Caliga, D.⁵

28
- 0033488513
- Optimizing FPGA-Based Vector Product Designs
- Apr
- D. Benyamin, W. Luk, and J. Villasenor, "Optimizing FPGA-Based Vector Product Designs," Proc. Seventh Ann. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 188-197, Apr. 1999.
- (1999) Proc. Seventh Ann. IEEE Symp. Field-Programmable Custom Computing Machines , pp. 188-197
- Benyamin, D.¹ Luk, W.² Villasenor, J.³

29
- 84962886861
- Area and Time Efficient Implementation of Matrix Multiplication on FPGAs
- Dec
- J.W. Jang, S. Choi, and V.K. Prasanna, "Area and Time Efficient Implementation of Matrix Multiplication on FPGAs," Proc. First IEEE Int'l Conf. Field Programmable Technology, Dec. 2002.
- (2002) Proc. First IEEE Int'l Conf. Field Programmable Technology
- Jang, J.W.¹ Choi, S.² Prasanna, V.K.³

30
- 67649222053
- Time and Energy Efficient Matrix Factorization Using FPGAs
- Sept
- S. Choi and V.K. Prasanna, "Time and Energy Efficient Matrix Factorization Using FPGAs," Proc. 13th Int'l Conf. Field Programmable Logic and Applications, Sept. 2003.
- (2003) Proc. 13th Int'l Conf. Field Programmable Logic and Applications
- Choi, S.¹ Prasanna, V.K.²

31
- 20344376214
- 64-Bit Floating-Point FPGA Matrix Multiplication
- Feb
- Y. Dou, S. Vassiliadis, G. Kuzmanov, and G. Gaydadjiev, "64-Bit Floating-Point FPGA Matrix Multiplication," Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2005.
- (2005) Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays
- Dou, Y.¹ Vassiliadis, S.² Kuzmanov, G.³ Gaydadjiev, G.⁴

32
- 20344389052
- Sparse Matrix-Vector Multiplication on FPGAs
- Feb
- L. Zhuo and V.K. Prasanna, "Sparse Matrix-Vector Multiplication on FPGAs," Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2005.
- (2005) Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays
- Zhuo, L.¹ Prasanna, V.K.²

33
- 47349126591
- Sparse Matrix-Vector Multiplication Design on FPGAs
- Apr
- J. Sun, G. Peterson, and O. Storaasli, "Sparse Matrix-Vector Multiplication Design on FPGAs," Proc. 15th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2007.
- (2007) Proc. 15th Ann. IEEE Symp. Field-Programmable Custom Computing Machines
- Sun, J.¹ Peterson, G.² Storaasli, O.³

34
- 20244390636
- Floating-Point Sparse Matrix-Vector Multiply for FPGAs
- Feb
- M. deLorimier and A. DeHon, "Floating-Point Sparse Matrix-Vector Multiply for FPGAs," Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2005.
- (2005) Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays
- deLorimier, M.¹ DeHon, A.²

35
- 34147110975
- Sparse Matrix-Vector Multiplication Kernel on a Reconfigurable Computer
- Sept
- S. Akella, M. Smith, R. Mills, S. Alam, R. Barrett, and J. Vetter, "Sparse Matrix-Vector Multiplication Kernel on a Reconfigurable Computer," Proc. Workshop High Performance Embedded Computing, Sept. 2005.
- (2005) Proc. Workshop High Performance Embedded Computing
- Akella, S.¹ Smith, M.² Mills, R.³ Alam, S.⁴ Barrett, R.⁵ Vetter, J.⁶

36
- 12444323064
- A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs
- June
- G. Govindu, S. Choi, V.K. Prasanna, V. Daga, S. Gangadharpalli, and V. Sridhar, "A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs," Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms, June 2004.
- (2004) Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms
- Govindu, G.¹ Choi, S.² Prasanna, V.K.³ Daga, V.⁴ Gangadharpalli, S.⁵ Sridhar, V.⁶

37
- 47049096571
- Efficient Floating-Point Based Block LU Decomposition on FPGAs
- V. Daga, G. Govindu, S. Gangadharpalli, V. Sridhar, and V.K. Prasanna, "Efficient Floating-Point Based Block LU Decomposition on FPGAs," Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms, June 2004.

38
- 33745127034
- Design Tradeoffs for BLAS Operations on Reconfigurable Hardware
- June
- L. Zhuo and V.K. Prasanna, "Design Tradeoffs for BLAS Operations on Reconfigurable Hardware," Proc. 34th Int'l Conf. Parallel Processing, June 2005.
- (2005) Proc. 34th Int'l Conf. Parallel Processing
- Zhuo, L.¹ Prasanna, V.K.²

39
- 33847218692
- High-Performance and Area-Efficient Reduction Circuits on FPGAs
- Oct
- L. Zhuo and V.K. Prasanna, "High-Performance and Area-Efficient Reduction Circuits on FPGAs," Proc. 17th Int'l Symp. Computer Architecture and High Performance Computing, Oct. 2005.
- (2005) Proc. 17th Int'l Symp. Computer Architecture and High Performance Computing
- Zhuo, L.¹ Prasanna, V.K.²

40
- 84971853043
- I/O Complexity: The Red Blue Pebble Game
- May
- J. Hong and H. Kung, "I/O Complexity: The Red Blue Pebble Game," Proc. 13th Ann. ACM Symp. Theory of Computing, pp. 326-333, May 1981.
- (1981) Proc. 13th Ann. ACM Symp. Theory of Computing , pp. 326-333
- Hong, J.¹ Kung, H.²

41
- 34047144377
- Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems
- Apr
- L. Zhuo and V. Prasanna, "Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems," IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 4, pp. 433-448, Apr. 2007.
- (2007) IEEE Trans. Parallel and Distributed Systems , vol.18 , Issue.4 , pp. 433-448
- Zhuo, L.¹ Prasanna, V.²

42
- 0004116989
- second ed. The MIT Press
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. The MIT Press, 2001.
- (2001) Introduction to Algorithms
- Cormen, T.H.¹ Leiserson, C.E.² Rivest, R.L.³ Stein, C.⁴

43
- 85064764845
- Out of Core, Out of Mind: Practical Parallel I/O
- D. Womble, D. Greenberg, R. Riesen, and S. Wheat, "Out of Core, Out of Mind: Practical Parallel I/O," Proc. Scalable Parallel Libraries Conf., pp. 10-16, citeseer.ist.psu.edu/womble93out.html, 1993.
- (1993) Proc. Scalable Parallel Libraries Conf , pp. 10-16
- Womble, D.¹ Greenberg, D.² Riesen, R.³ Wheat, S.⁴

44
- 47049101775
- Mentor Graphics Corp
- Mentor Graphics Corp., http://www.mentor.com/, 2008.
- (2008)

45
- 47049114782
- AMD Core Math Library, http://developer.amd.com/acml.aspx, 2008.
- (2008) AMD Core Math Library

46
- 47049122028
- Automatic Tuning of PDGEMM Towards Optimal Performance
- Aug
- S. Hunold and T. Rauber, "Automatic Tuning of PDGEMM Towards Optimal Performance," Proc. European Conf. Parallel Processing, Aug. 2005.
- (2005) Proc. European Conf. Parallel Processing
- Hunold, S.¹ Rauber, T.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.