SCOPUS Information Search Platform

Volume 93, Issue 2, 2005, Pages 409-424

Efficient utilization of SIMD extensions

Authors (4): Franchetti, Franz (a); Kral, Stefan (a); Lorenz, Juergen (a); Ueberhuber, Christoph W. (a)

(a) VIENNA UNIVERSITY OF TECHNOLOGY (Austria)

Author keywords

Automatic vectorization; Digital signal processing (DSP); Fast Fourier transform (FFT); Short vector single instruction, multiple data (SIMD); Symbolic vectorization

Indexed keywords

COMPUTER OPERATING SYSTEMS; DATA STORAGE EQUIPMENT; DIGITAL SIGNAL PROCESSING; FAST FOURIER TRANSFORMS; PERFORMANCE; PROGRAM PROCESSORS; VECTORS;

AUTOMATIC VECTORIZATION; MULTIPLE DATA; SHORT VECTOR SINGLE INSTRUCTION; SYMBOLIC VECTORIZATION;

PROGRAM COMPILERS;

EID: 19344363982 PISSN: 00189219 EISSN: None Source Type: Journal
DOI: 10.1109/JPROC.2004.840491 Document Type: Conference Paper

Times cited: 60

References (54)

1
- 0035023971
- EMMERALD: A fast matrix-matrix multiply using Intel's SSE instructions
- D. Aberdeen and J. Baxter, "EMMERALD: a fast matrix-matrix multiply using Intel's SSE instructions," Concurrency Comput. Practice Exper., vol. 13, no. 2, pp. 103-119, 2001.
- (2001) Concurrency Comput. Practice Exper. , vol.13 , Issue.2 , pp. 103-119
- Aberdeen, D.¹ Baxter, J.²

2
- 0004072686
- Reading, MA: Addison-Wesley
- A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.
- (1986) Compilers: Principles, Techniques, and Tools.
- Aho, A.V.¹ Sethi, R.² Ullman, J.D.³

3
- 84859277397
- An overview of the BlueGene/L system software organization
- G. Almasi et al., "An overview of the BlueGene/L system software organization," in Proc. Euro-Par'03 Conf. Parallel and Distributed Computing, pp. 147-159.
- Proc. Euro-Par'03 Conf. Parallel and Distributed Computing , pp. 147-159
- Almasi, G.¹

4
- 20744450544
- Advanced Micro Devices Corp., Sunnyvale, CA
- AMD Core Math Library (ACML) Manual, Advanced Micro Devices Corp., Sunnyvale, CA, 2000.
- (2000) AMD Core Math Library (ACML) Manual

5
- 11844297937
- American National Standards Institute (ANSI), New York
- ANSI, "ISO/IEC 9899:1999(E), Programming Languages - C," American National Standards Institute (ANSI), New York, 1999.
- (1999) ISO/IEC 9899:1999(E), Programming Languages - C

6
- 20744436942
- vDSP Library. [Online]
- Apple Computer Inc. (2001) vDSP Library. [Online]. Available: http://developer.apple.com/tml
- (2001)

7
- 0028743437
- Compiler transformations for high-performance computing
- D. F. Bacon, S. L. Graham, and O. J. Sharp, "Compiler transformations for high-performance computing," ACM Comput. Surv., vol. 26, pp. 345-420, 1994.
- (1994) ACM Comput. Surv. , vol.26 , pp. 345-420
- Bacon, D.F.¹ Graham, S.L.² Sharp, O.J.³

8
- 0003003638
- A study of replacement algorithms for virtual storage computers
- July
- L. A. Belady, "A study of replacement algorithms for virtual storage computers," IBM Syst. J., vol. 5, no. 2, pp. 78-101, July 1966.
- (1966) IBM Syst. J. , vol.5 , Issue.2 , pp. 78-101
- Belady, L.A.¹

9
- 0030661485
- Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology
- J. Bilmes, K. Asanovic, C. W. Chin, and J. Demmel, "Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology," in Proc. Int. Conf. Supercomputing, 1997, pp. 340-347.
- (1997) Proc. Int. Conf. Supercomputing , pp. 340-347
- Bilmes, J.¹ Asanovic, K.² Chin, C.W.³ Demmel, J.⁴

10
- 20744443909
- [Online]
- Codeplay Corp. (2002) VECTOR C. [Online]. Available: http://www.codeplay.com
- (2002) Vector C

11
- 21144437673
- Wavelet transform for large scale image processing on modern microprocessors
- J. M. L. M. Palma et al., Eds. Heidelberg, Germany: Springer-Verlag
- D. Chaver, C. Tenllado, L. Pinjuel, M. Prieto, and F. Tirado et al., "Wavelet transform for large scale image processing on modern microprocessors," in Lecture Notes in Computer Science, VECPAR 2002, J. M. L. M. Palma et al., Eds. Heidelberg, Germany: Springer-Verlag, 2003, vol. 2565, pp. 549-562.
- (2003) Lecture Notes in Computer Science, VECPAR 2002 , vol.2565 , pp. 549-562
- Chaver, D.¹ Tenllado, C.² Pinjuel, L.³ Prieto, M.⁴ Tirado, F.⁵

12
- 0009686023
- Advanced Comput. Group, Apple Computer Inc.
- R. Crandall and J. Klivington, "Supercomputer-Style FFT Library for the Apple G4," Advanced Comput. Group, Apple Computer Inc., 2002.
- (2002) Supercomputer-style FFT Library for the Apple G4
- Crandall, R.¹ Klivington, J.²

13
- 20744452904
- Automatic performance tuning for large scale scientific applications
- Feb.
- J. Demmel, J. Dongarra, V. Eijkhout, and K. Yelick, "Automatic performance tuning for large scale scientific applications," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.
- (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 293-312
- Demmel, J.¹ Dongarra, J.² Eijkhout, V.³ Yelick, K.⁴

14
- 84948965859
- The SCC compiler: SWARing at MMX and 3DNow
- L. Carter and J. Ferrante, Eds.
- R. J. Fisher and H. G. Dietz, "The SCC compiler: SWARing at MMX and 3DNow," in Proc. 12th Annu. Workshop Languages and Compilers for Parallel Computing, L. Carter and J. Ferrante, Eds., 2000, pp. 399-414.
- (2000) Proc. 12th Annu. Workshop Languages and Compilers for Parallel Computing , pp. 399-414
- Fisher, R.J.¹ Dietz, H.G.²

15
- 19344377871
- Compiling for SIMD within a register
- S. Chatterjee, Ed.
- R. J. Fisher and H. G. Dietz, "Compiling for SIMD within a register," in Proc. 11th Annu. Workshop Languages and Compilers for Parallel Computing, S. Chatterjee, Ed., 1999, pp. 290-304.
- (1999) Proc. 11th Annu. Workshop Languages and Compilers for Parallel Computing , pp. 290-304

16
- 19344375264
- A portable short vector version of FFTW
- F. Franchetti, "A portable short vector version of FFTW," in Proc. 4th IMACS Symp. Mathematical Modeling (MATHMOD 2003), vol. 2, pp. 1539-1548.
- Proc. 4th IMACS Symp. Mathematical Modeling (MATHMOD 2003) , vol.2 , pp. 1539-1548
- Franchetti, F.¹

17
- 19344368498
- Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria
- F. Franchetti, "Performance portable short vector transforms," Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria, 2003.
- (2003) Performance Portable Short Vector Transforms

18
- 0034848812
- Architecture independent short vector FFTs
- F. Franchetti, H. Karner, S. Kral, and C. W. Ueberhuber, "Architecture independent short vector FFTs," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, 2001, pp. 1109-1112.
- (2001) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , vol.2 , pp. 1109-1112
- Franchetti, F.¹ Karner, H.² Kral, S.³ Ueberhuber, C.W.⁴

19
- 84966570185
- A SIMD vectorizing compiler for digital signal processing algorithms
- F. Franchetti and M. Püschel, "A SIMD vectorizing compiler for digital signal processing algorithms," in Proc. Int. Parallel and Distributed Processing Symp., 2002, pp. 20-26.
- (2002) Proc. Int. Parallel and Distributed Processing Symp. , pp. 20-26
- Franchetti, F.¹ Püschel, M.²

20
- 0141676720
- Short vector code generation and adaptation for DSP algorithms
- F. Franchetti and M. Püschel, "Short vector code generation and adaptation for DSP algorithms," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'03), vol. 2, pp. 537-540.
- Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'03) , vol.2 , pp. 537-540

21
- 84947238038
- Short vector code generation for the discrete Fourier transform
- F. Franchetti and M. Püschel, "Short vector code generation for the discrete Fourier transform," in Proc. 17th Int. Parallel and Distributed Processing Symp. (IPDPS'03), pp. 22-26.
- Proc. 17th Int. Parallel and Distributed Processing Symp. (IPDPS'03) , pp. 22-26

22
- 0032681068
- A fast Fourier transform compiler
- M. Frigo, "A fast Fourier transform compiler," in Proc. ACM SIGPLAN '99 Conf. Programming Language Design and Implementation, pp. 169-180.
- Proc. ACM SIGPLAN '99 Conf. Programming Language Design and Implementation , pp. 169-180
- Frigo, M.¹

23
- 0031636309
- FFTW: An adaptive software architecture for the FFT
- M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '98), vol. 3, pp. 1381-1384.
- Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '98) , vol.3 , pp. 1381-1384
- Frigo, M.¹ Johnson, S.G.²

24
- 20744449792
- The design and implementation of FFTW3
- Feb.
- M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proc. IEEE, vol. 93, no. 2, pp. 216-231, Feb. 2005.
- (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 216-231

25
- 20744448168
- The power of Belady's algorithm in register allocation for long basic blocks
- Heidelberg, Germany: Springer-Verlag
- J. Guo, M. Garzarán, and D. Padua, "The power of Belady's algorithm in register allocation for long basic blocks," in Lecture Notes in Computer Science, Languages and Compilers for Parallel Computing. Heidelberg, Germany: Springer-Verlag, 2004, vol. 2958, pp. 374-390.
- (2004) Lecture Notes in Computer Science, Languages and Compilers for Parallel Computing , vol.2958 , pp. 374-390
- Guo, J.¹ Garzarán, M.² Padua, D.³

26
- 0009655771
- [Online]
- Intel Corp. (1999) Split radix fast Fourier transform using streaming SIMD extensions (AP-808). [Online]. Available: http://developer.intel.com/software/products/itc/strmsimd/808down.htm
- (1999) Split Radix Fast Fourier Transform Using Streaming SIMD Extensions (AP-808).

27
- 20744443121
- [Online]
- Intel Corp. (2002) Intel C/C++ compiler user's guide. [Online]. Available: http://www.ncsa.uiuc.edu/UserInfo/Resources/Software/Intel/Compilers/8.0/c_ug/index.htm
- (2002) Intel C/C++ Compiler User's Guide

28
- 20744456099
- Math Kernel Library. [Online]
- Intel Corp. (2002) Math kernel library. [Online]. Available: http://www.intel.com/software/products/mkl
- (2002)

29
- 0025600627
- A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
- J. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," Circuits Syst. Signal Process., vol. 9, no. 4, pp. 449-500, 1990.
- (1990) Circuits Syst. Signal Process. , vol.9 , Issue.4 , pp. 449-500
- Johnson, J.¹ Johnson, R.W.² Rodriguez, D.³ Tolimieri, R.⁴

30
- 20744433925
- Available instruction-level parallelism for superscalar and superpipelined machines
- Digital Western Res. Lab., Palo Alto, CA
- N. P. Jouppi and D. W. Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," Digital Western Res. Lab., Palo Alto, CA, WRL Res. Rep. 7, 1989.
- (1989) WRL Res. Rep. , vol.7
- Jouppi, N.P.¹ Wall, D.W.²

31
- 20744451671
- [Online]
- S. Kral. (2003) The FFTW-GEL Web Page. [Online]. Available: http://www.complang.tuwien.ac.at/skral/fftwgel
- (2003) The FFTW-GEL Web Page
- Kral, S.¹

32
- 35048892074
- SIMD vectorization of straight line FFT code
- S. Kral, F. Franchetti, J. Lorenz, and C. Ueberhuber, "SIMD vectorization of straight line FFT code," in Proc. Euro-Par'03 Conf. Parallel and Distributed Computing, pp. 251-260.
- Proc. Euro-Par'03 Conf. Parallel and Distributed Computing , pp. 251-260
- Kral, S.¹ Franchetti, F.² Lorenz, J.³ Ueberhuber, C.⁴

33
- 19344374456
- FFT compiler techniques
- S. Kral, F. Franchetti, J. Lorenz, and C. Ueberhuber, "FFT compiler techniques," in Proc. 13th Int. Conf. Compiler Construction, 2004, pp. 217-231.
- (2004) Proc. 13th Int. Conf. Compiler Construction , pp. 217-231

34
- 20744436460
- [Online]
- S. Lamson. (1995) SCIPORT. [Online]. Available: http://www.netlib.org/scilib/
- (1995) SCIPORT
- Lamson, S.¹

35
- 17144371526
- Exploiting superword level parallelism with multimedia instruction sets
- S. Larsen and S. Amarasinghe, "Exploiting superword level parallelism with multimedia instruction sets," ACM SIGPLAN Notices, vol. 35, no. 5, pp. 145-156, 2000.
- (2000) ACM SIGPLAN Notices , vol.35 , Issue.5 , pp. 145-156
- Larsen, S.¹ Amarasinghe, S.²

36
- 23044527555
- Graph-based code selection techniques for embedded processors
- R. Leupers and S. Bashford, "Graph-based code selection techniques for embedded processors," ACM Trans. Design Autom. Electron. Syst., vol. 5, no. 4, pp. 794-814, 2000.
- (2000) ACM Trans. Design Autom. Electron. Syst. , vol.5 , Issue.4 , pp. 794-814
- Leupers, R.¹ Bashford, S.²

37
- 19344373862
- Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria
- J. Lorenz, "Automatic SIMD vectorization," Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria, 2004.
- (2004) Automatic SIMD Vectorization
- Lorenz, J.¹

38
- 0036974959
- Energy aware compilation for DSP's with SIMD instructions
- M. Lorenz, L. Wehmeyer, and T. Draeger, "Energy aware compilation for DSP's with SIMD instructions," in Proc. 2002 Joint Conf. Languages, Compilers, and Tools for Embedded Systems and Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02), pp. 94-101.
- Proc. 2002 Joint Conf. Languages, Compilers, and Tools for Embedded Systems and Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02) , pp. 94-101
- Lorenz, M.¹ Wehmeyer, L.² Draeger, T.³

39
- 19344368072
- SPIRAL: Code generation for DSP transforms
- Feb.
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: code generation for DSP transforms," Proc. IEEE, vol. 93, no. 2, pp. 232-275, Feb. 2005.
- (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 232-275
- Püschel, M.¹ Moura, J.M.F.² Johnson, J.³ Padua, D.⁴ Veloso, M.⁵ Singer, B.⁶ Xiong, J.⁷ Franchetti, F.⁸ Gacic, A.⁹ Voronenko, Y.¹⁰ Chen, K.¹¹ Johnson, R.W.¹² Rizzolo, N.¹³

40
- 0003502903
- San Francisco, CA: Morgan Kaufmann
- S. S. Muchnick, Advanced Compiler Design and Implementation. San Francisco, CA: Morgan Kaufmann, 1997.
- (1997) Advanced Compiler Design and Implementation
- Muchnick, S.S.¹

41
- 0037911190
- Radix-4 FFT implementation using SIMD multi-media instructions
- K. Nadehara, T. Miyazaki, and I. Kuroda, "Radix-4 FFT implementation using SIMD multi-media instructions," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '99), pp. 2131-2135.
- Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '99) , pp. 2131-2135
- Nadehara, K.¹ Miyazaki, T.² Kuroda, I.³

42
- 20744456847
- [Online]
- I. Nicholson. (2002) libSIMD. [Online]. Available: http://libsimd.sourceforge.net
- (2002) LibSIMD
- Nicholson, I.¹

43
- 1542396679
- SPIRAL: A generator for platform-adapted libraries of signal processing algorithms
- M. Püschel, B. Singer, J. Xiong, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, and R. W. Johnson, "SPIRAL: A generator for platform-adapted libraries of signal processing algorithms," J. High Perform. Comput. Appl. (Special Issue on Automatic Performance Tuning), vol. 18, pp. 21-45, 2004.
- (2004) J. High Perform. Comput. Appl. (Special Issue on Automatic Performance Tuning) , vol.18 , pp. 21-45
- Püschel, M.¹ Singer, B.² Xiong, J.³ Moura, J.M.F.⁴ Johnson, J.⁵ Padua, D.⁶ Veloso, M.⁷ Johnson, R.W.⁸

44
- 35048817424
- A preliminary study on the vectorization of multimedia applications for multimedia extensions
- G. Ren, P. Wu, and D. A. Padua, "A preliminary study on the vectorization of multimedia applications for multimedia extensions," in Proc. Int. Workshop Languages and Compilers for Parallel Computing, 2003, pp. 420-435.
- (2003) Proc. Int. Workshop Languages and Compilers for Parallel Computing , pp. 420-435
- Ren, G.¹ Wu, P.² Padua, D.A.³

45
- 0036299697
- A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures
- P. Rodriguez, "A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '02), vol. 3, 2002, pp. III-3220-III-3223.
- (2002) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '02) , vol.3
- Rodriguez, P.¹

46
- 0034249157
- A vectorizing compiler for multimedia extensions
- N. Sreraman and R. Govindarajan, "A vectorizing compiler for multimedia extensions," Int. J. Parallel Program., vol. 28, no. 4, pp. 363-400, 2000.
- (2000) Int. J. Parallel Program , vol.28 , Issue.4 , pp. 363-400
- Sreraman, N.¹ Govindarajan, R.²

47
- 10044288238
- Boca Raton, FL: CRC
- Y. Srikant and P. Shankar, The Compiler Design Handbook. Boca Raton, FL: CRC, 2003.
- (2003) The Compiler Design Handbook
- Srikant, Y.¹ Shankar, P.²

48
- 0021470572
- FFT algorithms for vector computers
- P. N. Swarztrauber, "FFT algorithms for vector computers," Parallel Comput., vol. 1, pp. 45-63, 1984.
- (1984) Parallel Comput. , vol.1 , pp. 45-63
- Swarztrauber, P.N.¹

49
- 0003417587
- ser. Frontiers in Applied Mathematics. Philadelphia, PA: Soc. Ind. Appl. Math.
- C. F. Van Loan, Computational Frameworks for the Fast Fourier Transform, ser. Frontiers in Applied Mathematics. Philadelphia, PA: Soc. Ind. Appl. Math., 1992, vol. 10.
- (1992) Computational Frameworks for the Fast Fourier Transform , vol.10
- Van Loan, C.F.¹

50
- 0003418094
- Automatically tuned linear algebra software
- San Antonio, TX
- R. C. Whaley and J. J. Dongarra, "Automatically Tuned Linear Algebra Software," presented at the 9th SIAM Conf. Parallel Processing for Scientific Computing, San Antonio, TX, 1999.
- (1999) 9th SIAM Conf. Parallel Processing for Scientific Computing
- Whaley, R.C.¹ Dongarra, J.J.²

51
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated empirical optimizations of software and the ATLAS project," Parallel Comput., vol. 27, pp. 3-35, 2001.
- (2001) Parallel Comput. , vol.27 , pp. 3-35
- Whaley, R.C.¹ Petitet, A.² Dongarra, J.J.³

52
- 13244261416
- [Online]
- R. C. Whaley. User contribution to ATLAS. [Online]. Available: http://www.cs.utk.edu/~rwhaley/papers/atlascontrib.ps
- User Contribution to ATLAS
- Whaley, R.C.¹

53
- 0034826555
- SPL: A language and compiler for DSP algorithms
- J. Xiong, J. Johnson, R. Johnson, and D. Padua, "SPL: A language and compiler for DSP algorithms," in Proc. Conf. Programming Languages Design and Implementation (PLDI), 2001, pp. 298-308.
- (2001) Proc. Conf. Programming Languages Design and Implementation (PLDI) , pp. 298-308
- Xiong, J.¹ Johnson, J.² Johnson, R.³ Padua, D.⁴

54
- 0003488086
- New York: ACM
- H. Zima and B. Chapman, Supercompilers for Parallel and Vector Computers. New York: ACM, 1991.
- (1991) Supercompilers for Parallel and Vector Computers
- Zima, H.¹ Chapman, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.