Volume 93, Issue 2, 2005, Pages 409-424

Efficient utilization of SIMD extensions

Author keywords

Automatic vectorization; Digital signal processing (DSP); Fast Fourier transform (FFT); Short vector single instruction, multiple data (SIMD); Symbolic vectorization

Indexed keywords

COMPUTER OPERATING SYSTEMS; DATA STORAGE EQUIPMENT; DIGITAL SIGNAL PROCESSING; FAST FOURIER TRANSFORMS; PERFORMANCE; PROGRAM PROCESSORS; VECTORS;

EID: 19344363982     PISSN: 00189219     EISSN: None     Source Type: Journal    
DOI: 10.1109/JPROC.2004.840491     Document Type: Conference Paper
Times cited : (60)

References (54)
  • 1
    • 0035023971
    • EMMERALD: A fast matrix-matrix multiply using Intel's SSE instructions
    • D. Aberdeen and J. Baxter, "EMMERALD: a fast matrix-matrix multiply using Intel's SSE instructions," Concurrency Comput. Practice Exper., vol. 13, no. 2, pp. 103-119, 2001.
    • (2001) Concurrency Comput. Practice Exper. , vol.13 , Issue.2 , pp. 103-119
    • Aberdeen, D.1    Baxter, J.2
  • 4
    • 20744450544
    • Advanced Micro Devices Corp., Sunnyvale, CA
    • AMD Core Math Library (ACML) Manual, Advanced Micro Devices Corp., Sunnyvale, CA, 2000.
    • (2000) AMD Core Math Library (ACML) Manual
  • 5
    • 11844297937
    • American National Standard Institute (ANSI), New York
    • ANSI, "ISO/IEC 9899:1999(E), Programming Languages - C," American National Standard Institute (ANSI), New York, 1999.
    • (1999) ISO/IEC 9899:1999(E), Programming Languages - C
  • 6
    • 20744436942
    • vDSP Library. [Online]
    • Apple Computer Inc. (2001) vDSP Library. [Online]. Available: http://developer.apple.com/tml
    • (2001)
  • 7
    • 0028743437
    • Compiler transformations for high-performance computing
    • D. F. Bacon, S. L. Graham, and O. J. Sharp, "Compiler transformations for high-performance computing," ACM Comput. Surv., vol. 26, pp. 345-420, 1994.
    • (1994) ACM Comput. Surv. , vol.26 , pp. 345-420
    • Bacon, D.F.1    Graham, S.L.2    Sharp, O.J.3
  • 8
    • 0003003638
    • A study of replacement algorithms for virtual storage computers
    • July
    • L. A. Belady, "A study of replacement algorithms for virtual storage computers," IBM Syst. J., vol. 5, no. 2, pp. 78-101, July 1966.
    • (1966) IBM Syst. J. , vol.5 , Issue.2 , pp. 78-101
    • Belady, L.A.1
  • 9
    • 0030661485
    • Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology
    • J. Bilmes, K. Asanovic, C. W. Chin, and J. Demmel, "Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology," in Proc. Int. Conf. Supercomputing, 1997, pp. 340-347.
    • (1997) Proc. Int. Conf. Supercomputing , pp. 340-347
    • Bilmes, J.1    Asanovic, K.2    Chin, C.W.3    Demmel, J.4
  • 10
    • 20744443909
    • [Online]
    • Codeplay Corp. (2002) VECTOR C. [Online]. Available: http://www.codeplay.com
    • (2002) Vector C
  • 11
    • 21144437673
    • Wavelet transform for large scale image processing on modern microprocessors
    • J. M. L. M. Palma et al., Eds. Heidelberg, Germany: Springer-Verlag
    • D. Chaver, C. Tenllado, L. Piñuel, M. Prieto, and F. Tirado, "Wavelet transform for large scale image processing on modern microprocessors," in Lecture Notes in Computer Science, VECPAR 2002, J. M. L. M. Palma et al., Eds. Heidelberg, Germany: Springer-Verlag, 2003, vol. 2565, pp. 549-562.
    • (2003) Lecture Notes in Computer Science, VECPAR 2002 , vol.2565 , pp. 549-562
    • Chaver, D.1    Tenllado, C.2    Piñuel, L.3    Prieto, M.4    Tirado, F.5
  • 13
    • 20744452904
    • Automatic performance tuning for large scale scientific applications
    • Feb.
    • J. Demmel, J. Dongarra, V. Eijkhout, and K. Yelick, "Automatic performance tuning for large scale scientific applications," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.
    • (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 293-312
    • Demmel, J.1    Dongarra, J.2    Eijkhout, V.3    Yelick, K.4
  • 17
    • 19344368498
    • Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria
    • _, "Performance portable short vector transforms," Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria, 2003.
    • (2003) Performance Portable Short Vector Transforms
  • 24
    • 20744449792
    • The design and implementation of FFTW3
    • Feb.
    • _, "The design and implementation of FFTW3," Proc. IEEE, vol. 93, no. 2, pp. 216-231, Feb. 2005.
    • (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 216-231
  • 27
    • 20744443121
    • [Online]
    • _, (2002) Intel C/C++ compiler user's guide. [Online]. Available: http://www.ncsa.uiuc.edu/UserInfo/Resources/Software/Intel/Compilers/8.0/c_ug/index.htm
    • (2002) Intel C/C++ Compiler User's Guide
  • 28
    • 20744456099
    • Math Kernel Library. [Online]
    • _, (2002) Math kernel library. [Online]. Available: http://www.intel.com/software/products/mkl
    • (2002)
  • 29
    • 0025600627
    • A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
    • J. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," Circuits Syst. Signal Process., vol. 9, no. 4, pp. 449-500, 1990.
    • (1990) Circuits Syst. Signal Process. , vol.9 , Issue.4 , pp. 449-500
    • Johnson, J.1    Johnson, R.W.2    Rodriguez, D.3    Tolimieri, R.4
  • 30
    • 20744433925
    • Available instruction-level parallelism for superscalar and superpipelined machines
    • Digital Western Res. Lab., Palo Alto, CA
    • N. P. Jouppi and D. W. Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," Digital Western Res. Lab., Palo Alto, CA, WRL Res. Rep. 7, 1989.
    • (1989) WRL Res. Rep. , vol.7
    • Jouppi, N.P.1    Wall, D.W.2
  • 31
    • 20744451671
    • [Online]
    • S. Kral. (2003) The FFTW-GEL Web Page. [Online]. Available: http://www.complang.tuwien.ac.at/skral/fftwgel
    • (2003) The FFTW-GEL Web Page
    • Kral, S.1
  • 34
    • 20744436460
    • [Online]
    • S. Lamson. (1995) SCIPORT. [Online]. Available: http://www.netlib.org/scilib/
    • (1995) SCIPORT
    • Lamson, S.1
  • 35
    • 17144371526
    • Exploiting superword level parallelism with multimedia instruction sets
    • S. Larsen and S. Amarasinghe, "Exploiting superword level parallelism with multimedia instruction sets," ACM SIGPLAN Notices, vol. 35, no. 5, pp. 145-156, 2000.
    • (2000) ACM SIGPLAN Notices , vol.35 , Issue.5 , pp. 145-156
    • Larsen, S.1    Amarasinghe, S.2
  • 36
    • 23044527555
    • Graph-based code selection techniques for embedded processors
    • R. Leupers and S. Bashford, "Graph-based code selection techniques for embedded processors," ACM Trans. Design Autom. Electron. Syst., vol. 5, no. 4, pp. 794-814, 2000.
    • (2000) ACM Trans. Design Autom. Electron. Syst. , vol.5 , Issue.4 , pp. 794-814
    • Leupers, R.1    Bashford, S.2
  • 37
    • 19344373862
    • Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria
    • J. Lorenz, "Automatic SIMD vectorization," Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria, 2004.
    • (2004) Automatic SIMD Vectorization
    • Lorenz, J.1
  • 42
    • 20744456847
    • [Online]
    • I. Nicholson. (2002) libSIMD. [Online]. Available: http://libsimd.sourceforge.net
    • (2002) LibSIMD
    • Nicholson, I.1
  • 46
    • 0034249157
    • A vectorizing compiler for multimedia extensions
    • N. Sreraman and R. Govindarajan, "A vectorizing compiler for multimedia extensions," Int. J. Parallel Program., vol. 28, no. 4, pp. 363-400, 2000.
    • (2000) Int. J. Parallel Program , vol.28 , Issue.4 , pp. 363-400
    • Sreraman, N.1    Govindarajan, R.2
  • 48
    • 0021470572
    • FFT algorithms for vector computers
    • P. N. Swarztrauber, "FFT algorithms for vector computers," Parallel Comput., vol. 1, pp. 45-63, 1984.
    • (1984) Parallel Comput. , vol.1 , pp. 45-63
    • Swarztrauber, P.N.1
  • 51
    • 0343462141
    • Automated empirical optimizations of software and the ATLAS project
    • R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated empirical optimizations of software and the ATLAS project," Parallel Comput., vol. 27, pp. 3-35, 2001.
    • (2001) Parallel Comput. , vol.27 , pp. 3-35
    • Whaley, R.C.1    Petitet, A.2    Dongarra, J.J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.