-
1
-
-
0035023971
-
EMMERALD: A fast matrix-matrix multiply using Intel's SSE instructions
-
D. Aberdeen and J. Baxter, "EMMERALD: a fast matrix-matrix multiply using Intel's SSE instructions," Concurrency Comput. Practice Exper., vol. 13, no. 2, pp. 103-119, 2001.
-
(2001)
Concurrency Comput. Practice Exper.
, vol.13
, Issue.2
, pp. 103-119
-
-
Aberdeen, D.1
Baxter, J.2
-
2
-
-
0004072686
-
-
Reading, MA: Addison-Wesley
-
A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.
-
(1986)
Compilers: Principles, Techniques, and Tools.
-
-
Aho, A.V.1
Sethi, R.2
Ullman, J.D.3
-
4
-
-
20744450544
-
-
Advanced Micro Devices Corp., Sunnyvale, CA
-
AMD Core Math Library (ACML) Manual, Advanced Micro Devices Corp., Sunnyvale, CA, 2000.
-
(2000)
AMD Core Math Library (ACML) Manual
-
-
-
5
-
-
11844297937
-
-
American National Standard Institute (ANSI), New York
-
ANSI, "ISO/IEC 9899:1999(E), Programming Languages - C," American National Standard Institute (ANSI), New York, 1999.
-
(1999)
ISO/IEC 9899:1999(E), Programming Languages - C
-
-
-
6
-
-
20744436942
-
-
vDSP Library. [Online]
-
Apple Computer Inc. (2001) vDSP Library. [Online]. Available: http://developer.apple.com/tml
-
(2001)
-
-
-
7
-
-
0028743437
-
Compiler transformations for high-performance computing
-
D. F. Bacon, S. L. Graham, and O. J. Sharp, "Compiler transformations for high-performance computing," ACM Comput. Surv., vol. 26, pp. 345-420, 1994.
-
(1994)
ACM Comput. Surv.
, vol.26
, pp. 345-420
-
-
Bacon, D.F.1
Graham, S.L.2
Sharp, O.J.3
-
8
-
-
0003003638
-
A study of replacement algorithms for virtual storage computers
-
July
-
L. A. Belady, "A study of replacement algorithms for virtual storage computers," IBM Syst. J., vol. 5, no. 2, pp. 78-101, July 1966.
-
(1966)
IBM Syst. J.
, vol.5
, Issue.2
, pp. 78-101
-
-
Belady, L.A.1
-
9
-
-
0030661485
-
Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology
-
J. Bilmes, K. Asanovic, C. W. Chin, and J. Demmel, "Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology," in Proc. Int. Conf. Supercomputing, 1997, pp. 340-347.
-
(1997)
Proc. Int. Conf. Supercomputing
, pp. 340-347
-
-
Bilmes, J.1
Asanovic, K.2
Chin, C.W.3
Demmel, J.4
-
10
-
-
20744443909
-
-
[Online]
-
Codeplay Corp. (2002) VECTOR C. [Online]. Available: http://www.codeplay.com
-
(2002)
Vector C
-
-
-
11
-
-
21144437673
-
Wavelet transform for large scale image processing on modern microprocessors
-
J. M. L. M. Palma et al., Eds. Heidelberg, Germany: Springer-Verlag
-
D. Chaver, C. Tenllado, L. Pinjuel, M. Prieto, and F. Tirado et al., "Wavelet transform for large scale image processing on modern microprocessors," in Lecture Notes in Computer Science, VECPAR 2002, J. M. L. M. Palma et al., Eds. Heidelberg, Germany: Springer-Verlag, 2003, vol. 2565, pp. 549-562.
-
(2003)
Lecture Notes in Computer Science, VECPAR 2002
, vol.2565
, pp. 549-562
-
-
Chaver, D.1
Tenllado, C.2
Pinjuel, L.3
Prieto, M.4
Tirado, F.5
-
13
-
-
20744452904
-
Automatic performance tuning for large scale scientific applications
-
Feb.
-
J. Demmel, J. Dongarra, V. Eijkhout, and K. Yelick, "Automatic performance tuning for large scale scientific applications," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 293-312
-
-
Demmel, J.1
Dongarra, J.2
Eijkhout, V.3
Yelick, K.4
-
14
-
-
84948965859
-
The SCC compiler: SWARing at MMX and 3DNow
-
L. Carter and J. Ferrante, Eds.
-
R. J. Fisher and H. G. Dietz, "The SCC compiler: SWARing at MMX and 3DNow," in Proc. 12th Annu. Workshop Languages and Compilers for Parallel Computing, L. Carter and J. Ferrante, Eds., 2000, pp. 399-414.
-
(2000)
Proc. 12th Annu. Workshop Languages and Compilers for Parallel Computing
, pp. 399-414
-
-
Fisher, R.J.1
Dietz, H.G.2
-
17
-
-
19344368498
-
-
Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria
-
_, "Performance portable short vector transforms," Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria, 2003.
-
(2003)
Performance Portable Short Vector Transforms
-
-
-
18
-
-
0034848812
-
Architecture independent short vector FFTs
-
F. Franchetti, H. Karner, S. Kral, and C. W. Ueberhuber, "Architecture independent short vector FFTs," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, 2001, pp. 1109-1112.
-
(2001)
Proc. Int. Conf. Acoustics, Speech, and Signal Processing
, vol.2
, pp. 1109-1112
-
-
Franchetti, F.1
Karner, H.2
Kral, S.3
Ueberhuber, C.W.4
-
20
-
-
0141676720
-
Short vector code generation and adaptation for DSP algorithms
-
_, "Short vector code generation and adaptation for DSP algorithms," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'03), vol. 2, pp. 537-540.
-
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'03)
, vol.2
, pp. 537-540
-
-
-
23
-
-
0031636309
-
FFTW: An adaptive software architecture for the FFT
-
M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '98), vol. 3, pp. 1381-1384.
-
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '98)
, vol.3
, pp. 1381-1384
-
-
Frigo, M.1
Johnson, S.G.2
-
24
-
-
20744449792
-
The design and implementation of FFTW3
-
Feb.
-
_, "The design and implementation of FFTW3," Proc. IEEE, vol. 93, no. 2, pp. 216-231, Feb. 2005.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 216-231
-
-
-
25
-
-
20744448168
-
The power of Belady's algorithm in register allocation for long basic blocks
-
Heidelberg, Germany: Springer-Verlag
-
J. Guo, M. Garzarán, and D. Padua, "The power of Belady's algorithm in register allocation for long basic blocks," in Lecture Notes in Computer Science, Languages and Compilers for Parallel Computing. Heidelberg, Germany: Springer-Verlag, 2004, vol. 2958, pp. 374-390.
-
(2004)
Lecture Notes in Computer Science, Languages and Compilers for Parallel Computing
, vol.2958
, pp. 374-390
-
-
Guo, J.1
Garzarán, M.2
Padua, D.3
-
27
-
-
20744443121
-
-
[Online]
-
_, (2002) Intel C/C++ compiler user's guide. [Online]. Available: http://www.ncsa.uiuc.edu/UserInfo/Resources/Software/Intel/Compilers/8.0/c_ug/index.htm
-
(2002)
Intel C/C++ Compiler User's Guide
-
-
-
28
-
-
20744456099
-
-
Math Kernel Library. [Online]
-
_, (2002) Math kernel library. [Online]. Available: http://www.intel.com/software/products/mkl
-
(2002)
-
-
-
29
-
-
0025600627
-
A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
-
J. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," Circuits Syst. Signal Process., vol. 9, no. 4, pp. 449-500, 1990.
-
(1990)
Circuits Syst. Signal Process.
, vol.9
, Issue.4
, pp. 449-500
-
-
Johnson, J.1
Johnson, R.W.2
Rodriguez, D.3
Tolimieri, R.4
-
30
-
-
20744433925
-
Available instruction-level parallelism for superscalar and superpipelined machines
-
Digital Western Res. Lab., Palo Alto, CA
-
N. P. Jouppi and D. W. Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," Digital Western Res. Lab., Palo Alto, CA, WRL Res. Rep. 7, 1989.
-
(1989)
WRL Res. Rep.
, vol.7
-
-
Jouppi, N.P.1
Wall, D.W.2
-
31
-
-
20744451671
-
-
[Online]
-
S. Kral. (2003) The FFTW-GEL Web Page. [Online]. Available: http://www.complang.tuwien.ac.at/skral/fftwgel
-
(2003)
The FFTW-GEL Web Page
-
-
Kral, S.1
-
32
-
-
35048892074
-
SIMD vectorization of straight line FFT code
-
S. Kral, F. Franchetti, J. Lorenz, and C. Ueberhuber, "SIMD vectorization of straight line FFT code," in Proc. Euro-Par'03 Conf. Parallel and Distributed Computing, pp. 251-260.
-
Proc. Euro-Par'03 Conf. Parallel and Distributed Computing
, pp. 251-260
-
-
Kral, S.1
Franchetti, F.2
Lorenz, J.3
Ueberhuber, C.4
-
34
-
-
20744436460
-
-
[Online]
-
S. Lamson. (1995) SCIPORT. [Online]. Available: http://www.netlib.org/scilib/
-
(1995)
SCIPORT
-
-
Lamson, S.1
-
35
-
-
17144371526
-
Exploiting superword level parallelism with multimedia instruction sets
-
S. Larsen and S. Amarasinghe, "Exploiting superword level parallelism with multimedia instruction sets," ACM SIGPLAN Notices, vol. 35, no. 5, pp. 145-156, 2000.
-
(2000)
ACM SIGPLAN Notices
, vol.35
, Issue.5
, pp. 145-156
-
-
Larsen, S.1
Amarasinghe, S.2
-
36
-
-
23044527555
-
Graph-based code selection techniques for embedded processors
-
R. Leupers and S. Bashford, "Graph-based code selection techniques for embedded processors," ACM Trans. Design Autom. Electron. Syst., vol. 5, no. 4, pp. 794-814, 2000.
-
(2000)
ACM Trans. Design Autom. Electron. Syst.
, vol.5
, Issue.4
, pp. 794-814
-
-
Leupers, R.1
Bashford, S.2
-
37
-
-
19344373862
-
-
Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria
-
J. Lorenz, "Automatic SIMD vectorization," Ph.D. dissertation, Inst. Appl. Math. Numer. Anal., Vienna Univ. Technol., Vienna, Austria, 2004.
-
(2004)
Automatic SIMD Vectorization
-
-
Lorenz, J.1
-
38
-
-
0036974959
-
Energy aware compilation for DSP's with SIMD instructions
-
M. Lorenz, L. Wehmeyer, and T. Draeger, "Energy aware compilation for DSP's with SIMD instructions," in Proc. 2002 Joint Conf. Languages, Compilers, and Tools for Embedded Systems and Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02), pp. 94-101.
-
Proc. 2002 Joint Conf. Languages, Compilers, and Tools for Embedded Systems and Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02)
, pp. 94-101
-
-
Lorenz, M.1
Wehmeyer, L.2
Draeger, T.3
-
39
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
Feb.
-
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: code generation for DSP transforms," Proc. IEEE, vol. 93, no. 2, pp. 232-275, Feb. 2005.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.1
Moura, J.M.F.2
Johnson, J.3
Padua, D.4
Veloso, M.5
Singer, B.6
Xiong, J.7
Franchetti, F.8
Gacic, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
41
-
-
0037911190
-
Radix-4 FFT implementation using SIMD multi-media instructions
-
K. Nadehara, T. Miyazaki, and I. Kuroda, "Radix-4 FFT implementation using SIMD multi-media instructions," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '99), pp. 2131-2135.
-
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '99)
, pp. 2131-2135
-
-
Nadehara, K.1
Miyazaki, T.2
Kuroda, I.3
-
42
-
-
20744456847
-
-
[Online]
-
I. Nicholson. (2002) libSIMD. [Online]. Available: http://libsimd.sourceforge.net
-
(2002)
LibSIMD
-
-
Nicholson, I.1
-
43
-
-
1542396679
-
SPIRAL: A generator for platform-adapted libraries of signal processing algorithms
-
M. Püschel, B. Singer, J. Xiong, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, and R. W. Johnson, "SPIRAL: A generator for platform-adapted libraries of signal processing algorithms," J. High Perform. Comput. Appl. (Special Issue on Automatic Performance Tuning), vol. 18, pp. 21-45, 2004.
-
(2004)
J. High Perform. Comput. Appl. (Special Issue on Automatic Performance Tuning)
, vol.18
, pp. 21-45
-
-
Püschel, M.1
Singer, B.2
Xiong, J.3
Moura, J.M.F.4
Johnson, J.5
Padua, D.6
Veloso, M.7
Johnson, R.W.8
-
44
-
-
35048817424
-
A preliminary study on the vectorization of multimedia applications for multimedia extensions
-
G. Ren, P. Wu, and D. A. Padua, "A preliminary study on the vectorization of multimedia applications for multimedia extensions," in Proc. Int. Workshop Languages and Compilers for Parallel Computing, 2003, pp. 420-435.
-
(2003)
Proc. Int. Workshop Languages and Compilers for Parallel Computing
, pp. 420-435
-
-
Ren, G.1
Wu, P.2
Padua, D.A.3
-
45
-
-
0036299697
-
A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures
-
P. Rodriguez, "A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '02), vol. 3, 2002, pp. III-3220-III-3223.
-
(2002)
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '02)
, vol.3
-
-
Rodriguez, P.1
-
46
-
-
0034249157
-
A vectorizing compiler for multimedia extensions
-
N. Sreraman and R. Govindarajan, "A vectorizing compiler for multimedia extensions," Int. J. Parallel Program., vol. 28, no. 4, pp. 363-400, 2000.
-
(2000)
Int. J. Parallel Program.
, vol.28
, Issue.4
, pp. 363-400
-
-
Sreraman, N.1
Govindarajan, R.2
-
48
-
-
0021470572
-
FFT algorithms for vector computers
-
P. N. Swarztrauber, "FFT algorithms for vector computers," Parallel Comput., vol. 1, pp. 45-63, 1984.
-
(1984)
Parallel Comput.
, vol.1
, pp. 45-63
-
-
Swarztrauber, P.N.1
-
51
-
-
0343462141
-
Automated empirical optimizations of software and the ATLAS project
-
R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated empirical optimizations of software and the ATLAS project," Parallel Comput., vol. 27, pp. 3-35, 2001.
-
(2001)
Parallel Comput.
, vol.27
, pp. 3-35
-
-
Whaley, R.C.1
Petitet, A.2
Dongarra, J.J.3
-
53
-
-
0034826555
-
SPL: A language and compiler for DSP algorithms
-
J. Xiong, J. Johnson, R. Johnson, and D. Padua, "SPL: A language and compiler for DSP algorithms," in Proc. Conf. Programming Languages Design and Implementation (PLDI), 2001, pp. 298-308.
-
(2001)
Proc. Conf. Programming Languages Design and Implementation (PLDI)
, pp. 298-308
-
-
Xiong, J.1
Johnson, J.2
Johnson, R.3
Padua, D.4